CN111669570B - Multi-angle free view video data processing method and device, medium and equipment - Google Patents

Multi-angle free view video data processing method and device, medium and equipment

Info

Publication number
CN111669570B
CN111669570B CN201910173414.7A CN201910173414A
Authority
CN
China
Prior art keywords
image
data
angle
images
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910173414.7A
Other languages
Chinese (zh)
Other versions
CN111669570A (en)
Inventor
盛骁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201910173414.7A priority Critical patent/CN111669570B/en
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to PCT/US2020/021247 priority patent/WO2020181125A1/en
Priority to US16/810,695 priority patent/US11257283B2/en
Priority to US16/810,352 priority patent/US20200288097A1/en
Priority to US16/810,237 priority patent/US11037365B2/en
Priority to US16/810,565 priority patent/US11055901B2/en
Priority to US16/810,362 priority patent/US20200288108A1/en
Priority to PCT/US2020/021220 priority patent/WO2020181104A1/en
Priority to PCT/US2020/021167 priority patent/WO2020181074A1/en
Priority to US16/810,480 priority patent/US20200288098A1/en
Priority to PCT/US2020/021241 priority patent/WO2020181119A1/en
Priority to PCT/US2020/021164 priority patent/WO2020181073A1/en
Priority to PCT/US2020/021231 priority patent/WO2020181112A1/en
Priority to US16/810,586 priority patent/US20200286279A1/en
Priority to PCT/US2020/021141 priority patent/WO2020181065A1/en
Priority to PCT/US2020/021197 priority patent/WO2020181090A1/en
Priority to US16/810,614 priority patent/US20200288099A1/en
Priority to US16/810,634 priority patent/US11341715B2/en
Priority to US16/810,464 priority patent/US11521347B2/en
Priority to US16/810,681 priority patent/US20200288112A1/en
Priority to PCT/US2020/021195 priority patent/WO2020181088A1/en
Priority to PCT/US2020/021252 priority patent/WO2020181128A1/en
Priority to PCT/US2020/021187 priority patent/WO2020181084A1/en
Publication of CN111669570A publication Critical patent/CN111669570A/en
Publication of CN111669570B publication Critical patent/CN111669570B/en
Application granted granted Critical
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/243 Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/139 Format conversion, e.g. of frame-rate or size
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/21805 Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/658 Transmission by the client directed to the server
    • H04N21/6587 Control parameters, e.g. trick play commands, viewpoint selection

Abstract

The embodiment of the invention discloses a multi-angle free view video data processing method and device, a medium and equipment. The multi-angle free view video data processing method comprises the following steps: parsing the acquired video data to obtain data combinations at different frame moments, wherein each data combination comprises pixel data and depth data of multiple synchronized images, and the synchronized images capture the region to be viewed from different angles; and for each frame moment, performing image reconstruction of a virtual viewpoint based on the data combination, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range that supports switching viewpoints for viewing the region to be viewed. The technical scheme in the embodiment of the invention can support viewpoint-switched playback within the multi-angle free view range.

Description

Multi-angle free view video data processing method and device, medium and equipment
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, an apparatus, a medium, and a device for processing multi-angle free view video data.
Background
In the field of data processing, video data may be received and video may be played to a user based on that data. Such video is typically played from a fixed viewing angle, so there is room to improve the user experience.
Disclosure of Invention
The technical problem solved by the embodiment of the invention is to provide a multi-angle free view video data processing method that supports viewpoint-switched playback within a multi-angle free view range.
In order to solve the above technical problem, an embodiment of the present invention provides a multi-angle free view video data processing method, including: parsing the acquired video data to obtain data combinations at different frame moments, wherein each data combination comprises pixel data and depth data of multiple synchronized images, and the synchronized images capture the region to be viewed from different angles; and for each frame moment, performing image reconstruction of a virtual viewpoint based on the data combination, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range that supports switching viewpoints for viewing the region to be viewed.
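For orientation only, the following sketch shows the overall flow stated above: the video data is parsed into per-frame data combinations, and an image is reconstructed for the selected virtual viewpoint at each frame moment. The function names and the shape of the data are assumptions made for this example, not names defined by the method itself.

```python
import numpy as np

def play_free_view_video(data_combinations, virtual_viewpoint, reconstruct):
    """data_combinations: iterable of (frame_time, images, depths) obtained by parsing
    the acquired video data, where `images` are the pixel data of the synchronized
    images and `depths` the associated depth data for one frame moment.
    `reconstruct` performs image reconstruction of the virtual viewpoint (e.g., DIBR)."""
    for frame_time, images, depths in data_combinations:
        frame = reconstruct(images, depths, virtual_viewpoint)
        yield frame_time, frame  # frames can then be displayed in sequence

# Minimal usage with dummy data: one frame moment, two synchronized 4x4 "images".
dummy = [(0.0, [np.zeros((4, 4, 3))] * 2, [np.ones((4, 4))] * 2)]
frames = list(play_free_view_video(dummy, virtual_viewpoint=(0, 0, 0),
                                   reconstruct=lambda ims, ds, vp: ims[0]))
```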
Optionally, for each image in the synchronized multiple images, the depth data is a set of depth values in one-to-one correspondence with the pixels of the image.
Optionally, for each image in the synchronized multiple images, the depth data is obtained by downsampling a depth map, where the depth map is an image in which the set of depth values in one-to-one correspondence with the pixels of the image is arranged according to the pixel layout of the image.
Optionally, performing image reconstruction of the virtual viewpoint based on the data combination includes: upsampling the depth data to obtain a set of depth values in one-to-one correspondence with the pixels of the image; and performing image reconstruction of the virtual viewpoint according to the pixel data of the synchronized multiple images and the set of depth values.
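The sketch below illustrates the downsampling/upsampling relationship just described: the depth map may be stored at reduced resolution and upsampled back to per-pixel depth values before reconstruction. Nearest-neighbor resampling is used here only as an assumption; the method does not prescribe a specific filter.

```python
import numpy as np

def downsample_depth(depth_map: np.ndarray, factor: int = 2) -> np.ndarray:
    """Keep every `factor`-th depth value; the result is the depth data stored in the video."""
    return depth_map[::factor, ::factor]

def upsample_depth(depth_data: np.ndarray, height: int, width: int) -> np.ndarray:
    """Recover a depth value for every image pixel by nearest-neighbor upsampling."""
    rows = np.arange(height) * depth_data.shape[0] // height
    cols = np.arange(width) * depth_data.shape[1] // width
    return depth_data[np.ix_(rows, cols)]

# Example: a 1080x1920 depth map stored at half resolution and restored per pixel.
full = np.random.rand(1080, 1920).astype(np.float32)
stored = downsample_depth(full, factor=2)          # 540 x 960
restored = upsample_depth(stored, 1080, 1920)      # one depth value per image pixel
assert restored.shape == full.shape
```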
Optionally, performing image reconstruction of the virtual viewpoint based on the data combination includes: determining parameter data of each image in the synchronized multiple images, wherein the parameter data includes shooting position and shooting angle data of the image; determining parameter data of the virtual viewpoint, wherein the parameter data of the virtual viewpoint includes a virtual viewing position and a virtual viewing angle; determining multiple target images among the synchronized multiple images; for each target image, mapping the depth data to the virtual viewpoint according to the relation between the parameter data of the virtual viewpoint and the parameter data of the image; and generating a reconstructed image according to the depth data mapped to the virtual viewpoint and the pixel data of the target images.
Optionally, determining the multiple target images among the synchronized multiple images includes: selecting the target images from the multiple images according to the relation between the parameter data of the virtual viewpoint and the parameter data of the images.
Optionally, all images in the synchronized multiple images are used as target images.
Optionally, the parameter data of the image further includes internal parameter data, and the internal parameter data includes attribute data of the device that captured the image.
Optionally, before performing image reconstruction of the virtual viewpoint, the method further includes: receiving the parameter data of the virtual viewpoint.
Optionally, after performing image reconstruction of the virtual viewpoint, the method further includes: sending the reconstructed image to an image display end.
The embodiment of the invention also provides a multi-angle free view video data processing method, including: performing image reconstruction of virtual viewpoints by using the above multi-angle free view video data processing method; and playing video based on the reconstructed images at different frame moments.
Optionally, before performing image reconstruction of the virtual viewpoint, the method further includes: receiving an instruction of a user, and determining the virtual viewpoint according to the instruction of the user.
The embodiment of the invention also provides a multi-angle free view video data processing method, including: receiving an image reconstructed from a virtual viewpoint, the image reconstruction of the virtual viewpoint being performed by using the above multi-angle free view video data processing method; and playing video based on the reconstructed images at different frame moments.
Optionally, the reconstructed image is received from an edge computing node.
Optionally, the method further includes: sending the parameter data of the virtual viewpoint to the edge computing node.
The embodiment of the invention also provides a multi-angle free view video data processing device, including: a parsing unit adapted to parse the acquired video data to obtain data combinations at different frame moments, wherein each data combination comprises pixel data and depth data of multiple synchronized images, and the synchronized images capture the region to be viewed from different angles; and a virtual viewpoint image reconstruction unit adapted to perform, for each frame moment, image reconstruction of a virtual viewpoint based on the data combination, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range that supports switching viewpoints for viewing the region to be viewed.
The embodiment of the invention also provides a multi-angle free view video data processing device, including: a reconstruction unit adapted to perform image reconstruction of virtual viewpoints by using the above multi-angle free view video data processing device; and a playing unit adapted to play video based on the reconstructed images at different frame moments.
The embodiment of the invention also provides a multi-angle free view video data processing device, including: a receiving unit adapted to receive images reconstructed from a virtual viewpoint, the image reconstruction of the virtual viewpoint being performed by the above multi-angle free view video data processing device; and a playing unit adapted to play video based on the reconstructed images at different frame moments.
The embodiment of the invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are run, the steps of the above multi-angle free view video data processing method are executed.
The embodiment of the invention also provides an edge computing node, including a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and when running the computer instructions, the processor executes the steps of the above multi-angle free view video data processing method.
The embodiment of the invention also provides a terminal, including a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and when running the computer instructions, the processor executes the steps of the above multi-angle free view video data processing method.
The embodiment of the invention also provides a mobile device, including a communication component, a processor and a display component: the communication component is used for receiving multi-angle free view video data, the multi-angle free view video data including the data combinations; the processor is used for rendering based on the multi-angle free view video data to generate video corresponding to different virtual viewpoints; and the display component is used for displaying the video corresponding to the different virtual viewpoints.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the acquired video data is parsed to obtain data combinations at different frame moments, where each data combination comprises pixel data and depth data of multiple synchronized images and the synchronized images capture the region to be viewed from different angles; for each frame moment, image reconstruction of the virtual viewpoint is performed based on the data combination, and video is played based on the reconstructed images at different frame moments. Therefore, the multi-angle free view video data processing method in the embodiment of the invention can support viewpoint-switched playback within the multi-angle free view range.
Drawings
FIG. 1 is a schematic diagram of a region to be viewed in an embodiment of the invention;
FIG. 2 is a schematic diagram of an arrangement of an acquisition device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-angle free view display system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an apparatus display in an embodiment of the invention;
FIG. 5 is a schematic diagram of manipulating a device in an embodiment of the present invention;
FIG. 6 is another schematic diagram of manipulating a device in an embodiment of the present invention;
FIG. 7 is a schematic diagram of another arrangement of acquisition devices in an embodiment of the present invention;
FIG. 8 is another schematic diagram of manipulating a device in an embodiment of the invention;
FIG. 9 is another schematic diagram of a device display in an embodiment of the invention;
FIG. 10 is a flow chart of a method of setting up an acquisition device according to an embodiment of the present invention;
FIG. 11 is a schematic view of a multi-angle free view range according to an embodiment of the present invention;
FIG. 12 is a schematic view of another multi-angle free view range in an embodiment of the present invention;
FIG. 13 is a schematic view of another multi-angle free view range in an embodiment of the present invention;
FIG. 14 is a schematic view of another multi-angle free view range in an embodiment of the present invention;
FIG. 15 is a schematic view of another multi-angle free view range in an embodiment of the present invention;
FIG. 16 is a schematic diagram of another arrangement of acquisition devices in accordance with an embodiment of the present invention;
FIG. 17 is a schematic diagram of another arrangement of acquisition devices in accordance with an embodiment of the present invention;
FIG. 18 is a schematic diagram of another arrangement of acquisition devices in accordance with an embodiment of the present invention;
FIG. 19 is a flowchart of a multi-angle freeview data generation method according to an embodiment of the present invention;
FIG. 20 is a schematic diagram of distribution positions of pixel data and depth data of a single image according to an embodiment of the present invention;
FIG. 21 is a schematic diagram of distribution positions of pixel data and depth data of another single image according to an embodiment of the present invention;
FIG. 22 is a schematic diagram showing distribution positions of pixel data and depth data of an image according to an embodiment of the present invention;
FIG. 23 is a schematic diagram showing distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 24 is a schematic diagram showing distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 25 is a schematic diagram showing distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 26 is a schematic illustration of image region stitching in an embodiment of the present invention;
FIG. 27 is a schematic view of a structure of a stitched image according to an embodiment of the present invention;
FIG. 28 is a schematic view of another exemplary structure of stitched images according to an embodiment of the present invention;
FIG. 29 is a schematic view of another exemplary structure of stitched images according to an embodiment of the present invention;
FIG. 30 is a schematic view of another exemplary structure of stitched images according to an embodiment of the present invention;
FIG. 31 is a schematic view of another exemplary structure of stitched images according to an embodiment of the present invention;
FIG. 32 is a schematic diagram of another exemplary structure of stitched images according to an embodiment of the present invention;
FIG. 33 is a schematic diagram of pixel data distribution of an image in accordance with an embodiment of the present invention;
FIG. 34 is a schematic diagram of pixel data distribution of another image in accordance with an embodiment of the invention;
FIG. 35 is a schematic diagram of data storage in stitched images according to an embodiment of the present invention;
FIG. 36 is a schematic view of data storage in another stitched image in accordance with an embodiment of the present invention;
FIG. 37 is a flowchart of a multi-angle freeview video data generation method according to an embodiment of the present invention;
FIG. 38 is a flowchart of a multi-angle freeview data processing method according to an embodiment of the present invention;
FIG. 39 is a flowchart of a method of virtual viewpoint image reconstruction in an embodiment of the present invention;
FIG. 40 is a flowchart of a multi-angle free view image data processing method according to an embodiment of the present invention;
FIG. 41 is a flowchart of a multi-angle free view video data processing method according to an embodiment of the present invention;
FIG. 42 is a flowchart of a multi-angle freeview interaction method according to an embodiment of the present invention;
FIG. 43 is another schematic diagram of manipulating a device in an embodiment of the present invention;
FIG. 44 is a schematic diagram of another device display in an embodiment of the invention;
FIG. 45 is another schematic diagram of manipulating a device in an embodiment of the present invention;
FIG. 46 is a schematic diagram of another device display in an embodiment of the invention;
FIG. 47 is a schematic diagram showing a multi-angle free view video data processing apparatus according to an embodiment of the present invention;
fig. 48 is a schematic diagram of a structure of a virtual viewpoint image reconstruction unit in an embodiment of the present invention;
fig. 49 is a schematic diagram of the structure of another virtual viewpoint image reconstruction unit in the embodiment of the present invention;
FIG. 50 is a schematic structural diagram of another multi-angle free view video data processing apparatus according to an embodiment of the present invention;
FIG. 51 is a schematic structural diagram of another multi-angle free view video data processing apparatus according to an embodiment of the present invention;
FIG. 52 is a schematic diagram of a multi-angle freeview data generation process according to an embodiment of the present invention;
FIG. 53 is a schematic diagram of a multi-camera 6DoF acquisition system in accordance with an embodiment of the present invention;
FIG. 54 is a schematic diagram of the generation and processing of 6DoF video data according to an embodiment of the present invention;
FIG. 55 is a schematic diagram of a header file according to an embodiment of the present invention;
FIG. 56 is a schematic diagram of a user side processing 6DoF video data according to an embodiment of the present invention;
FIG. 57 is a schematic diagram of input and output of reference software in an embodiment of the invention;
FIG. 58 is a schematic diagram of an algorithm architecture of reference software in an embodiment of the present invention.
Detailed Description
As described in the background, in the field of data processing, video data may be received, and video may be played to a user based on the video data. Such video is typically played from a fixed viewing angle, so there is room to improve the user experience.
In the embodiment of the invention, the acquired video data is parsed to obtain data combinations at different frame moments, where each data combination comprises pixel data and depth data of multiple synchronized images and the synchronized images capture the region to be viewed from different angles; for each frame moment, image reconstruction of the virtual viewpoint is performed based on the data combination, so that video can be played based on the reconstructed images at different frame moments. Reconstructing the image of the virtual viewpoint yields the image of the region to be viewed as seen from that virtual viewpoint. Therefore, the multi-angle free view video data processing method can support viewpoint-switched video playback within the multi-angle free view range.
In a data processing method capable of supporting viewing-angle switching by the user, image data is stored as a point cloud; specifically, the three-dimensional positions and pixel information of all points in the region to be viewed are expressed and stored, which requires more storage resources. Accordingly, processing image data stored in this way requires more computing resources. If data corresponding to different moments is stored in this way, the data volume is large; correspondingly, it is difficult to meet the requirement of smooth video playback when playing video based on data stored in this way.
In the embodiment of the invention, by contrast, viewpoint-switched video playback is achieved by reconstructing images from the data combinations and playing video based on the reconstructed images at different frame moments. Compared with point cloud data, the amount of data to be processed in the embodiment of the invention is smaller.
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
As an example of the present invention, the applicant discloses the following steps. The first step is acquisition and depth map computation, comprising three main sub-steps: multi-camera video capturing (Multi-camera Video Capturing), camera intrinsic and extrinsic parameter estimation (Camera Parameter Estimation), and depth map calculation (Depth Map Calculation). For multi-camera acquisition, it is desirable that the video acquired by the cameras be aligned at the frame level. Referring to FIG. 52 in combination, texture images (Texture Image), i.e., the synchronized multiple images described herein, can be obtained through multi-camera video capturing; through camera intrinsic and extrinsic parameter estimation, the camera parameters (Camera Parameter) can be obtained, including the internal parameter data and external parameter data described later; and through depth map calculation, the depth maps (Depth Map) can be obtained.
In this approach, no special camera, such as a light field camera, is required for video acquisition. Likewise, complex camera calibration before acquisition is not required. The positions of the multiple cameras can be laid out and arranged to better capture objects or scenes that need to be captured. Referring to fig. 53 in combination, a plurality of acquisition devices, for example, cameras 1 to N, may be provided in the region to be viewed.
After the above three steps are completed, the texture maps acquired from the multiple cameras, all camera parameters, and a depth map for each camera are obtained. These three pieces of data may be referred to as the data files in the multi-angle free view video data, and may also be referred to as 6-degree-of-freedom video data (6DoF video data). With this data, the user side can generate a virtual viewpoint at a virtual 6-degree-of-freedom (Degree of Freedom, DoF) position, thereby providing a 6DoF video experience.
Referring to fig. 54 in combination, the 6DoF video data and indicative data may be compressed and transmitted to the user side, and the user side may obtain the 6DoF representation at the user side, that is, the aforementioned 6DoF video data and metadata, according to the received data. The indicative data may also be referred to as metadata (Metadata).
Referring to fig. 55 in combination, the metadata may be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), indicating the storage rule for the pixel data and depth data of the multiple images in the stitched image; edge protection metadata (Padding pattern metadata), which may be used to indicate the manner in which edge protection is performed in the stitched image; and other metadata (Other metadata). The metadata may be stored in a header file; the storage order may be as shown in FIG. 55, or may be another order.
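By way of illustration only, such a header could be represented as a small structure like the one below; the field names, values and the JSON encoding are assumptions made for the sketch, not the header format defined by the patent.

```python
import json

# Hypothetical header describing how a stitched image stores pixel and depth data.
metadata = {
    "stitching_pattern": {                 # storage rule for pixel data and depth data
        "layout": "texture_top_depth_bottom",
        "camera_count": 30,
        "texture_resolution": [1920, 1080],
        "depth_resolution": [960, 540],    # depth maps stored downsampled
    },
    "padding_pattern": {                   # edge protection applied in the stitched image
        "enabled": True,
        "padding_pixels": 8,
    },
    "other": {"version": 1},
}

header_bytes = json.dumps(metadata).encode("utf-8")  # e.g., prepended to the video data
```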
Referring to fig. 56 in combination, the user side obtains the 6DoF video data, which includes the camera parameters, texture maps, depth maps and descriptive metadata, as well as the interactive behavior data of the user side. With this data, the user side can perform 6DoF rendering using Depth Image-Based Rendering (DIBR), generating an image of a virtual viewpoint at the specific 6DoF position determined from user behavior; that is, the virtual viewpoint at the indicated 6DoF position is determined according to the user's instruction.
In one embodiment implemented for testing, each test case contains 20 seconds of video data at 30 frames per second with a resolution of 1920 x 1080, i.e., a total of 600 frames of data for each of the 30 cameras. The master folder contains a texture map folder and a depth map folder. Under the texture map folder, secondary directories numbered 0 to 599 can be found, representing the 600 frames of content corresponding to the 20 seconds of video. Each secondary directory contains the 30 texture maps acquired by the cameras, named 0.yuv to 29.yuv in yuv420 format. Under the depth map folder, each secondary directory contains 30 depth maps calculated by a depth estimation algorithm, each depth map corresponding to the texture map of the same name. The texture maps and corresponding depth maps of the multiple cameras all belong to a given frame moment within the 20-second video.
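A sketch of how the test-case layout described above could be traversed; the folder names, the yuv420 frame-size arithmetic for the depth files, and the use of NumPy are assumptions made for the example, not part of the test-case definition beyond what is stated above.

```python
import os
import numpy as np

WIDTH, HEIGHT = 1920, 1080
Y_SIZE = WIDTH * HEIGHT
FRAME_SIZE = Y_SIZE * 3 // 2          # yuv420: Y plane + quarter-size U and V planes

def read_yuv420_luma(path: str) -> np.ndarray:
    """Read one yuv420 frame and return only its Y (luma) plane as an HxW array."""
    with open(path, "rb") as f:
        raw = np.frombuffer(f.read(FRAME_SIZE), dtype=np.uint8)
    return raw[:Y_SIZE].reshape(HEIGHT, WIDTH)

def load_frame(root: str, frame_index: int, camera_index: int):
    """Load the texture map and matching depth map of one camera at one frame moment.
    'texture' and 'depth' are placeholder folder names for the two top-level folders."""
    texture = read_yuv420_luma(os.path.join(root, "texture", str(frame_index), f"{camera_index}.yuv"))
    depth = read_yuv420_luma(os.path.join(root, "depth", str(frame_index), f"{camera_index}.yuv"))
    return texture, depth
```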
All depth maps in the test case are generated through a preset depth estimation algorithm. In testing, these depth maps may provide good virtual viewpoint reconstruction quality at virtual 6DoF locations. In one case, the reconstructed image of the virtual viewpoint may be generated directly from the given depth map. Alternatively, the depth map may be generated or modified from the original texture map by a depth calculation algorithm.
In addition to the depth maps and texture maps, the test case also contains an sfm file, which describes the parameters of all 30 cameras. The data of this file is written in binary format; the specific data format is described below. To accommodate different cameras, a fisheye camera model with distortion parameters is adopted in the test. Reference may be made to the provided DIBR reference software to see how to read and use the camera parameter data from this file. The camera parameter data contains the following fields:
(1) krt_r is the rotation matrix of the camera;
(2) krt_cc is the optical center position of the camera;
(3) krt_WorldPosition is the three-dimensional space coordinates of the camera;
(4) krt_kc is a distortion coefficient of the camera;
(5) src_width is the width of the calibration image;
(6) src_height is the height of the calibration image;
(7) fisheye_radius and lens_fov are parameters of the fisheye camera.
In the technical scheme of the invention, a user can see in detail how the corresponding parameters are read from the sfm file in a preset parameter reading function (the set_sfm_parameters function).
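Since the binary layout of the sfm file is not specified here, the snippet below only sketches how the listed fields might be collected into a structure once parsed; it is not the set_sfm_parameters function from the reference software, and the dummy values are placeholders.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParameters:
    krt_r: np.ndarray               # rotation matrix of the camera
    krt_cc: np.ndarray              # optical center position of the camera
    krt_world_position: np.ndarray  # three-dimensional space coordinates of the camera
    krt_kc: np.ndarray              # distortion coefficients of the camera
    src_width: int                  # width of the calibration image
    src_height: int                 # height of the calibration image
    fisheye_radius: float           # fisheye camera parameter
    lens_fov: float                 # fisheye camera parameter

def make_dummy_camera() -> CameraParameters:
    """Placeholder values; real values come from parsing the binary sfm file."""
    return CameraParameters(
        krt_r=np.eye(3),
        krt_cc=np.zeros(3),
        krt_world_position=np.zeros(3),
        krt_kc=np.zeros(5),
        src_width=1920,
        src_height=1080,
        fisheye_radius=1.0,
        lens_fov=180.0,
    )
```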
In DIBR reference software, the camera parameters, texture map, depth map, and the 6DoF position of the virtual camera are received as inputs, while the generated texture map and depth map at the virtual 6DoF position are output. The 6DoF position of the virtual camera is the 6DoF position determined from the user behavior as described above. The DIBR reference software may be software that implements virtual viewpoint-based image reconstruction in embodiments of the present invention.
Referring to fig. 57 in combination, in reference software, camera parameters, texture map, depth map, and virtual camera's 6DoF position are received as inputs, while the generated texture map and the generated depth map at the virtual 6DoF position are output.
Referring to fig. 58 in combination, the software may include the following processing steps: camera selection (Camera selection), forward mapping of depth maps (Forward Projection of Depth map), depth map post-processing (Postprocessing), inverse mapping of Texture maps (Backward projection of Texture map), fusion of multi-Camera mapped Texture maps (Texture Fusion), and hole filling of images (Inpainting).
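The sketch below strings these steps together in a heavily simplified form: two fixed reference cameras, a caller-supplied warping function in place of the fisheye projection, no depth post-processing, single-channel textures, and a crude hole fill. It is meant only to show the order and data flow of the pipeline, not to reproduce the reference software; the camera dictionary keys and the inverse-distance weighting are assumptions.

```python
import numpy as np

def select_cameras(cameras, virtual_position, n=2):
    """Camera selection: pick the n cameras closest to the virtual 6DoF position."""
    dists = [np.linalg.norm(c["position"] - virtual_position) for c in cameras]
    order = np.argsort(dists)[:n]
    return [cameras[i] for i in order], np.array([dists[i] for i in order])

def synthesize_view(cameras, textures, depths, virtual_position, warp):
    """Simplified DIBR flow: forward-project depth, back-project texture, fuse, inpaint.
    `warp(depth, cam, virtual_position)` is assumed to return, for every pixel of the
    virtual view, the source-pixel row/column it maps from (or -1 where unmapped).
    Textures are treated as single-channel arrays for brevity."""
    refs, dists = select_cameras(cameras, virtual_position)
    weights = 1.0 / np.maximum(dists, 1e-6)     # global weights: nearer camera counts more
    weights /= weights.sum()

    h, w = depths[0].shape
    acc = np.zeros((h, w), dtype=np.float32)
    wsum = np.zeros((h, w), dtype=np.float32)
    for cam, wgt in zip(refs, weights):
        i = cam["index"]
        src_rows, src_cols = warp(depths[i], cam, virtual_position)   # forward depth mapping
        valid = src_rows >= 0
        mapped = np.zeros((h, w), dtype=np.float32)
        mapped[valid] = textures[i][src_rows[valid], src_cols[valid]] # backward texture mapping
        acc += wgt * mapped
        wsum += wgt * valid

    out = np.where(wsum > 0, acc / np.maximum(wsum, 1e-6), 0.0)       # fusion
    holes = wsum == 0                                                 # pixels no camera reached
    if holes.any() and (~holes).any():
        out[holes] = out[~holes].mean()                               # crude hole filling
    return out
```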
In the reference software, the two cameras closest to the virtual 6DoF position may be selected by default for virtual viewpoint generation.
In the step of post-processing of the depth map, the quality of the depth map may be improved by various methods, such as foreground edge protection, pixel level filtering, etc.
For the output generated image, a method of capturing texture maps from two cameras and fusing them is used. The fused weights are global weights, determined by the distance of the virtual viewpoint's position from the reference camera position. In the case where a pixel of the output virtual viewpoint image is mapped to only one camera, that mapped pixel may be directly employed as the value of the output pixel.
After the fusion step, if there are still hole pixels to which no camera was mapped, these hole pixels can be filled using an image inpainting method.
For the output depth map, the depth map mapped from one of the cameras to the virtual viewpoint position may be used as the output, for convenience of error analysis.
It is to be understood that the foregoing examples are merely illustrative, and not restrictive, of the specific embodiments, and the technical solutions of the present invention will be further described below.
Referring to fig. 1 in combination, the area to be watched may be a basketball court, and a plurality of acquisition devices may be provided to acquire data from the area to be watched.
For example, referring to FIG. 2 in combination, several acquisition devices may be arranged along a path at a height H_LK above the basket; for example, 6 acquisition devices, i.e., acquisition devices CJ1 to CJ6, may be arranged along an arc. It will be appreciated that the location, number and manner of support of the acquisition devices may be varied and are not limited in this regard.
The acquisition device may be a camera or video camera capable of simultaneous photographing, for example, a camera or video camera capable of simultaneous photographing via a hardware synchronization line. And acquiring data of the region to be watched by a plurality of acquisition devices to obtain a plurality of synchronous images or video streams. According to the video streams acquired by the plurality of acquisition devices, a plurality of synchronous frame images can be obtained and used as a plurality of synchronous images. It will be appreciated that ideally, synchronization refers to corresponding to the same time, but that errors and deviations may also be tolerated.
Referring to fig. 3 in combination, in an embodiment of the present invention, data acquisition may be performed for a region to be viewed by an acquisition system 31 comprising a plurality of acquisition devices; the acquired synchronized multiple images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle freeview data capable of supporting virtual viewpoint switching by the display device 33. The display device 33 may present a reconstructed image generated based on the multi-angle freeview data, the reconstructed image corresponding to a virtual viewpoint, and may present reconstructed images corresponding to different virtual viewpoints according to user instructions, switching viewing positions and viewing angles.
In a specific implementation, the image reconstruction process that produces the reconstructed image may be performed by the display device 33, or by a device located in a content delivery network (Content Delivery Network, CDN) in an edge computing manner. It will be appreciated that fig. 3 is merely an example and is not limiting of the acquisition system, server, display device, or the specific implementation. The process of image reconstruction based on the multi-angle free view data will be described in detail later with reference to fig. 38 to 41, and will not be repeated here.
With reference to fig. 4, the previous example is followed in which the user can view the area to be viewed, in this embodiment a basketball court, through the display device. As described above, the viewing position and viewing angle can be switched.
For example, the user may slide on the screen to switch virtual viewpoints. In an embodiment of the present invention, referring to fig. 5 in combination, when the user's finger slides the screen rightward, the virtual viewpoint used for viewing can be switched. With continued reference to FIG. 2, the position of the virtual viewpoint before sliding may be VP1; after sliding the screen to switch virtual viewpoints, the position of the virtual viewpoint may be VP2. Referring to fig. 6 in combination, after sliding the screen, the reconstructed image presented on the screen may be as shown in fig. 6. The reconstructed image may be obtained by performing image reconstruction based on multi-angle free view data generated from data acquired by the multiple acquisition devices in the actual acquisition scenario.
It will be appreciated that the image viewed prior to switching may also be a reconstructed image. The reconstructed image may be a frame image in a video stream. The manner of switching the virtual viewpoint according to the user instruction may be varied, and is not limited herein.
In particular implementations, the virtual viewpoint may be represented by 6-degree-of-freedom (Degree of Freedom, DoF) coordinates, wherein the spatial position of the virtual viewpoint may be represented as (x, y, z), and the viewing angle may be represented as three rotation directions, giving six degrees of freedom in total.
The virtual viewpoint is a three-dimensional concept and three-dimensional information is required for generating a reconstructed image. In a specific implementation, the multi-angle freeview data may include depth data for providing third dimensional information outside the planar image. The amount of data for depth data is smaller than for other implementations, such as providing three-dimensional information via point cloud data. The specific implementation of generating the multi-angle freeview data will be described in detail below with reference to fig. 19 to 37, and will not be described again here.
In the embodiment of the invention, the switching of the virtual view points can be performed within a certain range, and the range is the multi-angle free view angle range. That is, the virtual viewpoint position and the viewing angle can be arbitrarily switched within the multi-angle free viewing angle range.
The multi-angle free view angle range is related to the arrangement of the acquisition equipment, and the wider the shooting coverage range of the acquisition equipment is, the larger the multi-angle free view angle range is. The quality of the picture displayed by the display device is related to the number of the acquisition devices, and in general, the more the number of the acquisition devices is set, the less the hole area in the displayed picture is.
Referring to fig. 7, if two rows of acquisition devices with different heights are provided on the basketball court, namely an upper row of acquisition devices CJ1 to CJ6 and a lower row of acquisition devices CJ1 to CJ6, the multi-angle free view range is larger compared with an arrangement of only one row of acquisition devices.
Referring to fig. 8 in combination, a user's finger may slide upward, switching virtual viewpoints for viewing. Referring to fig. 9 in combination, after sliding the screen, the image presented by the screen may be as shown in fig. 9.
In a specific implementation, if only one row of acquisition devices is arranged, a certain degree of freedom in the up-down direction can be obtained in the process of reconstructing an image to obtain a reconstructed image, and the range of the multi-angle free view angle is smaller than that of two rows of acquisition devices in the up-down direction.
It will be understood by those skilled in the art that the foregoing embodiments and the corresponding drawings are merely illustrative and do not limit the arrangement of the acquisition devices, its relationship with the multi-angle free view range, or the operation manner and display effect of the display device. The specific embodiment of performing virtual viewpoint switching for viewing the region to be viewed according to a user instruction will be described in detail later with reference to fig. 43 to 47, and will not be repeated here.
The following is further explained with particular reference to the method for setting up the acquisition device.
Fig. 10 is a flowchart of a method for setting an acquisition device according to an embodiment of the present invention, which specifically includes the following steps:
step S101, determining a multi-angle free view angle range, and supporting to switch and watch virtual viewpoints of a region to be watched in the multi-angle free view angle range;
step S102, determining a setting position of the acquisition equipment at least according to the multi-angle free view angle range, wherein the setting position is suitable for setting the acquisition equipment, and carrying out data acquisition on the to-be-watched area.
It will be appreciated by those skilled in the art that a fully free view may refer to a 6-degree-of-freedom view, i.e., a view in which a user may freely switch the spatial position and viewing angle of the virtual viewpoint on the device used for display. The spatial position of the virtual viewpoint can be expressed as (x, y, z), and the viewing angle can be expressed as three rotation directions, giving six degrees of freedom in total, which is why this is called a 6-degree-of-freedom viewing angle.
As described above, in the embodiment of the present invention, the switching of the virtual viewpoint may be performed within a certain range, which is the multi-angle free view range. That is, the virtual viewpoint position and the viewing angle can be arbitrarily switched within the multi-angle free viewing angle range.
The multi-angle free view angle range can be determined according to the requirements of application scenes. For example, in some scenarios, the area to be viewed may have a core viewpoint, such as the center of a stage, or the center point of a basketball court, or the basketball court's basketry, or the like. In these scenarios, the multi-angle freeview range may include a planar or stereoscopic region that contains the core viewpoint. It will be appreciated that the area to be viewed may be a point, plane or volumetric area, without limitation.
As described above, the multi-angle free view range may be a variety of areas, and is further illustrated with reference to fig. 11 to 15.
Referring to fig. 11 in combination, the core viewpoint is denoted by point O, and the multi-angle free view range may be a sector area located in the same plane as the core viewpoint and centered on the core viewpoint, such as sector area A1OA2 or sector area B1OB2, or may be a circular surface centered on point O.
Taking sector area A1OA2 as the multi-angle free view range as an example, the position of the virtual viewpoint may be switched continuously within the region, e.g., switched continuously from A1 to A2 along arc segment A1A2; alternatively, switching may be performed along arc segment L1L2, or positions may be switched in other ways within the multi-angle free view range. Accordingly, the viewing angle of the virtual viewpoint may also change within the region.
With further reference to fig. 12, the core viewpoint may be the center point E of the basketball court, and the multi-angle free view range may be a sector area centered on the center point E and located in the same plane as the center point E, such as sector area F121EF122. The center point E of the basketball court may be located on the ground, or alternatively at a distance from the ground. The heights of the arc end points F121 and F122 of the sector area may be the same, e.g., the height H121 in the figure.
Referring to fig. 13 in combination, the core viewpoint is denoted by point O, and the multi-angle free view range may be part of a sphere centered on the core viewpoint; for example, region C1C2C3C4 illustrates a partial region of a sphere, and the multi-angle free view range may be the stereoscopic range formed by region C1C2C3C4 and point O. Any point within this range can be used as the position of the virtual viewpoint.
With further reference to fig. 14, the core viewpoint may be the center point E of the basketball court, and the multi-angle free view range may be part of a sphere centered on the center point E; for example, region F131F132F133F134 illustrates a partial region of the sphere, and the multi-angle free view range may be the stereoscopic range formed by region F131F132F133F134 and the center point E.
In the scene with the core viewpoint, the positions of the core viewpoints may be various, and the multi-angle free view angle ranges may be various, which are not listed here. It will be appreciated that the various embodiments described above are merely examples and are not limiting of the range of multi-angle free viewing angles, and that the shapes shown therein are not limiting of the actual scene and application.
In a specific implementation, the core view point may be determined according to a scene, or there may be multiple core view points in a shooting scene, and the multi-angle free view angle range may be a superposition of multiple sub-ranges.
In other applications, the multi-angle freeview range may also be coreless, for example, in some applications it may be desirable to provide multi-angle freeview viewing of ancient buildings, or to provide multi-angle freeview viewing of paintings. Accordingly, the multi-angle free view range may be determined according to the needs of these scenes.
It is understood that the shape of the free view angle range may be arbitrary, and any point in the multi-angle free view angle range may be used as a position.
Referring to fig. 15, the multi-angle free view range may be a cube D1D2D3D4D5D6D7D8, and the region to be viewed is the surface D1D2D3D4. Any point in the cube D1D2D3D4D5D6D7D8 may be used as the position of the virtual viewpoint, and the viewing angle of the virtual viewpoint may vary. For example, position E6 may be selected on surface D5D6D7D8, and viewing may be performed along E6D1 or along E6D9, where point D9 is selected from the region to be viewed.
In a specific implementation, after the multi-angle free view range is determined, the position of the acquisition device may be determined according to the multi-angle free view range.
Specifically, the setting position of the acquisition device may be selected within the multi-angle free view range, for example, the setting position of the acquisition device may be determined in a boundary point of the multi-angle free view range.
Referring to fig. 16 in combination, the core viewpoint may be the center point E of the basketball court, and the multi-angle free view range may be a sector area centered on the center point E and located in the same plane as the center point E, such as sector area F61EF62. The acquisition devices may be disposed within the multi-angle view range, for example along arc F65F66. The areas not covered by the acquisition devices can be reconstructed using an algorithm. In a specific implementation, the acquisition devices may also be arranged along arc F61F62, with acquisition devices placed at the end points of the arc to improve the quality of the reconstructed image. Each acquisition device may be oriented toward the center point E of the basketball court. The position of an acquisition device may be represented by spatial position coordinates, and its orientation may be represented by three rotation directions.
In a specific implementation, the number of settable setting positions may be 2 or more, and correspondingly, 2 or more collecting devices may be set. The number of acquisition devices may be determined based on the quality requirements of the reconstructed image or video. The number of acquisition devices may be greater in scenes where the picture quality requirements for the reconstructed image or video are higher, and may be smaller in scenes where the picture quality requirements for the reconstructed image or video are lower.
With continued reference to fig. 16, it will be appreciated that if higher quality of the reconstructed image or video frame is sought, so that holes in the reconstructed frame are reduced, a greater number of acquisition devices may be arranged along arc F61F62, for example 40 cameras.
Referring to fig. 17 in combination, the core viewpoint may be the center point E of the basketball court, and the multi-angle view range may be part of a sphere centered on the center point E; for example, region F61F62F63F64 illustrates a partial region of the sphere, and the multi-angle free view range may be the stereoscopic range formed by region F61F62F63F64 and the center point E. The acquisition devices may be disposed within the multi-angle view range, for example along arc F65F66 and arc F67F68. Similar to the previous example, the areas not covered by the acquisition devices may be reconstructed using an algorithm. In a specific implementation, the acquisition devices may also be arranged along arc F61F62 and arc F63F64, with acquisition devices placed at the end points of the arcs to improve the quality of the reconstructed image.
Each acquisition device may be oriented toward the center point E of the basketball court. It will be appreciated that, although not shown, a greater number of acquisition devices may be arranged along arc F61F62 and arc F63F64, in a combination of the above arrangements.
As previously described, in some application scenarios, the region to be viewed may include a core viewpoint, and the multi-angle freeview range includes a region where the view points to the core viewpoint. In such an application scenario, the setting position of the acquisition device may be selected from an arc-shaped area with the concave direction pointing towards the core viewpoint.
When the region to be watched comprises a core viewpoint, the setting position is selected in an arc-shaped region pointing to the core viewpoint in the concave direction, so that the acquisition equipment is distributed in an arc shape. Because the viewing area includes the core viewpoint, the viewing angle is directed toward the core viewpoint, in this scenario, the arcuately arranged acquisition devices may employ fewer acquisition devices, covering a larger multi-angle free viewing angle range.
In a specific implementation, the setting position of the acquisition device may be determined in combination with the viewing angle range and the boundary shape of the region to be viewed. For example, the setting position of the acquisition device may be determined at preset intervals along the boundary of the region to be viewed within the viewing angle range.
Referring to fig. 18 in combination, the multi-angle view range may have no core viewpoint; for example, the virtual viewpoint position may be selected from hexahedron F81F82F83F84F85F86F87F88, and the region to be viewed is watched from that virtual viewpoint position. The boundary of the region to be viewed may be the ground boundary line of a court. The acquisition devices can be arranged along the intersection line B89B94 of the ground boundary line and the region to be viewed; for example, 6 acquisition devices can be set from position B89 to position B94. The degree of freedom in the up-down direction can be realized by an algorithm, or a further row of acquisition devices can be arranged at a position whose horizontal projection is the intersection line B89B94.
In a specific implementation, the multi-angle free view angle range may also support viewing the region to be viewed from an upper side of the region to be viewed, where the upper side is a direction away from the horizontal plane.
Correspondingly, the unmanned aerial vehicle can be used for carrying the acquisition equipment, so that the acquisition equipment is arranged on the upper side of the area to be watched, the acquisition equipment can also be arranged on the top of the building where the area to be watched is located, and the top is a structural body of the building in the direction away from the horizontal plane.
For example, the collection device may be provided on top of a basketball venue or may be carried by an unmanned aerial vehicle to hover over the basketball venue. The collection device can be arranged at the top of a venue where a stage is located, or can be carried by an unmanned aerial vehicle.
By arranging the acquisition device on the upper side of the region to be viewed, the multi-angle free view angle range can be made to include the view angle above the region to be viewed.
In a specific implementation, the acquisition device may be a camera or video camera, and the acquired data may be picture or video data.
It will be appreciated that the manner in which the acquisition device is positioned at the setting location may be varied, for example, by being supported by a support frame at the setting location, or other arrangements may be possible.
In addition, it is to be understood that the foregoing embodiments are merely illustrative and not restrictive of the manner in which the collection device may be configured. In various application scenes, the specific implementation modes of determining the setting position of the acquisition equipment and setting the acquisition equipment to acquire according to the multi-angle free view angle range are all within the protection scope of the invention.
The method of generating multi-angle freeview data is described further below with particular reference thereto.
As described above, with continued reference to fig. 3, the acquired synchronized multiple images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle freeview data capable of supporting virtual viewpoint switching by the display device 33, and the multi-angle freeview data may indicate third dimensional information outside the two-dimensional image through the depth data.
Specifically, referring to fig. 19 in combination, generating the multi-angle freeview data may include the steps of:
step S191, acquiring a plurality of synchronous images, wherein the shooting angles of the plurality of images are different;
step S192, determining depth data of each image based on the plurality of images;
step S193, storing, for each of the images, pixel data of the image into a first field and storing the depth data into at least one second field associated with the first field.
The synchronized plurality of images may be images captured by a camera or frame images in video data captured by a camera. In generating the multi-angle freeview data, depth data for each image may be determined based on the plurality of images.
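As an illustration only, the three steps above can be sketched in Python as follows; the routine estimate_depth is a hypothetical placeholder for any multi-view depth-estimation algorithm, since no particular algorithm is prescribed here.

```python
import numpy as np

def generate_multi_angle_freeview_data(images, estimate_depth):
    """Minimal sketch of steps S191-S193.

    images         -- list of synchronized H x W x 3 uint8 arrays, one per shooting angle (step S191)
    estimate_depth -- hypothetical callable returning one H x W depth map per image,
                      computed from the whole synchronized set (step S192)
    """
    depth_maps = estimate_depth(images)                               # step S192
    data_combination = []
    for image, depth in zip(images, depth_maps):                      # step S193
        first_field = image.tobytes()                                 # pixel data of the image
        second_field = np.asarray(depth, dtype=np.uint8).tobytes()    # associated depth data
        data_combination.append((first_field, second_field))
    return data_combination
```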
Wherein the depth data may comprise depth values corresponding to pixels of the image. The distance from the acquisition device to the various points in the region to be viewed can be taken as the above-mentioned depth value, which directly reflects the geometry of the visible surface in the region to be viewed. The depth value may be the distance from each point in the region to be viewed to the optical center along the optical axis of the camera, the origin of the camera coordinate system being the optical center. It will be appreciated by those skilled in the art that the distance may be a relative value, provided that the plurality of images use the same reference.
Further, the depth data may include depth values that are in one-to-one correspondence with pixels of the image, or may be a partial value selected from a set of depth values that are in one-to-one correspondence with pixels of the image.
It will be appreciated by those skilled in the art that the set of depth values may be stored in the form of a depth map. In a specific implementation, the depth data may be data obtained by downsampling an original depth map, where the original depth map stores, according to the pixel arrangement of the image, the set of depth values corresponding one-to-one to the pixels of the image.
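As a simple sketch of such downsampling, assuming OpenCV is available, a 1080P original depth map can be reduced to a quarter of its number of depth values (half the width and half the height):

```python
import numpy as np
import cv2

# Original depth map: one depth value per image pixel, stored according to the
# pixel arrangement of the image (synthetic 1080P example for illustration).
original_depth = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)

# Downsampled depth data: half the width and half the height, i.e. a quarter
# of the original number of depth values.
depth_data = cv2.resize(original_depth, (960, 540), interpolation=cv2.INTER_NEAREST)
```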
In a specific implementation, the pixel data of the image stored in the first field may be original pixel data, such as data obtained from the acquisition device, or may be data obtained by reducing the resolution of the original pixel data. The pixel data of the image may be YUV data or RGB data, or may be other data capable of expressing the image.
In a specific implementation, the number of pixels to which the depth data stored in the second field corresponds may be the same as or different from the number of pixels of the image pixel data stored in the first field. The number may be determined according to the bandwidth limitation of data transmission to the device that processes the multi-angle free view image data; if the bandwidth is small, the data volume may be reduced by downsampling, reducing the resolution, or similar means.
In a specific implementation, for each image, the pixel data of the image may be sequentially stored in a plurality of fields according to a preset sequence, and these fields may be continuous or may be distributed at intervals from the second field. A field storing pixel data of an image may be used as the first field. The following examples are illustrative.
Referring to fig. 20, pixel data of one image, which is illustrated as pixels 1 to 6 in the figure, and other pixels not shown, may be stored in a predetermined order in a plurality of consecutive fields, which may be used as the first field; the depth data corresponding to the image, which is illustrated by the depth values 1 to 6 in the figure, and other depth values not shown, may be stored in a predetermined order in a plurality of consecutive fields, which may be used as the second fields. The preset sequence may be sequentially stored row by row according to the distribution position of the image pixels, or may be other sequences.
Referring to fig. 21, pixel data of one image and corresponding depth values may be alternately stored in a plurality of fields. The plurality of fields of pixel data may be stored as a first field, and the plurality of fields of depth values may be stored as a second field.
In implementations, storing the depth data may be performed in the same order as storing the pixel data of the image such that each of the first fields may be associated with each of the second fields. And further, the depth value corresponding to each pixel can be embodied.
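The two layouts just described can be illustrated with a short sketch, assuming for simplicity that the pixel values are single-channel and that there is exactly one depth value per pixel:

```python
import numpy as np

def store_sequentially(pixels, depths):
    # Fig. 20-style layout: all pixel data first (first fields), then all
    # depth values (second fields), both in the same preset order.
    return np.concatenate([pixels.ravel(), depths.ravel()])

def store_interleaved(pixels, depths):
    # Fig. 21-style layout: pixel data and depth values alternate, so each
    # pixel field is immediately followed by the depth field associated with it.
    flat_pixels, flat_depths = pixels.ravel(), depths.ravel()
    out = np.empty(flat_pixels.size + flat_depths.size, dtype=flat_pixels.dtype)
    out[0::2] = flat_pixels
    out[1::2] = flat_depths
    return out
```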
In implementations, pixel data as well as depth data for multiple images may be stored in a variety of ways, as further described in the examples below.
Referring to fig. 22 in combination, each pixel of image 1, illustrated as image 1 pixel 1, image 1 pixel 2, and other pixels not shown, may be stored in a continuous field, which may be the first field. The depth data of image 1, illustrated by image 1 depth value 1, image 1 depth value 2, and other depth data not shown in the figure, may be stored in fields adjacent to the first field, which may be used as the second field. Similarly, for pixel data of image 2, it may be stored in a first field, and depth data of image 2 may be stored in an adjacent second field.
It can be understood that each image in the image stream continuously acquired by one acquisition device of the synchronous multiple acquisition devices, or each frame image in the video stream, can be respectively used as the image 1; similarly, among the plurality of synchronous acquisition devices, an image acquired in synchronization with the image 1 may be taken as the image 2. The acquisition device may be an acquisition device as in fig. 2, or an acquisition device in other scenarios.
Referring to fig. 23 in combination, pixel data of image 1 and pixel data of image 2 may be stored in a plurality of adjacent first fields, and depth data of image 1 and depth data of image 2 may be stored in a plurality of adjacent second fields.
Referring to fig. 24 in combination, pixel data of each of a plurality of images may be stored in a plurality of fields, respectively, which may be the first fields. The fields storing pixel data may be interleaved with the fields storing depth values.
Referring to fig. 25, pixel data and depth values of different images may also be arranged in a cross manner, for example, image 1 pixel 1, image 1 depth value 1, image 2 pixel 1, and image 2 depth value 1 … may be sequentially stored until the pixel data and depth values corresponding to the first pixel of each of the plurality of images are completed, and adjacent fields thereof store image 1 pixel 2, image 1 depth value 2, image 2 pixel 2, and image 2 depth value 2 … until the pixel data and depth data of each image are stored.
In summary, the field storing the pixel data of each image may be the first field, and the field storing the depth data of the image may be the second field. For each image, a first field and a second field associated with the first field may be stored.
It will be appreciated by those skilled in the art that the various embodiments described above are merely examples and are not specific limitations on the type, size, and arrangement of fields.
Referring to fig. 3 in combination, the multi-angle freeview data including the first field and the second field may be stored in a server 32 in the cloud, transmitted to the CDN or to a device 33 for display, and image reconstruction is performed.
In a specific implementation, the first field and the second field may be pixel fields in a stitched image, where the stitched image is used to store pixel data of the multiple images and the depth data. By adopting the image format to store the data, the data volume can be reduced, the duration of data transmission can be reduced, and the occupation of resources can be reduced.
The stitched image may be an image in a variety of formats, such as the BMP format, the JPEG format, the PNG format, and the like. These image formats may be compressed or uncompressed. Those skilled in the art will appreciate that images in various formats include fields corresponding to individual pixels, referred to as pixel fields. The size of the stitched image, that is, parameters such as its number of pixels and aspect ratio, may be determined as needed, specifically according to factors such as the number of synchronized images, the amount of pixel data to be stored for each image, and the amount of depth data to be stored for each image.
In a specific implementation, for the synchronized plurality of images, the number of bits of the depth data and of the pixel data corresponding to the pixels of each image may be related to the format of the stitched image.
For example, when the format of the stitched image is the BMP format, the depth value may range from 0 to 255, that is, 8-bit data that may be stored as a gray value in the stitched image; alternatively, the depth value may be 16-bit data, and may be stored as gray values at two pixel positions in the stitched image, or stored in two channels at one pixel position in the stitched image.
When the format of the stitched image is the PNG format, the depth value may likewise be 8-bit or 16-bit data; in the PNG format, a 16-bit depth value may be stored as a gray value at one pixel position in the stitched image.
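For the case of 16-bit depth values stored as two 8-bit gray values (two channels of one pixel position, or two pixel positions), a minimal sketch of the split and the corresponding recovery on the processing side might look as follows:

```python
import numpy as np

def split_depth_16bit(depth16):
    """Split 16-bit depth values into two 8-bit gray values."""
    high = (depth16 >> 8).astype(np.uint8)    # most significant byte
    low = (depth16 & 0xFF).astype(np.uint8)   # least significant byte
    return high, low

def merge_depth_16bit(high, low):
    """Recover the 16-bit depth values from the two 8-bit gray values."""
    return (high.astype(np.uint16) << 8) | low.astype(np.uint16)
```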
It should be understood that the above embodiments are not limited to the storage manner or the number of data bits, and that other data storage manners that can be implemented by those skilled in the art fall within the scope of the present invention.
In a specific implementation, the stitched image may be divided into an image area and a depth map area, where a pixel field of the image area stores pixel data of the plurality of images, and a pixel field of the depth map area stores depth data of the plurality of images; the pixel field of the pixel data of each image is stored in the image area as the first field, and the pixel field of the depth data of each image is stored in the depth map area as the second field.
In one embodiment, the image region may be a continuous region, and the depth map region may be a continuous region.
Further, in a specific implementation, the stitched image may be equally divided, and the two equal portions may be used as the image area and the depth map area, respectively. Alternatively, the stitched image may be divided in a non-halving manner according to the pixel data amount and the depth data amount of the images to be stored.
For example, referring to fig. 26, if one pixel is illustrated in each minimum square, the image area may be an area 1 within the dashed frame, that is, an upper half area of the stitched image after being divided into upper and lower parts, and a lower half area of the stitched image may be a depth map area.
It will be appreciated that fig. 26 is merely illustrative, and the number of minimum squares therein is not a limitation on the number of pixels of the stitched image. The stitched image may alternatively be divided into left and right parts.
In a specific implementation, the image area may include a plurality of image sub-areas, each for storing one of the plurality of images, and a pixel field of each image sub-area may be used as the first field; accordingly, the depth map region may include a plurality of depth map sub-regions, each for storing depth data of one of the plurality of images, and a pixel field of each of the depth map sub-regions may be a second field.
The number of image sub-regions and the number of depth map sub-regions may be equal, both being equal to the number of synchronized images, in other words, equal to the number of cameras described above.
The split images will be further described with reference to fig. 27 by continuing to divide the split images into upper and lower parts. In fig. 27, the upper half of the stitched image is an image area, and is divided into 8 image sub-areas, and the pixel data of 8 synchronized images are stored respectively, and the shooting angles of each image are different, that is, the viewing angles are different. The lower half part of the spliced image is a depth map area, divided into 8 depth map subareas, and the depth maps of the 8 images are respectively stored.
In connection with the above, the pixel data of the 8 images synchronized, that is, the view 1 image to the view 8 image, may be the original image obtained from the camera, or may be the reduced resolution image of the original image. The depth data is stored in a partial region of the stitched image, which may also be referred to as a depth map.
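As an illustration of the fig. 27-style layout, assuming 8 synchronized views of the same size and one single-channel depth map per view, a stitched image with the image area in the upper half and the depth map area in the lower half can be assembled as follows:

```python
import numpy as np

def build_stitched_image(images, depth_maps, cols=4):
    """Assemble a fig. 27-style stitched image: the upper half holds the view
    images (image sub-regions), the lower half holds the depth maps (depth map
    sub-regions), both laid out row by row.  Assumes all inputs share the same
    H x W size; depth maps are broadcast to 3 channels for storage."""
    n = len(images)
    rows = (n + cols - 1) // cols
    h, w = images[0].shape[:2]
    stitched = np.zeros((2 * rows * h, cols * w, 3), dtype=np.uint8)
    for i, (img, dep) in enumerate(zip(images, depth_maps)):
        r, c = divmod(i, cols)
        stitched[r*h:(r+1)*h, c*w:(c+1)*w] = img                          # image area
        stitched[(rows+r)*h:(rows+r+1)*h, c*w:(c+1)*w] = dep[..., None]   # depth map area
    return stitched
```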
As described above, in a specific implementation, the stitched image may also be divided in a non-halving manner. For example, referring to fig. 28, the number of pixels occupied by the depth data may be less than the number of pixels occupied by the pixel data of the images, so that the image region and the depth map region are of different sizes. For instance, the depth data may be obtained by quarter downsampling of the depth maps, in which case the division shown in fig. 28 may be adopted. The number of pixels occupied by the depth maps may also be greater than the number of pixels occupied by the pixel data of the images.
It will be appreciated that fig. 28 is not limited to dividing the stitched image in a non-equal manner, and in a specific implementation, the pixel amount and aspect ratio of the stitched image may be varied, and the dividing manner may be varied.
In a specific implementation, the image region or the depth map region may also include a plurality of regions. For example, as shown in fig. 29, the image region may be one continuous region and the depth map region may include two continuous regions.
Alternatively, referring to fig. 30 and 31, the image region may include two consecutive regions, and the depth map region may also include two consecutive regions. The image areas and the depth areas may be arranged at intervals.
Still alternatively, referring to fig. 32, the image sub-regions included in the image region may be arranged at intervals from the depth map sub-regions included in the depth map region. The image region may include a number of consecutive regions equal to the image sub-region, and the depth map region may include a number of consecutive regions equal to the depth map sub-region.
In a specific implementation, for each image, the pixel data may be stored in the image sub-regions according to the order of arrangement of the pixel points. For the depth data of each image, the depth data may be stored in the depth map sub-area according to the arrangement order of the pixel points.
Referring to fig. 33 to 35 in combination, image 1 is illustrated with 9 pixels in fig. 33, image 2 is illustrated with 9 pixels in fig. 34, and image 1 and image 2 are two images of different angles that are synchronized. From image 1 and image 2, depth data corresponding to image 1 may be obtained, including image 1 depth value 1 through image 1 depth value 9, and depth data corresponding to image 2 may be obtained, including image 2 depth value 1 through image 2 depth value 9.
Referring to fig. 35, when storing image 1 into an image sub-region, image 1 may be stored into the upper-left image sub-region in the order of arrangement of its pixel points, that is, within the image sub-region the arrangement of pixel points may be the same as in image 1. Image 2 may likewise be stored, in this way, into the upper-right image sub-region.
Similarly, the depth data of image 1 may be stored to the depth map sub-area in a similar manner, and in the case where the depth values correspond one-to-one to the pixel values of the image, may be stored in a manner as shown in fig. 35. If the depth value is obtained by downsampling the original depth map, the depth value can be stored in the depth map subarea according to the pixel point arrangement sequence of the depth map obtained by downsampling.
As will be appreciated by those skilled in the art, the compression rate achieved when compressing an image is related to the correlation between individual pixels in the image: the stronger the correlation, the higher the compression rate. Because a captured image corresponds to the real world, the correlation between its pixel points is strong; storing the pixel data and the depth data of the image in the order of arrangement of the pixel points therefore yields a higher compression rate when the stitched image is compressed, that is, a smaller compressed data size for the same amount of data before compression.
By dividing the spliced image into an image area and a depth map area, when a plurality of image subareas are adjacent in the image area or a plurality of depth map subareas are adjacent in the depth map area, the data stored in each image subarea are obtained by shooting images or frame images in videos of the area to be watched at different angles, and the depth maps are stored in the depth map area, so that higher compression rate can be obtained when the spliced image is compressed.
In a specific implementation, all or part of the image sub-region and the depth map sub-region may be edge protected. The form of edge protection may be varied, for example, taking the view angle 1 depth map in fig. 31 as an example, redundant pixels may be disposed at the periphery of the original view angle 1 depth map; or the number of pixels of the original view angle 1 depth map is kept unchanged, redundant pixels which do not store actual pixel data are reserved on the periphery, and the original view angle 1 depth map is reduced and then stored in other pixels; or may be otherwise such that eventually redundant pixels are left between the view 1 depth map and other images surrounding it.
Because the spliced image comprises a plurality of images and depth maps, the adjacent boundaries of the images have poor relevance, and the quality loss of the images and the depth maps in the spliced image can be reduced when the spliced image is compressed by performing edge protection.
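One possible form of edge protection is sketched below, assuming OpenCV: a sub-region (an image or a depth map) is surrounded by redundant border pixels before being placed into the stitched image, so that compression does not mix it with unrelated neighbouring sub-regions.

```python
import cv2

def pad_sub_region(sub_region, margin=8):
    """Surround a sub-region with redundant replicated border pixels
    (one possible form of edge protection; margin width is illustrative)."""
    return cv2.copyMakeBorder(sub_region, margin, margin, margin, margin,
                              borderType=cv2.BORDER_REPLICATE)
```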
In implementations, pixel fields of the image sub-regions may store three channel data, and pixel fields of the depth map sub-regions may store single channel data. The pixel field of the image sub-region is used to store pixel data, typically three-channel data, such as RGB data or YUV data, for any one of a plurality of synchronized images.
The depth map sub-region is used for storing the depth data of an image; if the depth value is 8-bit binary data, it may be stored in a single channel of a pixel field, and if the depth value is 16-bit binary data, it may be stored in two channels of the pixel field. Alternatively, the depth value may be stored in a larger pixel area. For example, if the synchronized images are 1920×1080 images and the depth value is 16-bit binary data, the depth value may be stored in two 1920×1080 image areas, each stored as a single channel. The stitched image may also be partitioned in accordance with the specific storage mode.
The uncompressed data amount of the stitched image, when each channel of each pixel occupies 8 bits, can be calculated according to the following formula: the number of synchronized images × (data amount of the pixel data of one image + data amount of one depth map).
If the original image is of 1080P resolution, i.e. 1920×1080 pixels, the original depth map may occupy 1920×1080 pixels as a single channel. The amount of pixel data of the original image is 1920×1080×8×3 bits, and the data amount of the original depth map is 1920×1080×8 bits. If the number of cameras is 30, the data amount of the stitched image is 30×(1920×1080×8×3+1920×1080×8) bits, which is about 237MB; if the image is not compressed, it occupies more system resources and causes a larger delay. Especially in the case of a smaller bandwidth, for example 1Mbps, about 237s is needed to transmit an uncompressed stitched image, which gives poor real-time performance and needs to be improved.
The data amount of the stitched image can be reduced in one or more ways, for example by storing the data regularly to obtain a higher compression rate, by reducing the resolution of the original image and using the reduced-resolution pixel data as the pixel data of the image, or by downsampling one or more of the original depth maps.
For example, if the resolution of the original image is 4K, i.e. 4096×2160 pixels, and it is downsampled to 540P, i.e. 960×540 pixels, the number of pixels of the stitched image is about one sixteenth of that before downsampling. Any one or more of the other data-reduction approaches described above may be combined so that the data amount becomes smaller still.
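The arithmetic of the preceding paragraphs can be reproduced with a small helper, following the formula given above (three 8-bit channels of pixel data plus one 8-bit depth channel per camera):

```python
def stitched_size_bits(num_cameras, width, height, channels=3, depth_bits=8):
    """Uncompressed size of the stitched image: number of synchronized
    images x (pixel data of one image + data of one depth map), in bits."""
    pixel_bits = width * height * 8 * channels
    depth_map_bits = width * height * depth_bits
    return num_cameras * (pixel_bits + depth_map_bits)

# 30 cameras at 1080P: roughly 237 MB uncompressed.
print(stitched_size_bits(30, 1920, 1080) / 8 / 2**20)
# 30 cameras downsampled to 540P: a much smaller stitched image.
print(stitched_size_bits(30, 960, 540) / 8 / 2**20)
```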
It can be understood that if the bandwidth is supported, and the decoding capability of the device performing the data processing can support the stitched image with higher resolution, the stitched image with higher resolution can also be generated, so as to improve the image quality.
It will be appreciated by those skilled in the art that in different application scenarios, the pixel data and depth data of the synchronized multiple images may also be stored in other ways, for example, in pixel units to the stitched image. Referring to fig. 33, 34 and 36, for the images 1 and 2 shown in fig. 33 and 34, it is possible to store them in the spliced image in the manner of fig. 36.
In summary, the pixel data and the depth data of the image may be stored in the stitched image, and the stitched image may be divided into the image area and the depth map area in various manners, or may not be divided, and the pixel data and the depth data of the image may be stored in a predetermined order.
In an implementation, the synchronized plurality of images may also be synchronized plurality of frame images resulting from decoding the plurality of videos. The video may be acquired by a plurality of cameras, which may be set identically or similarly to the cameras that acquired the images in the foregoing.
In a specific implementation, the generating of the multi-angle freeview image data may further include generating an association field, which may indicate an association of the first field with the at least one second field. The first field stores pixel data of one image of the synchronized multiple images, and the second field stores depth data corresponding to the image, which correspond to the same shooting angle, that is, the same viewing angle. The association between the two can be described by an association field.
Taking fig. 27 as an example, in fig. 27, the area for storing the view 1 image to the view 8 image is 8 first fields, the area for storing the view 1 depth map to the view 8 depth map is 8 second fields, and for the first field for storing the view 1 image and the second field for storing the view 1 depth map, there is an association relationship, and similarly, there is an association relationship between the fields for storing the view 2 image and the view 2 depth map.
The association field may indicate the association between the first field and the second field of each of the synchronized multiple images in multiple manners, and may specifically be a content storage rule of pixel data and depth data of the synchronized multiple images, that is, indicate the association between the first field and the second field by indicating the storage manner described in the foregoing.
In a specific implementation, the association field may only include different mode numbers, and the device performing data processing may learn, according to the mode number of the field and the data stored in the device performing data processing, a storage manner of the pixel data and the depth data in the acquired multi-angle freeview image data. For example, if the received pattern number is 1, the storage method is analyzed as follows: the spliced image is equally divided into an upper region and a lower region, the upper region is an image region, the lower region is a depth map region, and an image of a certain position of the upper region is associated with a depth map stored in a position corresponding to the lower region.
It can be understood that, in the foregoing embodiments, the manner of storing the stitched image, for example, the storage manner illustrated in fig. 27 to 36, may have a corresponding description of the association field, so that the device performing data processing may obtain the associated image and depth data according to the association field.
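As a purely illustrative example, such an association field could be carried in a small header structure; the field names and values below are hypothetical and only show how a mode number can stand for a content storage rule.

```python
import json

# Hypothetical header: "association_mode" tells the data-processing device how
# first fields and second fields are laid out (here, mode 1 is taken to mean
# "upper half of the stitched image holds images, lower half holds depth maps").
header = {
    "storage_format": "jpeg",
    "num_views": 8,
    "association_mode": 1,
    "depth_downsample": 1,   # 1 = depth data at the same resolution as pixel data
}
header_bytes = json.dumps(header).encode("utf-8")
```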
As described above, the picture format of the stitched image may be any of image formats such as BMP, PNG, JPEG, webp, or may be other image formats. The storage manner of the pixel data and the depth data in the multi-angle freeview image data is not limited to the manner of stitching the images. The information can be stored in various modes, and can also be provided with corresponding association field description.
Similarly, the storage mode may be indicated by a pattern number. For example, for the storage mode shown in fig. 23, the association relationship field may store pattern number 2; after reading the pattern number, the device performing data processing can determine that the pixel data of the synchronized multiple images are stored sequentially, parse the lengths of the first fields and the second fields, and determine that, after the multiple first fields, the depth data of each image are stored in the same order as the images. The device performing data processing can thus determine the association relationship between the pixel data and the depth data of each image according to the association relationship field.
It is understood that the storage modes of the pixel data and the depth data of the synchronous multiple images may be various, and the expression modes of the association relationship fields may be various. The content may be indicated by the mode number described above, or may be indicated directly. The device performing data processing may determine the association between the pixel data of the image and the depth data according to the content of the association field, in combination with stored data or other a priori knowledge, for example, the content corresponding to each mode number, or a specific number of synchronized multiple images.
In an implementation, the generating of the multi-angle freeview image data may further include: parameter data of each image is calculated and stored based on the synchronized plurality of images, the parameter data including a photographing position and photographing angle data of the image.
In combination with the shooting position and shooting angle of each of the synchronous multiple images, the device for processing the data can determine the virtual viewpoint in the same coordinate system with the device according to the needs of the user, reconstruct the image based on the multi-angle free view angle image data, and show the expected viewing position and view angle for the user.
In a specific implementation, the parameter data may further include internal parameter data including attribute data of a photographing apparatus of the image. The foregoing image capturing position and capturing angle data may also be referred to as external parameter data, and the internal parameter data and the external parameter data may be referred to as pose data. By combining the internal parameter data and the external parameter data, the factors indicated by the internal parameter data such as lens distortion and the like can be considered in image reconstruction, and further the image of the virtual viewpoint can be reconstructed more accurately.
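A hypothetical container for such parameter data is sketched below; the exact fields (rotation matrix, focal length, principal point, distortion coefficients) are an assumption for illustration, not a prescribed layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CameraParameters:
    """Illustrative parameter data for one image: external parameters
    (shooting position and angle) plus internal parameters (attributes of
    the shooting device, such as lens distortion)."""
    position: List[float]             # external: camera centre, e.g. [x, y, z]
    rotation: List[List[float]]       # external: 3x3 rotation for the shooting angle
    focal_length: List[float]         # internal: [fx, fy]
    principal_point: List[float]      # internal: [cx, cy]
    distortion: List[float] = field(default_factory=list)  # internal: distortion coefficients
```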
In an implementation, the generating of the multi-angle freeview image data may further include: a parameter data storage address field is generated, the parameter data storage address field being used to indicate a storage address of the parameter data. The device performing the data processing may acquire the parameter data from the storage address of the parameter data.
In an implementation, the generating of the multi-angle freeview image data may further include: a data combination memory address field is generated for indicating a memory address of the data combination, i.e. a memory address of a first field and a second field of each of the synchronized plurality of images. The device performing the data processing may acquire the pixel data and the depth data of the synchronized plurality of images from the storage space corresponding to the storage address of the data combination, from which point of view the data combination includes the pixel data and the depth data of the synchronized plurality of images.
It will be appreciated that the multi-angle freeview image data may include specific data such as pixel data, depth data, and parameter data of the image, and other indicative data, such as the aforementioned generated association field, parameter data storage address field, data combination storage address field, and the like. These indicative data may be stored in a header file to instruct the device performing the data processing to acquire the data combination, as well as parameter data, etc.
In a specific implementation, the term explanation, specific implementation and beneficial effects involved in the various embodiments of generating the multi-angle freeview data may refer to other embodiments, and various specific implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview data may be multi-angle freeview video data, and is further described below with particular reference to a method of generating multi-angle freeview video data.
Referring to fig. 37 in combination, the multi-angle freeview video data generating method may include the steps of:
step S371, acquiring a plurality of videos with synchronous frames, wherein the shooting angles of the plurality of videos are different;
step S372, analyzing each video to obtain an image combination of a plurality of frame moments, wherein the image combination comprises a plurality of frame images with synchronous frames;
step S373, determining depth data of each frame image in the image combination based on the image combination of each frame time in the plurality of frame times;
step S374, generating a spliced image corresponding to each frame time, wherein the spliced image comprises a first field for storing pixel data of each frame image in the image combination and a second field for storing depth data of each frame image in the image combination;
step S375, generating video data based on a plurality of the stitched images.
In this embodiment, the capturing device may be a camera, and a plurality of videos in frame synchronization may be acquired by a plurality of cameras. Each video includes a plurality of frame images at a plurality of frame moments, and a plurality of image combinations may correspond to different frame moments, respectively, each image combination including a plurality of frame images in frame synchronization.
In a specific implementation, depth data of each frame image in the image combination is determined based on the image combination of each frame time in the plurality of frame times.
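A minimal sketch of steps S371 to S375 follows, assuming OpenCV for decoding and encoding; estimate_depth and build_stitched_image are hypothetical helpers standing in for the depth computation and the stitched-image assembly described above.

```python
import cv2

def generate_freeview_video(video_paths, estimate_depth, build_stitched_image,
                            out_path="stitched.mp4", fps=25):
    """Decode frame-synchronized videos, compute depth per frame image, build
    one stitched image per frame moment, and encode the stitched images as video."""
    captures = [cv2.VideoCapture(p) for p in video_paths]     # step S371
    writer = None
    while True:
        frames = []
        for cap in captures:
            ok, frame = cap.read()                            # step S372
            if not ok:
                frames = None
                break
            frames.append(frame)
        if frames is None:
            break
        depths = estimate_depth(frames)                       # step S373
        stitched = build_stitched_image(frames, depths)       # step S374
        if writer is None:
            h, w = stitched.shape[:2]
            writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                                     fps, (w, h))
        writer.write(stitched)                                # step S375
    for cap in captures:
        cap.release()
    if writer is not None:
        writer.release()
```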
As in the foregoing embodiment, if the frame image in the original video is of 1080P resolution, i.e. 1920×1080 pixels, the original depth map may occupy 1920×1080 pixels as a single channel. The amount of pixel data of the original frame image is 1920×1080×8×3 bits, and the data amount of the original depth map is 1920×1080×8 bits. If the number of cameras is 30, the data amount of the stitched image is 30×(1920×1080×8×3+1920×1080×8) bits, which is about 237MB; if the image is not compressed, it occupies more system resources and causes a larger delay. In particular, when the bandwidth is small, for example 1Mbps, an uncompressed stitched image needs about 237s to be transmitted, and if original stitched images were transmitted at the frame rate it would be difficult to realize real-time video playing.
By regular storage, higher compression rate can be obtained when compression of the video format is performed, or resolution of the original image is reduced, pixel data after resolution reduction is used as pixel data of the image, or one or more of the original depth maps are downsampled, or one or more of modes such as video compression code rate is improved, so that the data size of the spliced image can be reduced.
For example, if the resolution of the frame image in the original video, i.e., the acquired multiple videos, is 4K, i.e., 4096×2160 pixels, and downsampled to 540P, i.e., 960×540 pixels, the number of pixels of the stitched image is about one sixteenth before downsampling. Any one or more of the other ways of reducing the amount of data described above may be combined such that the amount of data is less.
It can be understood that if the bandwidth is supported, and the decoding capability of the device performing the data processing can support the stitched image with higher resolution, the stitched image with higher resolution can also be generated, so as to improve the image quality.
In a specific implementation, the video data is generated based on a plurality of the stitched images, may be generated based on all or part of the stitched images, and may be specifically determined according to a frame rate of a video to be generated and a frame rate of an acquired video, or may be determined according to a bandwidth of communication with a data processing device.
In a specific implementation, the video data is generated based on a plurality of the stitched images, and the video data may be generated by encoding and packaging the plurality of stitched images in the sequence of frame moments.
Specifically, the encapsulation format may be any one of AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, or other encapsulation formats, and the encoding format may be H.261, H.263, H.264, H.265, MPEG, AVS, or other encoding formats.
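As one illustrative combination of these choices (H.264 encoding in an MP4 container), the stitched images could be encoded with an external tool such as ffmpeg; the file names and frame rate below are assumptions.

```python
import subprocess

# Encode stitched images stitched_00001.png, stitched_00002.png, ... at 25 fps
# with H.264 and encapsulate them in an MP4 container.
subprocess.run([
    "ffmpeg", "-framerate", "25",
    "-i", "stitched_%05d.png",
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "stitched_h264.mp4",
], check=True)
```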
In a specific implementation, the generating of the multi-angle freeview image data may further include generating an association field, which may indicate an association of the first field with the at least one second field. The first field stores pixel data of one image of the synchronized multiple images, and the second field stores depth data corresponding to the image, which correspond to the same shooting angle, that is, the same viewing angle.
In an implementation, the generating of the multi-angle freeview video data may further include: parameter data of each frame image is calculated and stored based on the synchronized plurality of frame images, the parameter data including a photographing position and photographing angle data of the frame image.
In a specific implementation, the plurality of frame images that are frame-synchronized in the image combination at different moments in the plurality of synchronized videos may correspond to the same parameter data, and the parameter data may be calculated in any one group of image combinations.
In an implementation, the generating of the multi-angle free view video data may further include: generating a parameter data storage address field, the parameter data storage address field being used to indicate a storage address of the parameter data. The device performing the data processing may acquire the parameter data from the storage address of the parameter data.
In an implementation, the generating of the multi-angle free view video data may further include: generating a video data storage address field, the video data storage address field being used to indicate a storage address of the generated video data.
It will be appreciated that the multi-angle freeview video data may include generated video data, as well as other indicative data, such as the aforementioned generated association field, parameter data storage address field, video data storage address field, etc. These indicative data may be stored in a header file to instruct the device performing the data processing to acquire video data, parameter data, and the like.
The term interpretation, implementation, and benefits involved in various embodiments of generating multi-angle freeview video data may be found in other embodiments, and various implementations in multi-angle freeview interaction methods may be implemented in combination with other embodiments.
The multi-angle freeview data processing is described further below with particular reference thereto.
Fig. 38 is a flowchart of a multi-angle free view data processing method in an embodiment of the present invention, which specifically includes the following steps:
step S381, obtaining a data header file;
step S382, determining the definition format of the data file according to the analysis result of the data header file;
step S383, based on the definition format, reading a data combination from a data file, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, the synchronous images are different in viewing angle of a region to be watched, and the pixel data and the depth data of each image in the synchronous images have an association relation;
step S384, performing image or video reconstruction of a virtual viewpoint according to the read data combination, wherein the virtual viewpoint is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting switching view of the virtual viewpoint for the region to be watched.
The multi-angle free view angle data in the embodiment of the invention is data capable of supporting image or video reconstruction of virtual view points in a multi-angle free view angle range. A header file and a data file may be included. The header file may indicate a defined format of the data file so that the device performing data processing on the multi-angle freeview data can parse the required data from the data file according to the header file, as will be further described below.
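A minimal sketch of steps S381 to S383 is given below; fetch is a hypothetical callable that returns the bytes stored at a given address, and the header field names follow the hypothetical header shown earlier, not a prescribed format.

```python
import json

def load_freeview_data(header_bytes, fetch):
    """Parse the data header file, determine the defined format of the data
    file, and read the data combination and parameter data accordingly."""
    header = json.loads(header_bytes.decode("utf-8"))          # steps S381/S382
    stitched_bytes = fetch(header["data_combination_address"])  # step S383
    params_bytes = fetch(header["parameter_data_address"])
    mode = header["association_mode"]   # content storage rule of the data combination
    # The stitched image would then be split into pixel data and depth data
    # for each view according to the storage mode indicated by `mode`.
    return stitched_bytes, params_bytes, mode
```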
Referring to fig. 3 in combination, the device performing data processing may be a device located in the CDN, or the device 33 performing display, or may be a device performing data processing. Both the data file and the header file may be stored in the cloud server 32, or in some application scenarios, the header file may be stored in a device that performs data processing, and the header file is obtained locally.
In a specific implementation, the stitched image in the foregoing embodiments may be used as a data file in the embodiments of the present invention. In an application scenario where bandwidth is limited, the stitched image may be split into multiple portions for multiple transmissions. Correspondingly, the data header file can comprise a segmentation mode, and the device for processing the data can combine the segmented multiple parts according to the indication in the data header file to obtain a spliced image.
In implementations, the defined format may include a storage format, and the header file may include a field indicating the storage format of the data combination, which may be numbered to indicate the storage format, or written directly to the storage format. Accordingly, the parsing result may be a number of a storage format, or a storage format.
Accordingly, the device performing data processing may determine the storage format according to the parsing result. For example, a specific storage format may be determined based on the number, as well as the stored supporting data; or the storage format may be obtained directly from a field indicating the storage format of the data combination. In other embodiments, if the storage format is fixed in advance, the fixed storage format may be recorded in the apparatus for performing data processing.
In implementations, the storage format may be a picture format or a video format. As described above, the picture format may be any one of image formats such as BMP, PNG, JPEG, WebP, or may be other image formats; the video format may include an encapsulation format, which may be any of AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, etc., or other encapsulation formats, and an encoding format, which may be H.261, H.263, H.264, H.265, MPEG, AVS, etc., or other encoding formats.
The storage format may be a picture format or other formats other than a video format, and is not limited herein. Various storage formats for performing subsequent image or video reconstruction of a virtual viewpoint are within the scope of the present invention, either by indication via a header file or by stored supporting data, enabling the device performing the data processing to obtain the required data.
In a specific implementation, when the storage format of the data combination is a video format, the number of the data combinations may be multiple, and each data combination may be a data combination corresponding to different frame moments after the video is unpacked and decoded.
In a specific implementation, the defined format may include a content storage rule of the data combination, and the header file may include a field indicating the content storage rule of the data combination. By the content storage rule, the apparatus performing data processing can determine the association relationship between the pixel data and the depth data in each image. The field indicating the content storage rule of the data combination may also be referred to as an association field, and the field may indicate the content storage rule of the data combination with a number, or may also be written directly.
Accordingly, the device performing data processing may determine the content storage rule of the data combination according to the parsing result. For example, specific content storage rules may be determined based on the number, as well as stored supporting data; or the content storage rules for the data combination may be obtained directly from the fields of the content storage rules that indicate the data combination.
In other embodiments, if the content storage rule is fixed in advance, the content storage rule of the fixed data combination may also be recorded in the device performing the data processing. The following further describes the specific implementation of the data combination in terms of the content storage rules for the data combination and the device performing the data processing in combination with the indication of the header file.
In a specific implementation, the storage rule of the pixel data and the depth data of the synchronized plurality of images may be specifically a storage rule of the pixel data and the depth data of the synchronized plurality of images in the stitched image.
As previously described, the storage format of the data combination may be a picture format or a video format, and accordingly, the data combination may be a picture format or a frame image in video. The image or frame image stores pixel data and depth data of each of a plurality of synchronized images, and from this point of view, an image or frame image obtained by decoding according to a picture format or a video format may also be referred to as a stitched image. The storage rules of the pixel data and the depth data of the synchronized plurality of images may be storage locations in the stitched image, which may be diverse. Various storage manners of the pixel data and the depth data of the synchronized images in the stitched image may be referred to in the foregoing description, and will not be described herein.
In a specific implementation, the content storage rule of the data combination may be used to indicate to the device performing data processing that the pixel data and the depth data of the synchronized multiple images are stored in multiple manners in the stitched image, or may be a storage manner that indicates, for each image, the first field and the second field in other storage manners, that is, a storage rule that indicates the pixel data and the depth data of the synchronized multiple images.
As described above, the header file may include a field indicating the content storage rule of the data combination, and the field may use a number to indicate the content storage rule of the data combination, or the rule may be directly written in the header file, or the fixed content storage rule of the data combination may be recorded in the device performing the data processing.
The content storage rule may correspond to any one of the storage modes, and the device for performing data processing may analyze the storage mode according to the content storage rule, further analyze the data combination, and determine an association relationship between pixel data and depth data of each of the plurality of images.
In a specific implementation, the content storage rule may be indicated by the distribution of the image region and the depth map region by the pixel data of each of the synchronized multiple images and the storage location of the depth data in the stitched image.
The indication may be a pattern number, for example, if the pattern number is 1, the content storage rule may be parsed as: the spliced image is equally divided into an upper region and a lower region, the upper region is an image region, the lower region is a depth map region, and an image of a certain position of the upper region is associated with a depth map stored in a position corresponding to the lower region. The device performing the data processing may further determine a specific storage manner based on the rule. For example, the storage manner shown in fig. 27 or 28, or other storage manners may be further determined by combining the number of the synchronized images, the storage order of the pixel data and the depth data, the proportional relationship between the pixel data and the pixel data occupied by the pixel points, and the like.
In a specific implementation, the content storage rule may also be indicated by the distribution of the image sub-region and the depth map sub-region through the pixel data of each image in the synchronized plurality of images and the storage position of the depth data in the stitched image. The pixel data of each image in the synchronous multiple images are stored in the image subareas, and the depth data of each image in the synchronous multiple images are stored in the depth map subareas.
For example, the content storage rule may be that the image sub-regions and the depth map sub-regions are arranged alternately column by column, and similar to the previous example, the device performing data processing may further determine a specific storage manner based on the rule. For example, the storage mode of fig. 31, or other storage modes, may be further determined by combining the number of the synchronous multiple images, the storage sequence of the pixel data and the depth data, the proportional relation of the pixel points occupied by the depth data and the pixel data, and the like.
As described above, the first field storing the pixel data and the second field storing the depth data may be pixel fields in the stitched image, or may be fields stored in other forms. Those skilled in the art will appreciate that the content storage rules may be an indication that is tailored to a particular storage mode so that the device performing the data processing is informed of the corresponding storage mode.
In particular implementations, the content storage rules may also include more information for supporting the manner in which the data combination is parsed by the device performing the data processing. For example, the foregoing image sub-regions and all or part of the depth map sub-regions may be included for edge protection, as well as edge protection. The content storage rules may also include resolution relationships of pixel data and depth data of the image, and the like.
The device performing the data processing may determine the specific storage mode based on the stored information or information obtained from other fields of the header file. For example, the number of the synchronized images may be obtained through a header file, and specifically may be obtained through a definition format of a data file parsed in the header file.
After determining a specific storage mode, the device performing data processing may parse out the pixel data of the synchronous multiple images and the depth data corresponding to the pixel data.
In a specific implementation, the resolution of the pixel data and the depth data may be the same, and then the pixel data of each pixel point of each image and the corresponding depth value may be further determined.
As mentioned above, the depth data may also be downsampled data, and the defined format in the header file may have a corresponding field for indicating, and the device performing data processing may perform corresponding upsampling to determine the pixel data of each pixel of each image and the corresponding depth value.
Correspondingly, after the pixel data and the corresponding depth value of each pixel point of each image are determined, the image can be reconstructed for the position of the virtual viewpoint to be displayed according to the read data combination, and then rendered and displayed. For video, the reconstructed image in the embodiment of the invention may be a frame image; by displaying the frame images in the order of frame moments, video can be played for the user, completing video reconstruction. That is, video reconstruction may include the reconstruction of the frame images in the video, the specific implementation of which is the same as or similar to the reconstruction of images.
In a specific implementation, referring to fig. 39, performing image reconstruction of a virtual viewpoint may include the steps of:
step S391, determining parameter data of each image in the synchronized plurality of images, wherein the parameter data includes shooting position and shooting angle data of the image;
step S392, determining parameter data of the virtual viewpoint, where the parameter data of the virtual viewpoint includes a virtual viewing position and a virtual viewing angle;
step S393, determining a plurality of target images among the synchronized plurality of images;
step S394, for each target image, mapping the depth data to the virtual viewpoint according to the relation between the parameter data of the virtual viewpoint and the parameter data of the image;
step S395, generating a reconstructed image according to the depth data mapped to the virtual viewpoint and the pixel data of the target image.
Wherein generating the reconstructed image may further include: determining a pixel value for each pixel of the reconstructed image. Specifically, for each pixel point, if the pixel data mapped to the virtual viewpoint is zero, hole filling may be performed using surrounding pixel data from one or more target images. For each pixel point, if a plurality of non-zero pixel data are mapped to the virtual viewpoint, a weight value may be determined for each of them, and the final value of the pixel point determined accordingly.
In an embodiment of the present invention, when generating the reconstructed image, forward mapping may be performed first, and the texture map of the corresponding group in the image combination of the video frame is projected to a three-dimensional euclidean space by using depth information, that is: mapping the depth maps of the corresponding groups onto the virtual viewpoint positions at the user interaction time according to a space geometrical relationship to form a virtual viewpoint position depth map, and then performing reverse mapping to project three-dimensional space points onto an imaging plane of a virtual camera, namely: copying the pixel points in the texture maps of the corresponding groups into virtual texture maps corresponding to the generated virtual viewpoint positions according to the mapped depth maps to form virtual texture maps corresponding to the corresponding groups. And then fusing the virtual texture maps corresponding to the corresponding groups to obtain a reconstructed image of the virtual viewpoint position at the user interaction moment. By adopting the method to reconstruct the image, the sampling precision of the reconstructed image can be improved.
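A highly simplified sketch of this projection is given below. Unlike the two-pass procedure described above, it projects the source pixels directly into the virtual view (no separate virtual-viewpoint depth map, no occlusion handling, hole marking or filtering), and it assumes pinhole cameras without distortion, with K_src, K_virt, R and t supplied by the caller.

```python
import numpy as np

def reproject_to_virtual_view(depth, texture, K_src, K_virt, R, t):
    """Lift source pixels into 3D with their depth values, then project them
    onto the virtual camera's image plane and copy the texture values."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T

    # Forward step: back-project every source pixel into 3D using its depth,
    # then express the 3D points in the virtual camera's coordinate system.
    pts_src = np.linalg.inv(K_src) @ pix * depth.reshape(1, -1)
    pts_virt = R @ pts_src + t.reshape(3, 1)

    # Backward step: project the 3D points onto the virtual image plane and
    # copy the corresponding texture values into the virtual texture map.
    proj = K_virt @ pts_virt
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)
    valid = (proj[2] > 0) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    virtual_texture = np.zeros_like(texture)
    src_colors = texture.reshape(-1, texture.shape[-1])
    virtual_texture[v2[valid], u2[valid]] = src_colors[valid]
    return virtual_texture
```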
The pre-processing may be performed before the forward mapping is performed. Specifically, the homography matrix of the forward mapping depth value and the texture reverse mapping may be calculated according to the corresponding parameter data of the corresponding group in the image combination of the video frame. In implementations, the depth level may be converted to a depth value using a Z-transform.
In the forward mapping process of the depth map, a formula may be used to map the depth maps of the corresponding groups to the depth map of the virtual viewpoint position, after which the depth value of the corresponding position is copied. In addition, there may be noise in the depth maps of the corresponding groups, and some sampling artifacts may be introduced during mapping, so that the generated depth map of the virtual viewpoint position may contain small noise holes. Median filtering may be employed to remove this noise.
In a specific implementation, other post-processing can be performed on the virtual viewpoint position depth map obtained after forward mapping according to requirements, so as to further improve the quality of the generated reconstructed image. In an embodiment of the present invention, before reverse mapping is performed, a foreground and background occlusion relationship is processed on a virtual viewpoint position depth map obtained by forward mapping, so that a generated depth map can more truly reflect a position relationship of an object in a scene seen by the virtual viewpoint position.
For reverse mapping, in particular, the positions of the corresponding set of texture maps in the virtual texture map may be calculated from the virtual viewpoint position depth map obtained by the forward mapping, after which texture values of the corresponding pixel positions are copied, wherein holes in the depth map may be marked as 0 or as no texture values in the virtual texture map. Hole expansion may be performed for the region marked as hole, avoiding synthetic artefacts.
And then fusing the generated virtual texture maps of the corresponding groups to obtain a reconstructed image of the virtual viewpoint position at the user interaction moment. In practice, the fusion may also be carried out in a number of ways, as exemplified by the two examples below.
In an embodiment of the present invention, the weighting process is performed first, and then the hole filling is performed. Specifically: and carrying out weighting processing on pixels at corresponding positions in the virtual texture map corresponding to each corresponding group in the image combination of the video frame at the user interaction time to obtain pixel values at corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction time. And then, for the position with zero pixel value in the reconstructed image of the virtual viewpoint position at the user interaction moment, filling the hole by using pixels around the pixels in the reconstructed image to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
In another embodiment of the present invention, hole filling is performed first, and then weighting is performed. Specifically: and respectively filling holes at positions with zero pixel values in the virtual texture images corresponding to the corresponding groups in the image combination of the video frames at the user interaction time by using surrounding pixel values, and then carrying out weighting processing on the pixel values in the corresponding positions in the virtual texture images corresponding to the corresponding groups after filling the holes to obtain a reconstructed image of the virtual viewpoint position at the user interaction time.
The weighting processing in the above embodiment may specifically be a weighted average manner, or may use different weighting coefficients according to parameter data or the positional relationship between the photographing apparatus and the virtual viewpoint. In an embodiment of the present invention, the weighting is performed according to the position of the virtual viewpoint and the reciprocal of the position distance of each acquisition device, namely: the closer the acquisition device is to the virtual viewpoint position, the greater the weight.
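A minimal sketch of this reciprocal-distance weighting is shown below, assuming the per-view virtual texture maps have already been generated and that zero-valued pixels mark holes; hole filling itself is not shown.

```python
import numpy as np

def fuse_virtual_textures(virtual_textures, camera_positions, virtual_position):
    """Fuse per-view virtual texture maps with weights proportional to the
    reciprocal of the distance between each acquisition device and the
    virtual viewpoint: the closer the device, the greater its weight."""
    eps = 1e-6
    weights = np.array([
        1.0 / (np.linalg.norm(np.asarray(p) - np.asarray(virtual_position)) + eps)
        for p in camera_positions
    ])
    stack = np.stack([t.astype(np.float64) for t in virtual_textures])   # (n, h, w, 3)
    mask = (stack.sum(axis=-1, keepdims=True) > 0).astype(np.float64)    # exclude hole pixels
    w = weights.reshape(-1, 1, 1, 1) * mask
    fused = (stack * w).sum(axis=0) / np.maximum(w.sum(axis=0), eps)
    return fused.astype(np.uint8)
```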
In specific implementation, a preset hole filling algorithm may be adopted to fill holes as required, which is not described herein.
In a specific implementation, the shooting position and shooting angle data of an image may be referred to as external parameter data, and the parameter data may further include internal parameter data, that is, attribute data of the capturing device of the image. Distortion parameters and the like can be represented by the internal parameter data, and combining the internal parameters allows the mapping relationship to be determined more accurately.
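For orientation only, the way the external parameters (R, t) and the internal parameter matrix K combine in a pinhole projection is shown below; lens distortion correction, if present in the internal parameter data, would adjust the pixel coordinates around this mapping and is omitted from this sketch.
```python
import numpy as np

def project_point(world_point, K, R, t):
    """Project a 3D world point using external parameters (R, t) and
    internal parameters K; returns the pixel coordinates and the depth value."""
    cam = R @ np.asarray(world_point, dtype=float) + np.asarray(t, dtype=float)  # world -> camera
    uvw = K @ cam                                                                # camera -> image plane
    return uvw[:2] / uvw[2], cam[2]
```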
In a specific implementation, the parameter data may be obtained from a data file, and in particular, may be obtained from a corresponding storage space according to a storage address of the parameter data in the header file.
In a specific implementation, the target images may be determined by selecting, according to the 6-degree-of-freedom coordinates of the virtual viewpoint and the 6-degree-of-freedom coordinates of each image capturing viewpoint (that is, each image viewpoint), a plurality of images whose capturing viewpoints are closest in position to the virtual viewpoint.
In a specific implementation, all of the synchronized plurality of images may also be used as target images. Selecting more images as target images yields a higher-quality reconstructed image; the selection of target images can be determined according to requirements and is not limited here.
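Selecting target images by viewpoint proximity, as just described, can be as simple as the sketch below; it assumes that the first three components of the 6DoF coordinates carry the 3D position, which is an assumption of this illustration.
```python
import numpy as np

def select_target_images(images, capture_6dof, virtual_6dof, num_targets=4):
    """Pick the capture viewpoints whose positions are closest to the virtual viewpoint.

    images: list of synchronized images; capture_6dof: matching list of 6DoF coordinates.
    virtual_6dof: 6DoF coordinates of the virtual viewpoint.
    num_targets: how many target images to keep (all images may also be used).
    """
    v_pos = np.asarray(virtual_6dof[:3])
    dists = [np.linalg.norm(np.asarray(c[:3]) - v_pos) for c in capture_6dof]
    order = np.argsort(dists)[:num_targets]
    return [images[i] for i in order]
```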
As described above, the depth data may be a set of depth values in one-to-one correspondence with the pixels of an image, and the depth data mapped to the virtual viewpoint is likewise data corresponding to the pixels of the image one by one. To generate the reconstructed image, for each pixel position, data of the corresponding position can be obtained from the pixel data of the target images according to the depth data. When data for one pixel position is obtained from multiple target images, the multiple data can be weighted to improve the quality of the reconstructed image.
It will be appreciated by those skilled in the art that the process of performing image reconstruction of a virtual viewpoint based on multi-angle freeview image data in the embodiments of the present invention may be varied and is not limited thereto.
The term interpretation, specific implementation and advantageous effects involved in the multi-angle freeview data processing method may refer to other embodiments, and various specific implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview data described above may be multi-angle freeview image data, and is described further below with particular reference to multi-angle freeview image data processing.
Fig. 40 is a flowchart of a multi-angle freeview image data processing method according to an embodiment of the present invention, which specifically includes the following steps:
step S401, acquiring a data combination stored in a picture format, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, and the synchronous images are different in viewing angle of a region to be watched;
and step S402, based on the data combination, performing image reconstruction of a virtual view point, wherein the virtual view point is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting switching view of the virtual view point of the region to be watched.
The data combination stored in the picture format may be obtained in the manner of the foregoing embodiments, that is, by parsing the header file and reading the data from the data file accordingly. The image reconstruction of the virtual viewpoint may likewise be performed as described above.
In a specific implementation, obtaining the data combination stored in the picture format and performing the image reconstruction of the virtual viewpoint can be completed by an edge computing node. As previously described, the edge computing node may be a node that communicates with the display device displaying the reconstructed image over a high-bandwidth, low-latency connection, for example via WiFi or 5G. In particular, the edge computing node may be a base station, a mobile device, a vehicle-mounted device, or a home router with sufficient computing power. Referring to fig. 3 in combination, the edge computing node may be a device located at the CDN.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image can be sent to the display device.
Reconstructing the image at the edge computing node reduces the requirements on the display device, so that devices with lower computing capability can also receive user instructions and provide the multi-angle free view experience to the user.
For example, in a 5G scenario, the communication between a User Equipment (UE) and a base station, especially the base station of the current serving cell, is fast. A user may determine the parameter data of the virtual viewpoint through the UE, and the base station of the current serving cell, acting as the edge computing node, may calculate the reconstructed image. The displaying device can then receive the reconstructed image and provide the multi-angle free view service to the user.
It will be appreciated that, in implementations, the device that performs the image reconstruction and the device that performs the display may be the same device. The device may receive a user indication and determine the virtual viewpoint according to it in real time. After the image of the virtual viewpoint is reconstructed, the reconstructed image may be displayed.
In implementations, the implementation of receiving the user indication and generating the virtual viewpoint according to the user indication may be varied, the virtual viewpoint being a viewpoint within the range of free view angles. Therefore, in the embodiment of the invention, the user can be supported to freely switch the virtual view points in the multi-angle free view angle range.
It is to be understood that the term interpretation, specific implementation and beneficial effects involved in the multi-angle freeview picture data processing method may refer to other embodiments, and that various specific implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview data described above may also be multi-angle freeview video data, and is described in more detail below with respect to multi-angle freeview video data processing.
Fig. 41 is a flowchart of a multi-angle freeview video data processing method according to an embodiment of the present invention, which may include the following steps:
step S411, analyzing the acquired video data to obtain data combinations of different frame moments, wherein the data combinations comprise synchronous pixel data and depth data of a plurality of images, and the synchronous images have different viewing angles of a region to be watched;
Step S412, for each frame time, based on the data combination, performing image reconstruction of a virtual viewpoint, where the virtual viewpoint is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting switching view of virtual viewpoints for a region to be watched, and the reconstructed image is used for video playing.
In a specific implementation, the format of the acquired video data may vary. The analysis of the acquired video data may, based on the video format, perform decapsulation and decoding to obtain frame images at different frame moments, and the data combination may be obtained from the frame images; that is, each frame image may store the pixel data and depth data of the multiple synchronized images. From this point of view, the frame image may also be referred to as a stitched image.
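To make the stitched-image layout concrete, the sketch below splits a decoded frame into per-camera texture sub-regions and downsampled depth sub-regions. The particular layout assumed here (image area on top, depth map area below, sub-regions in row-major order, depth stored in one channel) is purely illustrative; the actual layout is described by the header file of the data file.
```python
def split_stitched_frame(frame, num_cameras, cols, tex_size, depth_factor=2):
    """Split a decoded stitched frame into per-camera pixel data and depth data.

    frame: (H, W, 3) decoded frame image (e.g. a numpy array).
    Assumed layout: an image area on top holding num_cameras texture sub-regions of
    size tex_size arranged in `cols` columns, followed by a depth map area holding
    the matching depth sub-regions, downsampled by depth_factor along each axis.
    """
    tex_h, tex_w = tex_size
    dep_h, dep_w = tex_h // depth_factor, tex_w // depth_factor
    rows = (num_cameras + cols - 1) // cols
    depth_area_top = rows * tex_h               # depth map area starts below the image area

    textures, depths = [], []
    for i in range(num_cameras):
        r, c = divmod(i, cols)
        textures.append(frame[r * tex_h:(r + 1) * tex_h,
                              c * tex_w:(c + 1) * tex_w])
        depths.append(frame[depth_area_top + r * dep_h:depth_area_top + (r + 1) * dep_h,
                            c * dep_w:(c + 1) * dep_w, 0])
    return textures, depths
```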
Wherein, the video data can be obtained from the data file according to the data header file, and the specific implementation manner of obtaining the data combination can be seen from the foregoing. The specific implementation of the image reconstruction of the virtual viewpoint can also be seen from the foregoing. After the reconstructed image of each frame time is obtained, video playing can be performed according to the sequence of the frame times.
In a specific implementation, the data combination of different frame moments and the image reconstruction of the virtual viewpoint can be completed by the edge computing node.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image at each frame moment can be sent to the display device.
It will be appreciated that in implementations, the device that performs the image reconstruction and the display may be the same device.
It will be appreciated that the term interpretation, specific implementation and benefits involved in the multi-angle freeview video data processing method may refer to other embodiments, and that various specific implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview interaction method is described in more detail below.
Fig. 42 is a flowchart of a multi-angle free view interaction method in an embodiment of the present invention, which specifically includes the following steps:
step S421, receiving a user instruction;
step S422, determining a virtual viewpoint according to the user instruction, where the virtual viewpoint is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting switching viewing of virtual viewpoints for a region to be viewed;
Step S423, displaying display content for viewing the region to be viewed based on the virtual viewpoint, where the display content is generated based on a data combination and the virtual viewpoint, the data combination includes pixel data and depth data of a plurality of synchronous images, and an association exists between the image data and the depth data of each image, and the synchronous plurality of images have different viewing angles for the region to be viewed.
In the embodiment of the present invention, the virtual viewpoint may be a viewpoint within the multi-angle free view angle range, and the specific multi-angle free view angle range may be associated with the data combination.
In implementations, a user indication may be received, and a virtual viewpoint is determined within a range of freeview angles based on the user indication. The manner in which the user indication and the virtual viewpoint are determined from the user indication may be varied, as further illustrated below.
In an implementation, determining the virtual viewpoint according to the user indication may include: determining a base viewpoint for viewing the region to be viewed, the base viewpoint including a position and a viewing angle. At least one of the position and the viewing angle of the virtual viewpoint may be changed relative to the base viewpoint, and the user indication may be associated with the manner of this change. The virtual viewpoint is then determined, with the base viewpoint as reference, according to the user indication, the base viewpoint and the association relationship.
The base viewpoint may include the position and viewing angle from which the user views the region to be viewed. Further, the base viewpoint may be the position and viewing angle corresponding to the picture displayed by the displaying device when the user instruction is received; for example, referring to fig. 4, if the image displayed by the device when the user instruction is received is as shown in fig. 4, then, referring to fig. 2, the position of the base viewpoint may be VP1 shown in fig. 2. It will be appreciated that the position and viewing angle of the base viewpoint may be preset, the base viewpoint may be a virtual viewpoint previously determined according to a user instruction, and the base viewpoint may be expressed using 6DoF coordinates. The association relationship between the user indication and the manner in which the virtual viewpoint changes relative to the base viewpoint may be a preset association relationship.
In a specific implementation, the manner of receiving the user instruction may be varied, and the following description will be given separately.
In one implementation, a path of the contact point on the touch sensitive screen may be detected, and the path may include a start point, an end point, and a direction of movement of the contact point as the user indication.
Accordingly, the association relationship between the path and the manner in which the virtual viewpoint changes relative to the base viewpoint may take various forms.
For example, there may be 2 paths; when at least one contact point of the 2 paths moves in a direction away from the other, the position of the virtual viewpoint moves in a direction approaching the region to be viewed.
Referring to fig. 43 and fig. 11 in combination, vector F1 and vector F2 in fig. 43 may respectively illustrate the 2 paths; in this case, if the base viewpoint is B2 in fig. 11, the virtual viewpoint may be B3. That is, for the user, the region to be viewed is enlarged.
It is to be understood that fig. 43 is only illustrative; in a specific application scenario, the start point, end point and direction of the 2 paths may vary, as long as at least one contact point of the 2 paths moves in a direction away from the other. One of the 2 paths may be the path of a stationary contact point and may include only a start point.
In an embodiment of the present invention, the display image before enlargement may be as shown in fig. 4, and the image after enlargement may be as shown in fig. 44.
In a specific implementation, the center point of the enlargement may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is enlarged about that center point. The magnification, that is, the magnitude of the virtual viewpoint movement, may be associated with the magnitude by which the contact points of the 2 paths move apart, and this association relationship may be preset.
In a specific implementation, if at least one contact point of the 2 paths moves in a direction approaching the other, the position of the virtual viewpoint may move in a direction away from the region to be viewed.
Referring to fig. 45 and fig. 11 in combination, vector F3 and vector F4 in fig. 45 may respectively illustrate the 2 paths; in this case, if the base viewpoint is B3 in fig. 11, the virtual viewpoint may be B2. That is, for the user, the region to be viewed is reduced.
It is to be understood that fig. 45 is only illustrative; in a specific application scenario, the start point, end point and direction of the 2 paths may vary, as long as at least one contact point of the 2 paths moves in a direction approaching the other. One of the 2 paths may be the path of a stationary contact point and may include only a start point.
In an embodiment of the present invention, the display image before reduction may be as shown in fig. 44, and the image after reduction may be as shown in fig. 4.
In a specific implementation, the center point of the reduction may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is reduced about that center point. The reduction factor, that is, the magnitude of the virtual viewpoint movement, may be associated with the magnitude by which the contact points of the 2 paths approach each other, and this association relationship may be preset.
In a specific implementation, the association relationship between the path and the manner in which the virtual viewpoint changes relative to the base viewpoint may also include: the number of paths is 1, the moving distance of the contact point is associated with the magnitude of the viewing angle change, and the moving direction of the contact point is associated with the direction of the viewing angle change.
For example, referring to fig. 5 and fig. 13 in combination, if the received user indication is 1 path, illustrated by vector D52 in fig. 5, and the base viewpoint is point C2 in fig. 13, the virtual viewpoint may be point C1.
In an embodiment of the present invention, the display before the viewing angle is switched may be as shown in fig. 5, and the display of the display device after the viewing angle is switched may be as shown in fig. 6.
If the received user indication is 1 path, illustrated by vector D81 in fig. 8, and the base viewpoint is point C2 in fig. 13, the virtual viewpoint may be point C3.
In an embodiment of the present invention, the display before the viewing angle is switched may be as shown in fig. 8, and the display of the display device after the viewing angle is switched may be as shown in fig. 9.
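The associations just described (two contact paths moving apart or together changing the distance of the virtual viewpoint, a single path changing the viewing angle) could be wired up roughly as in the sketch below; the 6DoF representation as (x, y, z, yaw, pitch, roll), the axis conventions and the scale factors are illustrative assumptions only.
```python
import math

def update_virtual_viewpoint(base_6dof, paths, zoom_step=0.1, angle_step=0.005):
    """Derive a new virtual viewpoint from a base viewpoint and touch paths.

    base_6dof: [x, y, z, yaw, pitch, roll] of the base viewpoint.
    paths: list of paths; each path is (start_point, end_point) in screen pixels.
    Returns the 6DoF coordinates of the new virtual viewpoint.
    """
    vp = list(base_6dof)
    if len(paths) == 2:
        # pinch gesture: compare the distance between contact points at start and end
        (s0, e0), (s1, e1) = paths
        d_start = math.dist(s0, s1)
        d_end = math.dist(e0, e1)
        # contact points moving apart -> move towards the region to be viewed (enlarge),
        # contact points moving together -> move away from it (reduce)
        vp[2] -= zoom_step * (d_end - d_start)
    elif len(paths) == 1:
        # single path: movement distance sets how much the viewing angle changes,
        # movement direction sets which way it changes
        (sx, sy), (ex, ey) = paths[0]
        vp[3] += angle_step * (ex - sx)     # horizontal swipe -> yaw
        vp[4] += angle_step * (ey - sy)     # vertical swipe -> pitch
    return vp
```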
It will be appreciated by those skilled in the art that the various embodiments described above are merely illustrative and not limiting of the association between user indications and virtual viewpoints.
In particular implementations, the user indication may include voice control instructions, which may be in a natural language format, such as "zoom in", "zoom out", "view from the left", etc. Correspondingly, determining the virtual viewpoint according to the user indication may include performing voice recognition on the user indication and determining the virtual viewpoint, with the base viewpoint as reference, according to a preset association relationship between the instruction and the manner in which the virtual viewpoint changes relative to the base viewpoint.
In a specific implementation, the user indication may also include a selection of a preset viewpoint for viewing the region to be viewed. The preset viewpoints may vary with the region to be viewed, and each may include a position and a viewing angle. For example, if the region to be viewed is a basketball game area, a preset viewpoint may be positioned below the backboard, so that the user can view the basketball game area from the viewing angle of a spectator at the scene or from a training viewing angle. Accordingly, the selected preset viewpoint may be used as the virtual viewpoint.
In implementations, the user indication may also include a selection of a particular object in the region to be viewed. The particular object may be determined by image recognition techniques. For example, in basketball games, each player in the game scene may be identified according to face recognition technology, the user may be provided with options of the relevant player, and the virtual viewpoint may be determined according to the user's selection of a particular player, and the user may be provided with a picture under the virtual viewpoint.
In a specific implementation, the user indication may further include at least one of a position and a viewing angle of the virtual viewpoint, for example, 6DoF coordinates of the virtual viewpoint may be directly input.
In specific implementations, the user indication may be received in various manners, for example as a signal of a contact point on a touch-sensitive screen, a signal detected by an acoustic sensor, or a signal detected by a sensor capable of reflecting the posture of the device, such as a gyroscope or a gravity sensor; the corresponding user indication may be a path of a contact point on the touch-sensitive screen, a voice instruction, a gesture operation, and the like. The content of the user indication may also be various, for example indicating a change manner relative to the base viewpoint, a preset viewpoint, a specific viewing object, or at least one of the position and the viewing angle of the virtual viewpoint. The specific implementation of determining the virtual viewpoint according to the user indication may likewise be varied.
Specifically, in combination with the above manners of receiving the user indication, the various sensing devices may be polled at preset time intervals, the time interval corresponding to a detection frequency; for example, detection may be performed at a frequency of 25 times per second to obtain the user indication.
It will be appreciated that the manner in which the user indication is received, the content of the user indication, and the manner in which the virtual viewpoint is determined based on the user indication may be combined or replaced without limitation.
In a specific implementation, after a trigger instruction is received, the user indication may be received in response to the trigger instruction, so that user misoperation can be avoided. The trigger instruction may be a click on a preset button in the screen area, a voice control signal, any of the foregoing manners in which the user indication itself can be given, or other manners.
In implementations, the user indication may be received during video playing or image presentation. When a user indication is received during image presentation, the data combination may be the data combination corresponding to that image. When a user indication is received during video playing, the data combination may be the data combination corresponding to a frame image of the video. The display content for viewing the region to be viewed based on the virtual viewpoint may be an image reconstructed based on the virtual viewpoint.
During video playing, after the user indication is received and the virtual viewpoint is generated, the display content for viewing the region to be viewed based on the virtual viewpoint may be multiple reconstructed frame images generated based on the virtual viewpoint. That is, while the virtual viewpoint is being switched, the video may continue to play: before the virtual viewpoint is re-determined according to the user indication, the video is played at the original virtual viewpoint, and after the virtual viewpoint is re-determined, reconstructed frame images based on the new virtual viewpoint are generated and played at the position and viewing angle of the switched viewpoint.
Further, instead of continuing to play in the original configuration while the virtual viewpoint is being determined, video playing may alternatively be paused while the virtual viewpoint is switched, and resumed with reconstructed frame images based on the new virtual viewpoint once it has been determined.
Referring to fig. 4 and fig. 6 in combination, during image presentation, a user instruction may be received, a virtual viewpoint may be generated according to the user instruction to switch the viewing, and the display content may change from the image shown in fig. 4 to the image shown in fig. 6.
During video playing, when the video plays to the frame image shown in fig. 4 and the virtual viewpoint is switched, the frame image shown in fig. 6 is displayed. Video playing may then continue with frame images displayed based on that virtual viewpoint until a new user instruction is received; for example, if a new user instruction is received when the frame image shown in fig. 46 is being played, the virtual viewpoint may be switched according to the new user instruction and video playing continues.
It is to be understood that the term explanation, specific implementation and beneficial effects involved in the multi-angle freeview interaction method may refer to other embodiments, and that various specific implementations of the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The embodiment of the invention also provides a multi-angle free view video data processing device, referring to fig. 47, which specifically may include:
the parsing unit 471 is adapted to parse the acquired video data to obtain data combinations with different frame moments, where the data combinations include pixel data and depth data of multiple synchronous images, and the multiple synchronous images have different viewing angles of the region to be watched;
The virtual viewpoint image reconstruction unit 472 is adapted to perform image reconstruction of a virtual viewpoint based on the data combination for each frame time, the virtual viewpoint being selected from a multi-angle free view angle range, the multi-angle free view angle range being a range supporting switching view of a viewpoint for a region to be viewed.
Referring to fig. 48, in a specific implementation of the present invention, the virtual viewpoint image reconstruction unit 472 may include:
an up-sampling subunit 481, adapted to up-sample the depth data to obtain a set of depth values corresponding to pixels of the image one by one;
the virtual view image reconstruction subunit 482 is adapted to perform image reconstruction of the virtual view based on the synchronized pixel data of the plurality of images and the set of depth values.
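Regarding the up-sampling subunit 481 above: where the depth data was downsampled before storage, a nearest-neighbour up-sampling such as the sketch below is enough to recover one depth value per pixel of the image; smoother, texture-guided interpolation would be a natural refinement and is not shown here.
```python
import numpy as np

def upsample_depth(depth, target_shape):
    """Nearest-neighbour up-sampling of a downsampled depth map so that the result
    has one depth value per pixel of the corresponding image."""
    H, W = target_shape
    ys = (np.arange(H) * depth.shape[0] // H).clip(0, depth.shape[0] - 1)
    xs = (np.arange(W) * depth.shape[1] // W).clip(0, depth.shape[1] - 1)
    return depth[np.ix_(ys, xs)]
```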
Referring to fig. 49, in another specific implementation of the present invention, the virtual viewpoint image reconstruction unit 472 may include:
an image parameter data determination subunit 491 adapted to determine parameter data of each of the synchronized plurality of images, the parameter data including a shooting position and shooting angle data of the image;
a virtual viewpoint parameter data determination subunit 492 adapted to determine parameter data of the virtual viewpoint, the parameter data of the virtual viewpoint including a virtual viewing position and a virtual viewing angle;
A target image determination subunit 493 adapted to determine a plurality of target images among the synchronized plurality of images;
a mapping subunit 494 adapted to map, for each target image, the depth data to the virtual viewpoint according to a relationship between the parameter data of the virtual viewpoint and the parameter data of the image;
an image generation subunit 495 adapted to generate a reconstructed image from the depth data mapped to the virtual viewpoint and the pixel data of the target image.
In a specific implementation, the target image determining subunit 493 is further adapted to select a target image from the plurality of images according to a relationship between the parameter data of the virtual viewpoint and the parameter data of the image.
With continued reference to fig. 47, in an implementation, the multi-angle freeview video data processing apparatus may further include: the virtual viewpoint parameter data receiving unit 473 is adapted to receive the parameter data of the virtual viewpoint before performing image reconstruction of the virtual viewpoint.
Further, the multi-angle freeview video data processing apparatus may further include: a transmitting unit 474, adapted to send the reconstructed image to the image display end after the image reconstruction of the virtual viewpoint is performed.
The explanation, principle, specific implementation and beneficial effects of the term related to the multi-angle free view video data processing device in the embodiment of the present invention can be referred to the multi-angle free view video data processing method in the embodiment of the present invention, and are not described herein.
The embodiment of the present invention further provides another multi-angle free view video data processing device, referring to fig. 50, may specifically include:
a reconstruction unit 501 adapted to perform image reconstruction of a virtual viewpoint using the multi-angle freeview video data processing apparatus described above;
the playing unit 502 is adapted to perform video playing based on the reconstructed images at the different frame moments.
Further, the multi-angle freeview video data processing apparatus may further include: and a receiving unit 503, adapted to receive an instruction of a user before performing image reconstruction of the virtual viewpoint, and determine the virtual viewpoint according to the instruction of the user.
The term explanations, principles, specific implementations and beneficial effects of the multi-angle free view video data processing device in the embodiment of the present invention can be found in the foregoing multi-angle free view video data processing method of the embodiment of the present invention, and are not repeated here.
The embodiment of the present invention further provides another multi-angle free view video data processing apparatus; referring to fig. 51, it may specifically include:
A receiving unit 511 adapted to receive an image reconstructed from a virtual viewpoint, the image reconstruction from the virtual viewpoint being performed using the multi-angle free view video data processing apparatus described above;
the playing unit 512 is adapted to perform video playing based on the reconstructed images at the different frame moments.
Further, the multi-angle freeview video data processing apparatus may further include: a transmitting unit 513 adapted to transmit the parameter data of the virtual viewpoint to an edge computing node.
The term explanations, principles, specific implementations and beneficial effects of the multi-angle free view video data processing device in the embodiment of the present invention can be found in the foregoing multi-angle free view video data processing method of the embodiment of the present invention, and are not repeated here.
Embodiments of the present invention also provide a computer readable storage medium having stored thereon computer instructions that, when executed, perform the steps of the multi-angle freeview video data processing method.
The computer readable storage medium may be an optical disc, a mechanical hard disc, a solid state disc, or the like.
The embodiment of the invention also provides an edge computing node, which comprises a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and the processor executes the steps of the multi-angle freeview video data processing method when running the computer instructions.
As previously described, the edge computing node may be a node that communicates with the display device displaying the reconstructed image over a high-bandwidth, low-latency connection, for example via WiFi or 5G. In particular, the edge computing node may be a base station, a mobile device, a vehicle-mounted device, or a home router with sufficient computing power.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and the processor executes the steps of the multi-angle free view video data processing method when running the computer instructions. The terminal may be a smart phone, a tablet computer, or other suitable terminals.
The embodiment of the invention also provides mobile equipment, which comprises a communication component, a processor and a display component: the communication component is used for receiving multi-angle free view video data, and the multi-angle free view data comprises the data combination; the processor is used for rendering based on the multi-angle free view video data and generating videos corresponding to different virtual viewpoints; and the display component is used for displaying the videos corresponding to the different virtual viewpoints. The mobile device may be a smart phone, tablet computer, or the like, as appropriate.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention should therefore be determined by the appended claims.

Claims (25)

1. A multi-angle freeview video data processing method, comprising:
analyzing the acquired video data to obtain data combinations at different frame moments, wherein the data combinations comprise pixel data and depth data of a plurality of synchronous images, the depth data are data obtained by downsampling an original depth image, the synchronous images are different in view angles of areas to be watched, the data combinations are used for indicating the association relation between the pixel data and the depth data of the images, the video data are acquired from spliced images in a data file, the spliced image is divided into an image area and a depth image area, the image area comprises a plurality of image subareas, and each image subarea is used for storing one of the images; the depth map region comprises a plurality of depth map sub-regions, each depth map sub-region for storing depth data for one of the plurality of images;
And for each frame moment, based on the data combination, reconstructing an image of a virtual viewpoint, wherein the virtual viewpoint is selected from a multi-angle free view angle range, the multi-angle free view angle range is a range supporting viewpoint switching viewing of a region to be viewed, and the reconstructed image is used for video playing.
2. The multi-angle freeview video data processing method according to claim 1, wherein, for each of the synchronized plurality of images, the depth data is a set of depth values in one-to-one correspondence with the pixels of the image.
3. The multi-angle freeview video data processing method according to claim 1, wherein, for each of the synchronized plurality of images, the depth data is stored as an image in which the set of depth values corresponding one-to-one to the pixels of the image is arranged according to the pixel points of the image.
4. A multi-angle freeview video data processing method according to claim 3, wherein performing image reconstruction of a virtual viewpoint based on the data combination comprises:
upsampling the depth data to obtain a depth value set corresponding to pixels of the image one by one;
And carrying out image reconstruction of the virtual view point according to the pixel data of the synchronous multiple images and the depth value set.
5. The multi-angle freeview video data processing method according to claim 1, wherein said performing image reconstruction of virtual views based on said data combination comprises:
determining parameter data of each image in the synchronous multiple images, wherein the parameter data comprises shooting positions and shooting angle data of the images;
determining parameter data of the virtual viewpoint, wherein the parameter data of the virtual viewpoint comprises a virtual viewing position and a virtual viewing angle;
determining a plurality of target images among the synchronized plurality of images;
for each target image, mapping the depth data to the virtual viewpoint according to the relation between the parameter data of the virtual viewpoint and the parameter data of the image;
and generating a reconstructed image according to the depth data mapped to the virtual viewpoint and the pixel data of the target image.
6. The multi-angle freeview video data processing method according to claim 5, wherein determining a plurality of target images among the synchronized plurality of images comprises: and selecting a target image from the plurality of images according to the relation between the parameter data of the virtual viewpoint and the parameter data of the images.
7. The multi-angle freeview video data processing method according to claim 6, wherein all of the plurality of synchronized images are taken as the target image.
8. The multi-angle freeview video data processing method according to claim 6, wherein the parameters of the image further include internal parameter data including attribute data of a photographing apparatus of the image.
9. The multi-angle freeview video data processing method according to claim 1, further comprising, before performing image reconstruction of the virtual viewpoint: and receiving the parameter data of the virtual viewpoint.
10. The multi-angle freeview video data processing method according to claim 9, further comprising, after performing image reconstruction of the virtual viewpoint: and sending the reconstructed image to an image display end.
11. A multi-angle freeview video data processing method, comprising:
image reconstruction of a virtual viewpoint using the multi-angle freeview video data processing method according to any one of claims 1 to 10;
and playing the video based on the reconstructed images at different frame moments.
12. The multi-angle freeview video data processing method according to claim 11, further comprising, before performing image reconstruction of the virtual viewpoint: and receiving an instruction of a user, and determining the virtual viewpoint according to the instruction of the user.
13. A multi-angle freeview video data processing method, comprising:
receiving an image reconstructed from a virtual viewpoint, wherein the image reconstruction from the virtual viewpoint adopts the multi-angle free view video data processing method according to any one of claims 1 to 10;
and playing the video based on the reconstructed images at different frame moments.
14. The multi-angle freeview video data processing method according to claim 13, wherein said reconstructed image is received from an edge computing node.
15. The multi-angle freeview video data processing method according to claim 13, further comprising: and sending the parameter data of the virtual viewpoint to an edge computing node.
16. A multi-angle freeview video data processing apparatus, comprising:
the analysis unit is suitable for analyzing the acquired video data to obtain data combinations of different frame moments, the data combinations comprise synchronous pixel data and depth data of a plurality of images, the synchronous images are different in viewing angle of an area to be watched, the depth data are data obtained after downsampling of an original depth image, the data combinations are used for indicating the association relation between the pixel data and the depth data of the images, the video data are acquired from spliced images in a data file, the spliced image is divided into an image area and a depth image area, the image area comprises a plurality of image subareas, and each image subarea is used for storing one of the images; the depth map region comprises a plurality of depth map sub-regions, each depth map sub-region for storing depth data for one of the plurality of images;
And the virtual viewpoint image reconstruction unit is suitable for carrying out image reconstruction of a virtual viewpoint based on the data combination for each frame time, wherein the virtual viewpoint is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting viewpoint switching and watching of a region to be watched.
17. A multi-angle freeview video data processing apparatus, comprising:
a reconstruction unit adapted to perform image reconstruction of a virtual viewpoint using the multi-angle freeview video data processing apparatus according to claim 16;
and the playing unit is suitable for playing the video based on the reconstructed images at different frame moments.
18. A multi-angle freeview video data processing apparatus, comprising:
a receiving unit adapted to receive an image reconstructed from a virtual viewpoint, the image reconstruction from the virtual viewpoint being performed using the multi-angle freeview video data processing apparatus according to claim 16;
and the playing unit is suitable for playing the video based on the reconstructed images at different frame moments.
19. A computer readable storage medium having stored thereon computer instructions, which when run perform the steps of the multi-angle freeview video data processing method according to any of claims 1 to 10.
20. A computer readable storage medium having stored thereon computer instructions, which when run perform the steps of the multi-angle freeview video data processing method according to any of claims 11 to 12.
21. A computer readable storage medium having stored thereon computer instructions, which when run perform the steps of the multi-angle freeview video data processing method of any of claims 13 to 15.
22. An edge computing node device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the multi-angle freeview video data processing method of any of claims 1 to 10.
23. A terminal comprising a memory and a processor, said memory having stored thereon computer instructions executable on said processor, characterized in that said processor executes the steps of the multi-angle freeview video data processing method according to any of claims 11 to 12 when said computer instructions are executed.
24. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the multi-angle freeview video data processing method of any of claims 13 to 15.
25. A mobile device comprising a communication component, a processor, and a display component, characterized in that:
-said communication component for receiving multi-angle freeview video data comprising said data combination in the multi-angle freeview video data processing method according to any of claims 1 to 8;
the processor is used for rendering based on the multi-angle free view video data and generating videos corresponding to different virtual viewpoints;
and the display component is used for displaying the videos corresponding to the different virtual viewpoints.
CN201910173414.7A 2019-03-07 2019-03-07 Multi-angle free view video data processing method and device, medium and equipment Active CN111669570B (en)

Priority Applications (23)

Application Number Priority Date Filing Date Title
CN201910173414.7A CN111669570B (en) 2019-03-07 2019-03-07 Multi-angle free view video data processing method and device, medium and equipment
PCT/US2020/021195 WO2020181088A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for generating multi-angle free-respective image data
US16/810,586 US20200286279A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective image data
US16/810,237 US11037365B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
US16/810,565 US11055901B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and server for generating multi-angle free-perspective video data
US16/810,362 US20200288108A1 (en) 2019-03-07 2020-03-05 Method, apparatus, terminal, capturing system and device for setting capturing devices
PCT/US2020/021220 WO2020181104A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and server for generating multi-angle free-perspective video data
PCT/US2020/021167 WO2020181074A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
US16/810,480 US20200288098A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
PCT/US2020/021241 WO2020181119A1 (en) 2019-03-07 2020-03-05 Video reconstruction method, system, device, and computer readable storage medium
PCT/US2020/021164 WO2020181073A1 (en) 2019-03-07 2020-03-05 Method, apparatus, terminal, capturing system and device for setting capturing devices
PCT/US2020/021231 WO2020181112A1 (en) 2019-03-07 2020-03-05 Video generating method, apparatus, medium, and terminal
PCT/US2020/021247 WO2020181125A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective video data
US16/810,352 US20200288097A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
PCT/US2020/021187 WO2020181084A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
US16/810,614 US20200288099A1 (en) 2019-03-07 2020-03-05 Video generating method, apparatus, medium, and terminal
US16/810,634 US11341715B2 (en) 2019-03-07 2020-03-05 Video reconstruction method, system, device, and computer readable storage medium
US16/810,464 US11521347B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for generating multi-angle free-respective image data
US16/810,681 US20200288112A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective video data
US16/810,695 US11257283B2 (en) 2019-03-07 2020-03-05 Image reconstruction method, system, device and computer-readable storage medium
PCT/US2020/021252 WO2020181128A1 (en) 2019-03-07 2020-03-05 Image reconstruction method, system, device and computer-readable storage medium
PCT/US2020/021197 WO2020181090A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective image data
PCT/US2020/021141 WO2020181065A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910173414.7A CN111669570B (en) 2019-03-07 2019-03-07 Multi-angle free view video data processing method and device, medium and equipment

Publications (2)

Publication Number Publication Date
CN111669570A CN111669570A (en) 2020-09-15
CN111669570B true CN111669570B (en) 2023-12-19

Family

ID=72381317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910173414.7A Active CN111669570B (en) 2019-03-07 2019-03-07 Multi-angle free view video data processing method and device, medium and equipment

Country Status (1)

Country Link
CN (1) CN111669570B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840165A (en) * 2022-04-22 2022-08-02 南方科技大学 Image display method, image display device, apparatus, and storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5858380B2 (en) * 2010-12-03 2016-02-10 国立大学法人名古屋大学 Virtual viewpoint image composition method and virtual viewpoint image composition system
KR102483838B1 (en) * 2015-04-19 2023-01-02 포토내이션 리미티드 Multi-Baseline Camera Array System Architecture for Depth Augmentation in VR/AR Applications
WO2017023746A1 (en) * 2015-07-31 2017-02-09 Hsni, Llc Virtual three dimensional video creation and management system and method
SG11201803682RA (en) * 2015-11-11 2018-06-28 Sony Corp Encoding apparatus and encoding method, decoding apparatus and decoding method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102714741A (en) * 2009-10-14 2012-10-03 汤姆森特许公司 Filtering and edge encoding
CN102970554A (en) * 2011-08-30 2013-03-13 奇景光电股份有限公司 System and method of handling data frames for stereoscopic display
CN102447925A (en) * 2011-09-09 2012-05-09 青岛海信数字多媒体技术国家重点实验室有限公司 Method and device for synthesizing virtual viewpoint image
CN104574311A (en) * 2015-01-06 2015-04-29 华为技术有限公司 Image processing method and device
CN109361913A (en) * 2015-05-18 2019-02-19 韩国电子通信研究院 For providing the method and apparatus of 3-D image for head-mounted display
CN105611268A (en) * 2015-12-15 2016-05-25 联想(北京)有限公司 Information processing method and electronic apparatus

Also Published As

Publication number Publication date
CN111669570A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
US11521347B2 (en) Method, apparatus, medium, and device for generating multi-angle free-respective image data
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111667438B (en) Video reconstruction method, system, device and computer readable storage medium
CN111669564B (en) Image reconstruction method, system, device and computer readable storage medium
WO2022002181A1 (en) Free viewpoint video reconstruction method and playing processing method, and device and storage medium
WO2022001865A1 (en) Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669569A (en) Video generation method and device, medium and terminal
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669603B (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
CN111669568A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111669571B (en) Multi-angle free view image data generation method and device, medium and equipment
CN112738646B (en) Data processing method, device, system, readable storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40036443

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant