CN111669603B - Multi-angle free visual angle data processing method and device, medium, terminal and equipment - Google Patents
- Publication number: CN111669603B
- Application number: CN201910172742.5A
- Authority
- CN
- China
- Prior art keywords
- data
- image
- depth
- images
- angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/658—Transmission by the client directed to the server
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The embodiments of the invention disclose a multi-angle free-view data processing method, apparatus, medium, terminal and device. The multi-angle free-view data processing method includes: acquiring a data header file; determining the defined format of a data file according to the result of parsing the data header file; reading a data combination from the data file based on the defined format, where the data combination includes pixel data and depth data of multiple synchronized images, the synchronized images capture the region to be viewed from different view angles, and the pixel data and depth data of each image among the synchronized images are associated with each other; and reconstructing an image or video of a virtual viewpoint according to the read data combination, where the virtual viewpoint is selected from a multi-angle free-view range. The technical solution of the embodiments of the invention supports the user in switching the virtual viewpoint from which the region to be viewed is watched.
Description
Technical Field
The invention relates to the field of data processing, and in particular to a multi-angle free-view data processing method and apparatus, a medium, a terminal, and a device.
Background
In the field of image and video processing, image or video data can be received and displayed or played for a user. Such display and playback are typically based on a fixed view angle, so the user experience leaves room for improvement.
Disclosure of Invention
The technical problem addressed by the embodiments of the invention is to provide a multi-angle free-view data processing method.
To solve the foregoing technical problem, an embodiment of the invention provides a multi-angle free-view data processing method, including: acquiring a data header file; determining the defined format of a data file according to the result of parsing the data header file; reading a data combination from the data file based on the defined format, where the data combination includes pixel data and depth data of multiple synchronized images, the synchronized images capture the region to be viewed from different view angles, and the pixel data and depth data of each image among the synchronized images are associated with each other; and reconstructing an image or video of a virtual viewpoint according to the read data combination, where the virtual viewpoint is selected from a multi-angle free-view range, the multi-angle free-view range being the range within which the virtual viewpoint for viewing the region to be viewed can be switched.
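Viewed as a pipeline, the claimed steps map onto a small set of calls. The following is a minimal Python sketch under assumed names; parse_header, read_data_combination and reconstruct are illustrative stubs, not functions defined by the patent:

```python
# Minimal sketch of the claimed flow; all names and header fields are
# illustrative assumptions, not the patent's normative interface.
def parse_header(header_bytes: bytes) -> dict:
    # Steps 1-2: parse the data header file and determine the defined format.
    return {"storage_format": "picture", "camera_count": 6}

def read_data_combination(data_file: bytes, defined_format: dict) -> dict:
    # Step 3: read pixel data and depth data of the synchronized images,
    # one (pixel, depth) pair per camera view of the region to be viewed.
    return {"pixels": [], "depths": []}

def reconstruct(combination: dict, viewpoint: tuple):
    # Step 4: reconstruct the image/video of a virtual viewpoint selected
    # from the multi-angle free-view range (e.g., by DIBR, described later).
    pass

fmt = parse_header(b"")
combo = read_data_combination(b"", fmt)
reconstruct(combo, viewpoint=(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
```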
Optionally, the defined format includes a storage format of the data combination.
Optionally, the storage format of the data combination is a video format.
Optionally, there are multiple data combinations, which respectively correspond to different frame times.
Optionally, the storage format of the data combination is a picture format.
Optionally, the defined format includes a content storage rule of the data combination, and the content storage rule includes a storage rule of the pixel data and the depth data of the synchronized plurality of images.
Optionally, the storage rule of the pixel data and depth data of the synchronized images includes: the storage locations of the pixel data and depth data of the synchronized images in a stitched image, the stitched image being an image or frame image read from the data file according to the picture format or video format among the defined formats.
Optionally, the pixel data of each of the plurality of synchronized images is stored in an image region, the depth data is stored in a depth map region, and the storage location of the pixel data and the depth data of each of the plurality of synchronized images in the stitched image is indicated by the distribution of the image region and the depth map region.
Optionally, the pixel data of each of the plurality of synchronized images is stored in an image sub-region, the depth data of each of the plurality of synchronized images is stored in a depth map sub-region, and the storage locations of the pixel data and the depth data of each of the plurality of synchronized images in the stitched image are indicated by the distribution of the image sub-region and the depth map sub-region.
Optionally, the storage rule of the pixel data and depth data of the synchronized images further includes: edge protection is applied to all or part of the image sub-regions and the depth map sub-regions.
Optionally, the storage rule of the pixel data and depth data of the synchronized images further includes: the pixel data and the depth data of an image have the same resolution.
Optionally, the data combination includes a first field storing pixel data of each image in the synchronized plurality of images, and a second field associated with the first field storing depth data of the image.
Optionally, the defined format includes an association relationship between the first field and the second field.
Optionally, the data file further includes parameter data of each of the plurality of synchronized images, where the parameter data includes shooting position and shooting angle data of the image.
Optionally, the parameter data further includes internal parameter data, and the internal parameter data includes attribute data of a photographing device of the image.
Optionally, the defined format further includes a storage address of the parameter data.
Optionally, reconstructing an image or video of a virtual viewpoint according to the read data combination includes: reconstructing the image or video of the virtual viewpoint based on the data combination and on the relationship between the virtual viewpoint and the parameter data.
Optionally, the defined format further includes storage addresses of the pixel data and the depth data of the synchronized plurality of images.
Optionally, the defined format of the data file includes the number of the synchronized plurality of images.
An embodiment of the invention further provides a multi-angle free-view data processing apparatus, including: a data header file acquisition unit, adapted to acquire a data header file; a data header file parsing unit, adapted to determine the defined format of a data file according to the result of parsing the data header file; a data combination reading unit, adapted to read a data combination from the data file based on the defined format, where the data combination includes pixel data and depth data of multiple synchronized images, the synchronized images capture the region to be viewed from different view angles, and the pixel data and depth data of each image among the synchronized images are associated with each other; and a reconstruction unit, adapted to reconstruct images or videos of virtual viewpoints according to the read data combination, the virtual viewpoints being selected from a multi-angle free-view range, the multi-angle free-view range being the range within which the virtual viewpoint for viewing the region to be viewed can be switched.
An embodiment of the invention further provides a computer-readable storage medium storing computer instructions which, when run, perform the steps of the multi-angle free-view data processing method.
An embodiment of the invention further provides a terminal, including a memory and a processor, the memory storing computer instructions runnable on the processor, where the processor, when running the computer instructions, performs the steps of the multi-angle free-view data processing method.
An embodiment of the invention further provides a mobile device, including a communication component, a processor, and a display component: the communication component is configured to receive multi-angle free-view data, the multi-angle free-view data including a data header file and the data file; the processor is configured to render based on the multi-angle free-view data to obtain display content, a virtual viewpoint of the display content being selected from the multi-angle free-view range; and the display component is configured to display the display content.
Compared with the prior art, the technical solutions of the embodiments of the invention have the following beneficial effects:
In the embodiments of the invention, the data header file is parsed, the defined format of the data file is determined from the parsing result, and a data combination is read from the data file based on the defined format. The data combination includes pixel data and depth data of multiple synchronized images that capture the region to be viewed from different view angles. An image or video of a virtual viewpoint can then be reconstructed from the read data combination, so that the user is supported in switching the virtual viewpoint from which the region to be viewed is watched.
Drawings
FIG. 1 is a schematic diagram of a region to be viewed according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an arrangement of a collecting apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-angle free-view display system according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of a device display in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of the present invention;
FIG. 7 is a schematic diagram of another arrangement of the acquisition device in the embodiment of the present invention;
FIG. 8 is a schematic illustration of another manipulation of the apparatus in an embodiment of the present invention;
FIG. 9 is a schematic illustration of a display of another apparatus in an embodiment of the invention;
FIG. 10 is a flow chart of a method for setting up a collection device according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a multi-angle free viewing range according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 13 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 14 is a diagram illustrating another multi-angle free view range in an embodiment of the present invention;
FIG. 15 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 16 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;
FIG. 17 is a schematic diagram of another arrangement of the collecting apparatus in the embodiment of the present invention;
FIG. 18 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;
FIG. 19 is a flowchart of a multi-angle freeview data generating method according to an embodiment of the present invention;
FIG. 20 is a diagram illustrating distribution positions of pixel data and depth data of a single image according to an embodiment of the present invention;
FIG. 21 is a diagram illustrating distribution positions of pixel data and depth data of another single image according to an embodiment of the present invention;
FIG. 22 is a diagram illustrating distribution positions of pixel data and depth data of an image according to an embodiment of the present invention;
FIG. 23 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 24 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 25 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 26 is a schematic illustration of image region stitching according to an embodiment of the present invention;
FIG. 27 is a schematic structural diagram of a stitched image in an embodiment of the present invention;
FIG. 28 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 29 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 30 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 31 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 32 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 33 is a diagram illustrating a pixel data distribution of an image in accordance with an embodiment of the present invention;
FIG. 34 is a schematic diagram of a pixel data distribution of another image in an embodiment of the invention;
FIG. 35 is a diagram illustrating data storage in a stitched image, in accordance with an embodiment of the present invention;
FIG. 36 is a schematic illustration of data storage in another stitched image in an embodiment of the present invention;
FIG. 37 is a flowchart illustrating a multi-angle freeview video data generating method according to an embodiment of the present invention;
FIG. 38 is a flowchart illustrating a multi-angle freeview data processing method according to an embodiment of the present invention;
FIG. 39 is a flowchart illustrating a method for reconstructing a virtual viewpoint image according to an embodiment of the present invention;
FIG. 40 is a schematic structural diagram of a multi-angle freeview processing apparatus according to an embodiment of the present invention;
FIG. 41 is a flowchart illustrating a multi-angle freeview image data processing method according to an embodiment of the present invention;
FIG. 42 is a flowchart illustrating a multi-angle freeview video data processing method according to an embodiment of the present invention;
FIG. 43 is a flowchart illustrating a multi-angle free-view interaction method according to an embodiment of the present invention;
FIG. 44 is a schematic illustration of another embodiment of the invention for operating a device;
FIG. 45 is a schematic illustration of a display of another device in an embodiment of the present invention;
FIG. 46 is a schematic illustration of another embodiment of the invention for operating the device;
FIG. 47 is a schematic illustration of a display of another apparatus in an embodiment of the invention;
FIG. 48 is a diagram illustrating a multi-angle freeview data generating process according to an embodiment of the present invention;
FIG. 49 is a schematic diagram of a multi-camera 6DoF acquisition system in an embodiment of the present invention;
FIG. 50 is a diagram illustrating the generation and processing of 6DoF video data according to an embodiment of the present invention;
FIG. 51 is a diagram illustrating a structure of a header file according to an embodiment of the present invention;
FIG. 52 is a diagram illustrating a user-side processing of 6DoF video data according to an embodiment of the present invention;
FIG. 53 is a schematic diagram of the inputs and outputs of a reference software in an embodiment of the invention;
FIG. 54 is a diagram of an algorithm architecture of a reference software according to an embodiment of the present invention.
Detailed Description
As mentioned above, in the field of image and video processing, image or video data can be received and displayed or played for a user. Such display and playback are typically based on a fixed view angle, so the user experience remains to be enhanced.
In the embodiments of the invention, the data header file is parsed, the defined format of the data file is determined from the parsing result, and a data combination is read from the data file based on the defined format. The data combination includes pixel data and depth data of multiple synchronized images that capture the region to be viewed from different view angles. An image or video of a virtual viewpoint can then be reconstructed from the read data combination, so that the user is supported in switching the virtual viewpoint from which the region to be viewed is watched.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
As an example embodiment of the invention, the applicant discloses the following steps. The first step is acquisition and depth map calculation, which includes three main sub-steps: Multi-Camera Video Capturing, Camera Parameter Estimation, and Depth Map Calculation. For multi-camera capture, it is desirable that the videos acquired by the various cameras be frame-level aligned. Referring to fig. 48, Texture Images, i.e., the synchronized images described later, are obtained through multi-camera video capture; Camera Parameters, including the internal and external parameter data described later, are obtained by estimating the internal and external parameters of the cameras; and a Depth Map is obtained through the depth map calculation.
In this solution, no special camera, such as a light field camera, is required for video acquisition, and no complicated camera calibration prior to acquisition is required either. The multiple cameras can be arranged and positioned so as to better capture the objects or scenes of interest. Referring to fig. 49, a plurality of capturing devices, for example cameras 1 to N, may be arranged around the area to be viewed.
After the above three steps, the texture maps captured by the multiple cameras, all the camera parameters, and the depth map of each camera are obtained. These three portions of data may be referred to as the data files in the multi-angle free-view video data, and may also be referred to as 6-degree-of-freedom video data (6DoF video data). With these data, the user side can generate a virtual viewpoint at a virtual 6-Degree-of-Freedom (DoF) position, thereby providing a 6DoF video experience.
Referring to fig. 50, the 6DoF video data and the indicative data can be compressed and transmitted to the user side, and the user side can obtain the user-side 6DoF expression, that is, the aforementioned 6DoF video data and metadata, from the received data. The indicative data may also be referred to as Metadata.
Referring to fig. 51, the metadata may be used to describe the data schema of the 6DoF video data, and specifically may include: Stitching Pattern Metadata, indicating the storage rules of the pixel data and depth data of the multiple images in the stitched image; Padding Pattern Metadata, which may be used to indicate how edge protection is performed in the stitched image; and Other Metadata. The metadata may be stored in the header file, in the order shown in FIG. 51 or in other orders.
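As a concrete illustration, such header metadata could be serialized as follows. This is a hypothetical sketch: the top-level keys mirror FIG. 51's three categories, but the field names and the JSON encoding are assumptions, not the patent's normative format.

```python
import json

# Hypothetical header metadata; all values are illustrative only.
metadata = {
    "stitching_pattern": {                 # storage rules in the stitched image
        "camera_count": 30,
        "texture_region": {"x": 0, "y": 0,   "w": 1920, "h": 540},
        "depth_region":   {"x": 0, "y": 540, "w": 1920, "h": 540},
    },
    "padding_pattern": {"enabled": True, "width_px": 2},   # edge protection
    "other_metadata": {"depth_bits": 16, "codec": "h264"},
}
header_bytes = json.dumps(metadata).encode("utf-8")
print(len(header_bytes), "header bytes")
```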
Referring to fig. 52, the client obtains the 6DoF video data, which includes the camera parameters, the texture maps and depth maps, and the descriptive metadata, together with the interactive behavior data of the client side. With these data, the user side may perform 6DoF rendering in a Depth Image-Based Rendering (DIBR) manner, generating the image of a virtual viewpoint at the specific 6DoF position derived from the user behavior, that is, the virtual viewpoint at the 6DoF position corresponding to the user instruction.
In one embodiment implemented at test time, each test case contains 20 seconds of video data at 30 frames/second and 1920x1080 resolution, so for each of the 30 cameras there are 600 frames in total. The main folder contains a texture map folder and a depth map folder. Under the texture map folder are secondary directories numbered 0-599, representing the 600 frames of the 20-second video. Each secondary directory contains the 30 texture maps captured by the cameras, named 0.yuv to 29.yuv, in yuv420 format. Each depth map corresponds to the texture map of the same name. The texture maps and corresponding depth maps of the multiple cameras all belong to the same frame instant of the 20-second video.
All the depth maps in the test case are generated by a preset depth estimation algorithm. In tests, these depth maps provide good virtual viewpoint reconstruction quality over the virtual 6DoF positions. In one case, the reconstructed image of a virtual viewpoint can be generated directly from the given depth maps. Alternatively, the depth maps may be generated or refined from the original texture maps by a depth calculation algorithm.
Besides the depth maps and texture maps, the test case contains an sfm file describing the parameters of all 30 cameras. The data of this file is written in binary format; the specific data format is described below. To accommodate different cameras, a fisheye camera model with distortion parameters is adopted in the tests. The DIBR reference software we provide shows how to read and use the camera parameter data from this file. The camera parameter data contains the following fields:
(1) krt_R is the rotation matrix of the camera;
(2) krt_cc is the optical center position of the camera;
(3) krt_world_position is the three-dimensional space coordinates of the camera;
(4) krt_kc is the distortion coefficients of the camera;
(5) src_width is the width of the calibration image;
(6) src_height is the height of the calibration image;
(7) fisheye_radius and lens_fov are parameters of the fisheye camera.
In the technical solution of the present invention, the user can see in detail how the corresponding parameters are read from the sfm file in the preset parameter reading function (the set_sfm_parameters function).
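For illustration, a reader for one camera record might look like the sketch below. The field order follows the list above, but the element types and the number of distortion coefficients are assumptions; the authoritative layout is whatever set_sfm_parameters in the reference software implements.

```python
import numpy as np

def read_camera(f):
    # Assumed binary layout: float32 matrices/vectors in the listed order,
    # int32 image sizes; the real .sfm layout may differ.
    cam = {}
    cam["krt_R"] = np.fromfile(f, dtype=np.float32, count=9).reshape(3, 3)  # rotation
    cam["krt_cc"] = np.fromfile(f, dtype=np.float32, count=3)               # optical center
    cam["krt_world_position"] = np.fromfile(f, dtype=np.float32, count=3)   # 3D position
    cam["krt_kc"] = np.fromfile(f, dtype=np.float32, count=5)               # distortion (count assumed)
    cam["src_width"], cam["src_height"] = np.fromfile(f, dtype=np.int32, count=2)
    cam["fisheye_radius"], cam["lens_fov"] = np.fromfile(f, dtype=np.float32, count=2)
    return cam

# with open("cameras.sfm", "rb") as f:
#     cameras = [read_camera(f) for _ in range(30)]
```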
In the DIBR reference software, the camera parameters, texture maps, depth maps, and the 6DoF position of the virtual camera are received as inputs, while the texture map and depth map generated at the virtual 6DoF position are output. The 6DoF position of the virtual camera is the aforementioned 6DoF position determined from the user behavior. The DIBR reference software may be software implementing virtual-viewpoint-based image reconstruction in an embodiment of the invention.
Referring to fig. 53 in conjunction, in the reference software, the camera parameters, texture map, depth map, and 6DoF position of the virtual camera are received as inputs, while the generated texture map and the generated depth map at the virtual 6DoF position are output.
Referring collectively to FIG. 54, the software may include several processing steps: Camera Selection, Forward Projection of the depth maps, Depth Map Post-processing, Backward Projection of the texture maps, Texture Fusion of the textures mapped from the multiple cameras, and Inpainting (hole filling) of the image.
In the reference software, the two cameras closest to the virtual 6DoF position may be selected by default for virtual viewpoint generation.
In the depth map post-processing step, the quality of the depth map may be improved by a number of methods, such as foreground edge protection and pixel-level filtering.
For the generated output image, the texture maps captured by the two selected cameras are fused. The fusion weight is a global weight determined by the distance between the virtual viewpoint position and each reference camera position. When a pixel of the output virtual viewpoint image is mapped from only one camera, that mapped pixel may be used directly as the value of the output pixel.
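A minimal sketch of such a fusion step is given below, assuming inverse-distance weights; the patent only states that the weight is a global function of the virtual viewpoint's distance to each reference camera, so the exact weighting function is an assumption.

```python
import numpy as np

def fuse_textures(tex_a, tex_b, mask_a, mask_b, dist_a, dist_b, eps=1e-6):
    """Blend two backward-projected textures with global inverse-distance weights."""
    w_a, w_b = 1.0 / (dist_a + eps), 1.0 / (dist_b + eps)
    out = np.zeros_like(tex_a, dtype=np.float32)
    both = mask_a & mask_b                      # mapped by both cameras: weighted blend
    out[both] = (w_a * tex_a[both] + w_b * tex_b[both]) / (w_a + w_b)
    out[mask_a & ~mask_b] = tex_a[mask_a & ~mask_b]   # mapped by one camera only:
    out[mask_b & ~mask_a] = tex_b[mask_b & ~mask_a]   # take that pixel directly
    return out   # pixels mapped by neither camera remain holes, left to inpainting
```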
After the fusion step, if hole pixels remain that were not mapped from either camera, they can be filled by an image inpainting method.
For the output depth map, for convenience of error analysis, the depth map mapped from one of the cameras to the virtual viewpoint position may be used as the output.
It is to be understood that the above examples are only illustrative and not restrictive of the specific embodiments, and the technical solutions of the present invention will be further described below.
Referring to fig. 1, the area to be watched may be a basketball court, and a plurality of collecting devices may be provided to collect data of the area to be watched.
For example, with combined reference to FIG. 2, several acquisition devices may be arranged along a certain path at a height H_LK above the basket; for example, 6 acquisition devices, namely acquisition devices CJ1 to CJ6, may be arranged along an arc. It is understood that the arrangement positions, number and mounting manner of the acquisition devices can vary, and are not limited here.
The acquisition devices may be cameras or video cameras capable of synchronized capture, for example via a hardware synchronization line. By acquiring data of the region to be viewed with multiple acquisition devices, multiple synchronized images or video streams can be obtained, and from the video streams collected by the multiple acquisition devices, multiple synchronized frame images can be taken as the synchronized images. It will be appreciated that synchronization ideally means correspondence to the same time instant, although errors and deviations can be tolerated.
With reference to fig. 3, in an embodiment of the invention, data of the region to be viewed may be acquired by an acquisition system 31 including multiple acquisition devices; the acquired synchronized images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle free-view data that supports the display device 33 in performing virtual viewpoint switching. The display device 33 may present reconstructed images generated from the multi-angle free-view data, each corresponding to a virtual viewpoint, and may present reconstructed images corresponding to different virtual viewpoints according to user instructions, switching the viewing position and viewing angle.
In a specific implementation, the image reconstruction that produces the reconstructed image may be performed by the display device 33, or by a device located in a Content Delivery Network (CDN) in an edge-computing manner. It is to be understood that fig. 3 is only an example and does not limit the acquisition system, the server, the display device, or the specific implementation. The process of image reconstruction based on multi-angle free-view data is described in detail later with reference to figs. 38 to 42 and is not detailed here.
With reference to fig. 4, following the previous example, the user may watch the region to be viewed, in this embodiment a basketball court, through the display device. As described above, the viewing position and viewing angle are switchable.
For example, the user may slide on the screen to switch the virtual viewpoint. In an embodiment of the invention, with reference to fig. 5, when the user's finger slides the screen to the right, the virtual viewpoint for viewing is switched. With continued reference to FIG. 2, the position of the virtual viewpoint before sliding may be VP1; after the slide switches the virtual viewpoint, the position may be VP2. Referring to FIG. 6, after sliding the screen, the reconstructed image displayed on the screen may be as shown in fig. 6. The reconstructed image is obtained by image reconstruction based on multi-angle free-view data generated from data acquired by multiple acquisition devices in an actual acquisition setup.
It is to be understood that the image viewed before switching may be a reconstructed image. The reconstructed image may be a frame image in a video stream. In addition, the manner of switching the virtual viewpoint according to the user instruction may be various, and is not limited herein.
In a specific implementation, the virtual viewpoint may be represented by 6-degree-of-freedom coordinates, where the spatial position of the virtual viewpoint is expressed as (x, y, z) and the viewing angle as three rotational directions (θ, φ, ψ).
The virtual viewpoint is a three-dimensional concept, and three-dimensional information is required to generate a reconstructed image. In a specific implementation, the multi-angle free-view data may include depth data providing the third-dimensional information beyond the planar images. Compared with other implementations, such as providing three-dimensional information through point cloud data, the amount of depth data is small. The specific generation of the multi-angle free-view data is described in detail later with reference to figs. 19 to 37 and is not detailed here.
In the embodiment of the invention, the switching of the virtual viewpoint can be performed within a certain range, namely the multi-angle free-view range; within this range, the virtual viewpoint position and view angle can be switched arbitrarily.
The multi-angle free-view range is related to the arrangement of the acquisition devices: the wider the shooting coverage of the acquisition devices, the larger the multi-angle free-view range. The quality of the picture displayed by the display device is related to the number of acquisition devices; generally, the more acquisition devices there are, the fewer hole regions appear in the displayed picture.
Referring to fig. 7, if two rows of acquisition devices are installed at different heights in the basketball court, an upper row and a lower row of acquisition devices CJ1 to CJ6 each, the multi-angle free-view range is larger than with only one row of acquisition devices.
Referring to fig. 8, the user's finger can slide upward to switch the virtual viewpoint from which to view. Referring to fig. 9, after sliding the screen, the image presented on the screen may be as shown in fig. 9.
In a specific implementation, if only one row of acquisition devices is arranged, a certain degree of freedom in the up-down direction can still be obtained during image reconstruction, but the multi-angle free-view range in that degree of freedom is smaller than when two rows of acquisition devices are arranged one above the other.
Those skilled in the art can understand that the foregoing embodiments and the corresponding drawings are only exemplary; they limit neither the arrangement of the acquisition devices and its relationship to the multi-angle free-view range, nor the operation manner and resulting display effect of the display device. The specific implementation of switching the virtual viewpoint for viewing the region to be viewed according to user instructions is further detailed later with reference to figs. 43 to 47 and is not repeated here.
The method of arranging the acquisition devices is elaborated further below.
Fig. 10 is a flowchart of a method for setting acquisition equipment in an embodiment of the present invention, which may specifically include the following steps:
Step S101: determining a multi-angle free-view range, within which switching of the virtual viewpoint for viewing the region to be viewed is supported;
Step S102: determining setting positions of acquisition devices at least according to the multi-angle free-view range, the setting positions being suitable for arranging the acquisition devices to acquire data of the region to be viewed.
It will be understood by those skilled in the art that a fully free view angle may refer to a 6-degree-of-freedom view angle, that is, the user can freely switch the spatial position and viewing angle of the virtual viewpoint on the display device. The spatial position of the virtual viewpoint may be represented as (x, y, z) and the viewing angle as three rotational directions (θ, φ, ψ); with 6 directions of freedom in total, this is called a 6-degree-of-freedom view angle.
As described above, in the embodiment of the invention, the switching of the virtual viewpoint may be performed within a certain range, the multi-angle free-view range; within this range, the virtual viewpoint position and view angle can be switched arbitrarily.
The multi-angle free-view range can be determined according to the needs of the application scenario. For example, in some scenarios the region to be viewed may have a core viewpoint, such as the center of a stage, the center point of a basketball court, or the basket of a basketball court. In these scenarios, the multi-angle free-view range may include a planar or stereoscopic region containing the core viewpoint. It is understood that the core viewpoint may be a point, a plane or a stereoscopic region, which is not limited here.
As mentioned above, the multi-angle free viewing angle range may be a variety of regions, which will be further exemplified below with reference to fig. 11 to 15.
Referring to fig. 11, with the core viewpoint denoted by point O, the multi-angle free-view range may be a sector area centered on the core viewpoint and lying in the same plane as it, such as sector A1OA2 or sector B1OB2, or a circular plane centered on point O.
Taking the multi-angle free-view range as sector A1OA2 as an example, the position of the virtual viewpoint may be switched continuously within the area, e.g., from A1 continuously along arc segment A1A2 to A2, or along arc segment L1L2, or the position may be switched otherwise within the multi-angle free-view range. Accordingly, the view angle of the virtual viewpoint may also be changed within the region.
Referring further to fig. 12, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be a sector area centered on E and lying in the same plane as it, such as sector F121EF122. The central point E of the basketball court may be located on the ground of the court, or at a certain height above the ground. The heights of the arc end points F121 and F122 of the sector area may be the same, e.g., height H121 in the figure.
Referring to FIG. 13, with the core viewpoint denoted by point O, the multi-angle free-view range may be part of a sphere centered on the core viewpoint; for example, with region C1C2C3C4 denoting a partial area of the sphere, the multi-angle free-view range may be the solid range formed by region C1C2C3C4 and point O. Any point within this range can serve as the position of the virtual viewpoint.
With further reference to FIG. 14, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be part of a sphere centered on E; for example, with region F131F132F133F134 denoting a partial area of the sphere, the multi-angle free-view range may be the solid range formed by region F131F132F133F134 and the central point E.
In a scene with a core viewpoint, the position of the core viewpoint may be various, and the multi-angle free viewing angle range may also be various, which is not listed here. It is to be understood that the above embodiments are only examples and are not limiting on the multi-angle free view range, and the shapes shown therein are not limiting on actual scenes and applications.
In specific implementation, the core viewpoint may be determined according to a scene, in one shooting scene, there may also be multiple core viewpoints, and the multi-angle free view range may be a superposition of multiple sub-ranges.
In other application scenarios, the multi-angle free view range may also be coreless, for example, in some application scenarios, it is desirable to provide multi-angle free view viewing of historic buildings or exhibition of paintings. Accordingly, the multi-angle free view range can be determined according to the needs of the scenes.
It is understood that the shape of the free-view range may be arbitrary, and any point within the multi-angle free-view range may serve as the position of a virtual viewpoint.
Referring to FIG. 15, the multi-angle free-view range may be a cube D1D2D3D4D5D6D7D8 with the region to be viewed being surface D1D2D3D4. Any point in cube D1D2D3D4D5D6D7D8 can then serve as the position of the virtual viewpoint, and the view angle of the virtual viewpoint, that is, the viewing angle, can vary. For example, position E6 may be selected on surface D5D6D7D8 and the region viewed along E6D1, or along E6D9, where point D9 is selected from the region to be viewed.
In a specific implementation, after the multi-angle free view range is determined, the position of the acquisition equipment can be determined according to the multi-angle free view range.
Specifically, the setting positions of the acquisition devices may be selected within the multi-angle free-view range; for example, they may be determined among the boundary points of the multi-angle free-view range.
Referring to fig. 16, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be a sector area centered on E and lying in the same plane as it, such as sector F61EF62. The acquisition devices may be arranged within the multi-angle free-view range, e.g., along arc F65F66; image reconstruction in the areas not covered by the acquisition devices can be handled algorithmically. In specific implementations, the acquisition devices may also be arranged along arc F61F62, with acquisition devices at the end points of the arc, to improve the quality of the reconstructed images. Each acquisition device may be oriented toward the central point E of the basketball court. The position of an acquisition device may be represented by spatial position coordinates, and its orientation by three rotational directions.
In a specific implementation, there may be 2 or more setting positions and, correspondingly, 2 or more acquisition devices. The number of acquisition devices may be determined by the picture-quality requirements of the reconstructed image or video: in scenes with higher requirements, more capture devices may be used; with lower requirements, fewer suffice.
With continued reference to fig. 16, it can be appreciated that, to reduce holes in the reconstructed picture in pursuit of higher reconstructed image or video quality, a greater number of acquisition devices may be arranged along arc F61F62; for example, 40 cameras may be provided.
Referring to fig. 17, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be part of a sphere centered on E; for example, with region F61F62F63F64 denoting a partial area of the sphere, the multi-angle free-view range may be the solid range formed by region F61F62F63F64 and the central point E. The acquisition devices may be arranged within the multi-angle free-view range, e.g., along arcs F65F66 and F67F68. As in the previous example, image reconstruction in the areas not covered by the acquisition devices can be handled algorithmically. In specific implementations, the acquisition devices may also be arranged along arcs F61F62 and F63F64, with acquisition devices at the arc end points, to improve the quality of the reconstructed images.
Each acquisition device may be oriented toward the central point E of the basketball court. It is understood that, although not shown in the figures, more acquisition devices may be arranged along arcs F61F62 and F63F64.
As mentioned before, in some application scenarios the region to be viewed may include a core viewpoint, and correspondingly the multi-angle free-view range includes a region whose view angles point toward the core viewpoint. In such application scenarios, the positions of the acquisition devices may be selected from an arc-shaped region whose viewing direction points toward the core viewpoint.
When the region to be viewed includes a core viewpoint and the setting positions are selected in such an arc-shaped region, the acquisition devices are arranged in an arc. Since the viewing region includes the core viewpoint and the view angles point toward it, in this scenario an arc arrangement of acquisition devices can cover a larger multi-angle free-view range with fewer devices.
In a specific implementation, the setting positions of the acquisition devices can be determined by combining the view-angle range with the boundary shape of the region to be viewed; for example, the setting positions may be determined at preset intervals along the boundary of the region to be viewed within the view-angle range.
Referring to FIG. 18, the multi-angle free-view range may be coreless; for example, the virtual viewpoint position may be selected from hexahedron F81F82F83F84F85F86F87F88, and the region to be viewed watched from that position. The boundary of the region to be viewed may be the ground boundary line of the court. The acquisition devices may be arranged along the intersection B89B94 of the ground boundary line with the region to be viewed, for example 6 acquisition devices from position B89 to position B94. The degree of freedom in the up-down direction can be realized algorithmically, or a further row of acquisition devices whose horizontal projections fall on intersection line B89B94 may be arranged.
In specific implementations, the multi-angle free-view range can also support viewing the region to be viewed from its upper side, that is, from a direction away from the horizontal plane.
Correspondingly, the acquisition devices can be carried by unmanned aerial vehicles so as to be positioned on the upper side of the region to be viewed, or arranged at the top of the building in which the region to be viewed is located, the top being the part of the building farthest from the horizontal plane.
For example, the acquisition devices can be arranged at the top of a basketball stadium, or carried by unmanned aerial vehicles hovering over the basketball court. Acquisition devices can likewise be arranged at the top of the venue housing a stage, or carried there by unmanned aerial vehicles.
By arranging acquisition devices on the upper side of the region to be viewed, the multi-angle free-view range can include view angles from above the region to be viewed.
In a specific implementation, the capturing device may be a camera or a video camera, and the captured data may be picture or video data.
It will be appreciated that the manner of arranging the acquisition devices at the setting positions may vary; for example, they may be supported at the setting positions by support frames, or arranged in other ways.
In addition, it is to be understood that the above embodiments are only illustrative and do not limit the arrangement of the acquisition devices. In various application scenarios, specific implementations that determine the setting positions of the acquisition devices according to the multi-angle free-view range and arrange the acquisition devices for data acquisition all fall within the protection scope of the invention.
The method of generating multi-angle free-view data is described further below.
As mentioned above, with continued reference to fig. 3, the acquisition system 31 or the server 32 may process the acquired synchronized images to generate multi-angle free-view data that supports the display device 33 in performing virtual viewpoint switching; the multi-angle free-view data may indicate the third-dimensional information beyond the two-dimensional images through depth data.
Specifically, referring to fig. 19 in combination, the generating of the multi-angle freeview data may include the steps of:
step S191, a plurality of synchronous images are obtained, and shooting angles of the images are different;
step S192 of determining depth data of each image based on the plurality of images;
step S193, for each of the images, storing pixel data of each image in a first field, and storing the depth data in at least one second field associated with the first field.
The synchronized plurality of images may be images captured by a camera or frame images in video data captured by a video camera. In generating the multi-angle freeview data, depth data for each image may be determined based on the plurality of images.
The depth data may include depth values corresponding to the pixels of an image. The distance from the acquisition device to each point in the region to be viewed may serve as the depth value, directly reflecting the geometry of the visible surfaces in the region to be viewed. The depth value may be the distance from each point in the region to be viewed to the optical center, measured along the optical axis of the camera, with the origin of the camera coordinate system at the optical center. Those skilled in the art will appreciate that the distance may be a relative value, provided the multiple images use the same basis.
Further, the depth data may include depth values corresponding one-to-one to the pixels of the image, or may be partial values selected from a set of depth values corresponding one-to-one to the pixels of the image.
Those skilled in the art will understand that the set of depth values may be stored in the form of a depth map. In a specific implementation, the depth data may be obtained by down-sampling an original depth map, where the original depth map is the image that stores the depth values in one-to-one correspondence with the image pixels, arranged according to the pixel layout of the image.
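A minimal sketch of such down-sampling, keeping every second depth value in each direction of the original depth map (the factor and sizes are illustrative):

```python
import numpy as np

# Original depth map: one depth value per image pixel.
original_depth = np.random.randint(0, 65536, size=(1080, 1920)).astype(np.uint16)

# Down-sampled depth data: a quarter of the values, still spatially ordered.
depth_data = original_depth[::2, ::2]
print(original_depth.shape, "->", depth_data.shape)   # (1080, 1920) -> (540, 960)
```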
In a specific implementation, the pixel data of the image stored in the first field may be raw image data, such as data acquired from the acquisition device, or data obtained by reducing the resolution of the raw image data. That is, the pixel data of the image may be its original pixel data or resolution-reduced pixel data, and may be YUV data, RGB data, or other data capable of expressing the image.
In a specific implementation, the depth data stored in the second field may correspond to the same number of pixels as the pixel data stored in the first field, or to a different number. The number may be determined according to the bandwidth available for transmission to the device that processes the multi-angle free-view image data; if the bandwidth is small, the data volume can be reduced by down-sampling, resolution reduction, or similar means.
In a specific implementation, for each of the images, the pixel data may be stored sequentially in multiple fields in a preset order; these fields may be contiguous, or interleaved with second fields. The fields storing the pixel data of an image serve as the first field. Examples follow.
Referring to fig. 20, pixel data of an image, which is illustrated by pixels 1 to 6 in the figure and other pixels not shown, may be stored in a plurality of consecutive fields in a predetermined order, and the consecutive fields may be used as a first field; the depth data corresponding to the image, indicated by depth values 1 to 6 in the image and other depth values not shown, may be stored in a plurality of consecutive fields in a predetermined order, and these consecutive fields may be used as the second field. The preset sequence may be sequentially stored line by line according to the distribution positions of the image pixels, or may be other sequences.
Referring to fig. 21, pixel data of one image and corresponding depth values may be alternately stored in a plurality of fields. A plurality of fields storing pixel data may be used as a first field, a plurality of fields storing depth values may be used as a second field.
In a specific implementation, the depth data may be stored in the same order as the pixel data of the image, so that each field in the first field is associated with the corresponding field in the second field and the depth value corresponding to each pixel is represented.
In particular implementations, pixel data for multiple images and depth data may be stored in a variety of ways. The following examples are further described below.
Referring collectively to fig. 22, the individual pixels of image 1, illustrated as image 1 pixel 1, image 1 pixel 2, and other pixels not shown, may be stored in a continuous field, which may serve as the first field. The depth data of image 1, illustrated as image 1 depth value 1, image 1 depth value 2 shown in the figure, and other depth data not shown, may be stored in fields adjacent to the first field, which may serve as the second field. Similarly, for pixel data of image 2, it may be stored in a first field and depth data of image 2 may be stored in an adjacent second field.
It can be understood that each image in the image stream continuously acquired by one acquisition device of the synchronized multiple acquisition devices, or each frame image in the video stream, may be respectively used as the image 1; similarly, among the plurality of acquisition devices synchronized, an image acquired in synchronization with the image 1 may be the image 2. The acquisition device may be an acquisition device as in fig. 2, or an acquisition device in other scenarios.
Referring to fig. 23 in combination, the pixel data of image 1 and the pixel data of image 2 may be stored in a plurality of adjacent first fields, and the depth data of image 1 and the depth data of image 2 may be stored in a plurality of adjacent second fields.
Referring to fig. 24 in combination, the pixel data of each of the plurality of images may be stored in a plurality of fields, respectively, which may be referred to as a first field. The field storing the pixel data may be arranged to intersect with the field storing the depth value.
With reference to fig. 25, the pixel data and depth values of different images may be interleaved; for example, image 1 pixel 1, image 1 depth value 1, image 2 pixel 1, image 2 depth value 1, and so on, may be stored in sequence until the pixel data and depth values corresponding to the first pixel of each image are stored, with the adjacent fields then storing image 1 pixel 2, image 1 depth value 2, image 2 pixel 2, image 2 depth value 2, and so on, until the pixel data and depth data of every image are stored.
In summary, the fields storing the pixel data of each image may serve as the first field, and the fields storing the depth data of that image as the second field; for each image, a first field and an associated second field are stored.
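A byte-level sketch of the fig. 22-style layout, where each image's pixel data (first field) is immediately followed by its associated depth data (second field); the packing itself is illustrative, not the patent's normative encoding:

```python
import numpy as np

def pack_combination(images, depth_maps):
    buf = bytearray()
    for pix, dep in zip(images, depth_maps):
        buf += pix.tobytes()   # first field: pixel data of image i
        buf += dep.tobytes()   # second field: depth data of image i
    return bytes(buf)

images = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(2)]   # toy 4x4 RGB images
depths = [np.zeros((4, 4), dtype=np.uint8) for _ in range(2)]      # toy depth maps
print(len(pack_combination(images, depths)))   # 2 * (48 + 16) = 128 bytes
```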
It will be appreciated by those skilled in the art that the various embodiments described above are merely examples and are not specific limitations on the types, sizes, and arrangements of fields.
Referring to fig. 3 in combination, the multi-angle freeview data including the first field and the second field may be stored in a server 32 at the cloud, and transmitted to the CDN or a device 33 for display, so as to perform image reconstruction.
In a specific implementation, the first field and the second field may both be pixel fields in a stitched image, the stitched image storing the pixel data of the multiple images and the depth data. Storing the data in an image format can reduce the data volume, shorten the data transmission time, and reduce resource occupation.
The stitched image may be an image in a variety of formats, such as BMP format, JPEG format, PNG format, and the like. These image formats may be compressed formats or may be uncompressed formats. It will be appreciated by those skilled in the art that images of various formats may include fields, referred to as pixel fields, corresponding to individual pixels. The size of the stitched image, that is, parameters such as the number of pixels contained in the stitched image, the aspect ratio, and the like, may be determined as needed, and specifically may be determined according to the number of the synchronized multiple images, the data amount to be stored in each image, the data amount of the depth data to be stored in each image, and other factors.
In a specific implementation, the number of bits used to store the depth data and the pixel data corresponding to the pixels of each of the synchronized images may depend on the format of the stitched image.
For example, when the format of the stitched image is the BMP format, the depth value may range from 0 to 255 and is 8 bits of data, and the data may be stored as the grayscale value in the stitched image; alternatively, the depth value may be 16-bit data, and may be stored as a gray value at two pixel positions in the stitched image, or stored in two channels at one pixel position in the stitched image.
When the format of the stitched image is the PNG format, the depth value may likewise be 8-bit or 16-bit data; in the PNG format, a 16-bit depth value may be stored as the gray value of a single pixel position in the stitched image.
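As an illustration of the two-channel option above, the following sketch (assuming NumPy) splits a 16-bit depth value into two 8-bit values and recovers it; the array names are hypothetical.

```python
import numpy as np

depth16 = np.array([[1000, 65535], [0, 256]], dtype=np.uint16)  # hypothetical depth map
high = (depth16 >> 8).astype(np.uint8)    # most significant byte
low = (depth16 & 0xFF).astype(np.uint8)   # least significant byte

# Two-channel storage at one pixel position of the stitched image:
two_channel = np.stack([high, low], axis=-1)            # shape (H, W, 2)

# Recovering the 16-bit depth when reading the stitched image back:
restored = (two_channel[..., 0].astype(np.uint16) << 8) | two_channel[..., 1]
assert np.array_equal(restored, depth16)
```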
It is to be understood that the above embodiments are not limited to the storage manner or the number of data bits, and other data storage manners that can be implemented by those skilled in the art all fall within the scope of the present invention.
In a specific implementation, the stitched image may be divided into an image region and a depth map region, a pixel field of the image region stores pixel data of the plurality of images, and a pixel field of the depth map region stores depth data of the plurality of images; the image area stores a pixel field of pixel data of each image as the first field, and the depth map area stores a pixel field of depth data of each image as the second field.
In a specific implementation manner, the image region may be a continuous region, and the depth map region may also be a continuous region.
Further, in a specific implementation, the stitched image may be divided equally, with the two halves serving as the image area and the depth map area respectively. Alternatively, the stitched image may be divided unequally according to the pixel data amount and the depth data amount of the images to be stored.
For example, referring to fig. 26, each minimum square indicates one pixel, the image area may be an area 1 within a dashed line frame, that is, an upper half area of the stitched image after being divided into upper and lower halves, and a lower half area of the stitched image may be used as a depth map area.
It is to be understood that fig. 26 is merely illustrative, and the number of minimum squares shown is not a limitation on the number of pixels in the stitched image. In addition, the equal division may instead split the stitched image into left and right halves.
In a specific implementation, the image region may include a plurality of image sub-regions, each image sub-region for storing one of the plurality of images, and a pixel field of each image sub-region may be used as the first field; accordingly, the depth map region may include a plurality of depth map sub-regions, each for storing depth data of one of the plurality of images, and a pixel field of each depth map sub-region may serve as the second field.
Wherein the number of image sub-regions and the number of depth map sub-regions may be equal, both being equal to the number of synchronized multiple images. In other words, it may be equal to the number of cameras described above.
The stitched image is further described with reference to fig. 27, taking a stitched image equally divided into upper and lower halves as an example. The upper half of the stitched image in fig. 27 is the image area, divided into 8 image sub-areas storing the pixel data of the 8 synchronized images, each captured at a different shooting angle, that is, a different viewing angle. The lower half is the depth map area, divided into 8 depth map sub-areas that respectively store the depth maps of the 8 images.
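A minimal sketch, assuming NumPy arrays and equal resolutions for all views, of assembling the fig. 27 layout; the function name and canvas details are illustrative only.

```python
import numpy as np

def stitch(views, depth_maps):
    """Place 8 view images in the upper half and 8 single-channel depth
    maps (replicated to 3 channels here for simplicity) in the lower half,
    following the fig. 27 layout; all inputs share one resolution."""
    assert len(views) == len(depth_maps) == 8
    h, w, _ = views[0].shape
    canvas = np.zeros((2 * h, 8 * w, 3), dtype=np.uint8)
    for i, (img, dep) in enumerate(zip(views, depth_maps)):
        canvas[:h, i * w:(i + 1) * w] = img              # image sub-area (first field)
        canvas[h:, i * w:(i + 1) * w] = dep[..., None]   # depth map sub-area (second field)
    return canvas

views = [np.full((540, 960, 3), v, np.uint8) for v in range(8)]
depths = [np.full((540, 960), v, np.uint8) for v in range(8)]
stitched = stitch(views, depths)   # shape (1080, 7680, 3)
```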
In combination with the foregoing, the pixel data of the 8 synchronized images, that is, the view 1 image to the view 8 image, may be the original images acquired from the cameras, or images obtained by reducing the resolution of those originals. The depth data, stored in a partial region of the stitched image, may also be referred to as a depth map.
As described above, in an implementation, the stitched image may also be divided unequally. For example, referring to fig. 28, the depth data may occupy fewer pixels than the pixel data of the images, so that the image region and the depth map region are of different sizes; this division can be adopted, for instance, when the depth data is obtained by quarter down-sampling the original depth map. Conversely, the number of pixels occupied by the depth maps may also be greater than the number occupied by the pixel data of the images.
It is to be understood that fig. 28 is merely one example of dividing the stitched image unequally; in a specific implementation, the pixel count and aspect ratio of the stitched image may vary, and so may the manner of division.
In a specific implementation, the image region or the depth map region may also include a plurality of regions. For example, as shown in fig. 29, the image region may be one continuous region, and the depth map region may include two continuous regions.
Alternatively, referring to fig. 30 and 31, the image region may include two continuous regions, and the depth map region may also include two continuous regions. The image region and the depth region may be arranged at intervals.
Still alternatively, referring to fig. 32, the image sub-regions of the image region may be arranged alternately with the depth map sub-regions of the depth map region. In that case the number of continuous regions comprised by the image region may equal the number of image sub-regions, and the number of continuous regions comprised by the depth map region may equal the number of depth map sub-regions.
In a specific implementation, for the pixel data of each image, the pixel data may be stored to the image sub-region according to the order of pixel point arrangement. For the depth data of each image, the depth data can also be stored in the depth map sub-area according to the arrangement sequence of the pixel points.
With combined reference to fig. 33-35, image 1 is illustrated with 9 pixels in fig. 33, image 2 is illustrated with 9 pixels in fig. 34, and image 1 and image 2 are two images at different angles in synchronization. According to the image 1 and the image 2, the depth data corresponding to the image 1, including the depth value 1 of the image 1 to the depth value 9 of the image 1, can be obtained, and the depth data corresponding to the image 2, including the depth value 1 of the image 2 to the depth value 9 of the image 2, can also be obtained.
Referring to fig. 35, when storing the image 1 in the image sub-region, the image 1 may be stored in the upper left image sub-region according to the order of the arrangement of the pixels, that is, in the image sub-region, the arrangement of the pixels may be the same as that of the image 1. The image 2 is stored to the image sub-area, also to the upper right image sub-area in this way.
Similarly, storing the depth data of the image 1 to the depth map sub-region may be in a similar manner, and in the case where the depth values correspond to the pixel values of the image one to one, may be in a manner as illustrated in fig. 35. If the depth value is obtained by downsampling the original depth map, the depth value can be stored in the sub-region of the depth map according to the sequence of pixel point arrangement of the depth map obtained by downsampling.
As will be understood by those skilled in the art, the compression rate achievable for an image is related to the correlation between its pixels: the stronger the correlation, the higher the compression rate. Because captured images correspond to the real world, their neighboring pixels are strongly correlated, so storing the pixel data and depth data of the images in pixel-arrangement order yields a higher compression rate when the stitched image is compressed, that is, a smaller compressed data volume for the same uncompressed data volume.
By dividing the stitched image into an image area and a depth map area, adjacent image sub-areas all hold images, or frame images of videos, shot of the region to be viewed from different angles, and adjacent depth map sub-areas all hold depth maps; this similarity between neighboring sub-areas likewise yields a higher compression rate when the stitched image is compressed.
In a specific implementation, all or part of the image sub-regions and depth map sub-regions may be edge protected. Edge protection can take various forms. Taking the view 1 depth map in fig. 31 as an example, redundant pixels may be added around the periphery of the original view 1 depth map; or, keeping the total number of pixels unchanged, redundant pixels storing no actual data may be reserved at the periphery while the original view 1 depth map is shrunk and stored in the remaining pixels; or other approaches may be used, in each case leaving redundant pixels between the view 1 depth map and the images around it.
Because the stitched image contains multiple images and depth maps, correlation across their adjacent boundaries is weak; applying edge protection reduces the quality loss of the images and depth maps when the stitched image is compressed.
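A minimal sketch of one possible form of edge protection, assuming NumPy: each sub-region is surrounded by a ring of redundant border pixels by edge replication before being placed into the stitched image.

```python
import numpy as np

def protect(sub_region, margin=2):
    # Surround the sub-region with replicated border pixels so that block
    # compression artifacts at sub-region boundaries fall on redundant
    # pixels rather than on the stored image or depth map itself.
    return np.pad(sub_region, margin, mode="edge")

depth_map = np.arange(16, dtype=np.uint8).reshape(4, 4)
padded = protect(depth_map)   # shape (8, 8); the interior 4x4 is unchanged
```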
In implementations, the pixel field of the image sub-region may store three channels of data and the pixel field of the depth map sub-region may store single channel data. The pixel field of the image sub-region is used to store pixel data of any one of the plurality of synchronized images, the pixel data typically being three channel data, such as RGB data or YUV data.
The depth map sub-area is used to store the depth data of an image. If the depth value is 8-bit binary data, a single channel of the pixel field can be used for storage; if the depth value is 16-bit binary data, two channels of the pixel field can be used. Alternatively, the depth values may be stored in a larger pixel area: for example, if the synchronized images are all 1920 × 1080 and the depth value is 16-bit binary data, the depth values may be stored in an area twice the size of a 1920 × 1080 image, each such area holding a single channel. The stitched image may also be divided in accordance with the chosen storage manner.
Stored with each channel of each pixel occupying 8 bits, the uncompressed data volume of the stitched image can be calculated as: number of synchronized images × (data amount of the pixel data of one image + data amount of one depth map).
If the original image has a resolution of 1080P, i.e. 1920 × 1080 pixels, in progressive format, the original depth map may also occupy 1920 × 1080 pixels as a single channel. The pixel data amount of one original image is 1920 × 1080 × 8 × 3 bits and the data amount of one original depth map is 1920 × 1080 × 8 bits; with 30 cameras, the data volume of the stitched image is 30 × (1920 × 1080 × 8 × 3 + 1920 × 1080 × 8) bits, about 237 MB. If the stitched image is not compressed, it occupies considerable system resources and incurs long delay; with a small bandwidth of about 1 MB/s, for example, an uncompressed stitched image needs roughly 237 s to transmit, so real-time performance is poor and the user experience suffers.
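The estimate above can be reproduced with a few lines of arithmetic (a sketch; the 1 MB/s figure is the assumed bandwidth from the example):

```python
# Pure arithmetic reproducing the estimate: 30 cameras, 1920x1080
# three-channel pixel data plus a single-channel depth map, 8 bits/channel.
cameras, w, h = 30, 1920, 1080
bits = cameras * (w * h * 8 * 3 + w * h * 8)
megabytes = bits / 8 / 2**20
print(f"uncompressed stitched image: {megabytes:.0f} MB")   # ~237 MB
print(f"transfer at 1 MB/s: {megabytes:.0f} s")             # minutes, not real time
```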
The data volume of the stitched image can be reduced by one or more of the following: storing the data in a regular pattern to obtain a higher compression rate, reducing the resolution of the original images and using the reduced-resolution pixel data as the image pixel data, or down-sampling one or more of the original depth maps.
For example, if the original image has a resolution of 4K, i.e., a pixel resolution of 4096 × 2160, and is down-sampled to a resolution of 540P, i.e., a pixel resolution of 960 × 540, the number of pixels of the stitched image is about one sixteenth of the number before down-sampling. The amount of data may be made smaller in combination with any one or more of the other ways of reducing the amount of data described above.
It can be understood that if the bandwidth is supported and the decoding capability of the device performing data processing can support a stitched image with a higher resolution, a stitched image with a higher resolution can also be generated to improve the image quality.
It will be understood by those skilled in the art that, in different application scenarios, the synchronized pixel data and depth data of multiple images may also be stored in other manners, for example, in units of pixel points stored in a stitched image. Referring to fig. 33, 34, and 36, for the image 1 and the image 2 shown in fig. 33 and 34, it may be stored to the stitched image in the manner of fig. 36.
In summary, the pixel data and the depth data of the image may be stored in the stitched image, and the stitched image may be divided into the image region and the depth map region in various ways, or the pixel data and the depth data of the image may be stored in a preset order without being divided.
In a specific implementation, the synchronized plurality of images may also be a synchronized plurality of frame images obtained by decoding a plurality of videos. The video may be acquired by a plurality of video cameras, and the settings may be the same as or similar to those of the cameras acquiring images in the foregoing.
In a specific implementation, the generating of the multi-angle freeview image data may further include generating an association relation field, and the association relation field may indicate an association relation of the first field with the at least one second field. The first field stores pixel data of one image of a plurality of synchronous images, and the second field stores depth data corresponding to the image, wherein the pixel data and the depth data correspond to the same shooting angle, namely the same visual angle. The association relationship between the two can be described by the association relationship field.
Taking fig. 27 as an example, in fig. 27, the area storing the view 1 image to the view 8 image is 8 first fields, the area storing the view 1 depth map to the view 8 depth map is 8 second fields, and for the first field storing the view 1 image, there is an association relationship with the second field storing the view 1 depth map, and similarly, there is an association relationship between the field storing the view 2 image and the field storing the view 2 depth map.
The association relation field may indicate an association relation between the first field and the second field of each of the synchronized multiple images in various ways, and specifically may be a content storage rule of the pixel data and the depth data of the synchronized multiple images, that is, by indicating the storage way described in the foregoing, the association relation between the first field and the second field is indicated.
In a specific implementation, the association relationship field may only contain different mode numbers, and the device performing data processing may obtain the storage manner of the pixel data and the depth data in the acquired multi-angle free-viewing angle image data according to the mode number of the field and the data stored in the device performing data processing. For example, if the received pattern number is 1, the storage method is analyzed as follows: the spliced image is equally divided into an upper area and a lower area, the upper half area is an image area, the lower half area is a depth map area, and the image at a certain position of the upper half area is associated with the depth map stored at the corresponding position of the lower half area.
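A sketch of how a device might resolve such a mode number; the mode numbers and rule descriptions here are hypothetical stand-ins for whatever supporting data the device actually stores.

```python
# Hypothetical mapping from mode numbers to content storage rules; a real
# device would hold equivalent supporting data rather than these strings.
STORAGE_MODES = {
    1: {"split": "equal upper/lower halves",
        "image_area": "upper half",
        "depth_area": "lower half",
        "association": "same relative position in each half"},
    2: {"split": "sequential fields",
        "association": "depth fields follow all pixel fields in the same order"},
}

def resolve_mode(mode_number):
    try:
        return STORAGE_MODES[mode_number]
    except KeyError:
        raise ValueError(f"unknown storage mode {mode_number}")

rule = resolve_mode(1)   # parsed from the association relation field
```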
It can be understood that, in the foregoing embodiments, the storage modes for storing the stitched images, for example, the storage modes illustrated in fig. 27 to fig. 36, may all be described by corresponding association relation fields, so that the device for processing data may obtain the associated images and depth data according to the association relation fields.
As described above, the picture format of the stitched image may be any one of BMP, PNG, JPEG, WebP, or other image formats. The storage of the pixel data and depth data in the multi-angle freeview image data is not limited to the stitched-image manner; they can be stored in various ways, each of which can be described by a corresponding association relation field.
Similarly, the storage mode may be indicated by a mode number. For example, as shown in fig. 23, the association field may store a pattern number 2, and after the device performing data processing reads the pattern number, it may be analyzed that the pixel data of the synchronized multiple images are sequentially stored, and the lengths of the first field and the second field are analyzed, and after the storage of the multiple first fields is finished, the depth data of each image is stored in the same storage order as the image. And the data processing device can determine the association relationship between the pixel data and the depth data of the image according to the association relationship field.
It is understood that the storage manner of the pixel data and the depth data of the synchronized images may be various, and the expression manner of the association relation field may also be various. The content may be indicated by the mode number or may be directly indicated. The device performing data processing may determine the association relationship between the pixel data of the image and the depth data according to the content of the association relationship field, in combination with the stored data or other a priori knowledge, for example, the content corresponding to each pattern number, or the specific number of the synchronized multiple images.
In a specific implementation, the generating of the multi-angle freeview image data may further include: based on the synchronized plurality of images, parameter data of each image is calculated and stored, the parameter data including photographing position and photographing angle data of the image.
The equipment for processing data can determine a virtual viewpoint in the same coordinate system with the equipment according to the needs of a user by combining the shooting position and the shooting angle of each image in a plurality of synchronous images, reconstruct the images based on multi-angle free view image data, and show the expected viewing position and view angle for the user.
In a specific implementation, the parameter data may also include internal parameter data including attribute data of a photographing device of the image. The aforementioned shooting position and shooting angle data of the image may also be referred to as external parameter data, and the internal parameter data and the external parameter data may be referred to as attitude data. By combining the internal parameter data and the external parameter data, the factors indicated by the internal parameter data such as lens distortion and the like can be considered during image reconstruction, and the image of the virtual viewpoint can be reconstructed more accurately.
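For illustration, a possible in-memory representation of the parameter data, assuming Python dataclasses; the field names are assumptions, not the patent's normative schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CameraParameters:
    """Illustrative container for the parameter data of one image."""
    position: List[float]                 # external: shooting position (x, y, z)
    rotation: List[float]                 # external: shooting angle, e.g. Euler angles
    focal_length: List[float] = field(default_factory=lambda: [1.0, 1.0])     # internal
    principal_point: List[float] = field(default_factory=lambda: [0.0, 0.0])  # internal
    distortion: List[float] = field(default_factory=list)  # internal: lens distortion

cam = CameraParameters(position=[0.0, 1.5, 3.0], rotation=[0.0, 180.0, 0.0])
```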
In a specific implementation, the generating of the multi-angle freeview image data may further include: generating a parameter data storage address field, wherein the parameter data storage address field is used for indicating the storage address of the parameter data. The apparatus performing data processing may acquire the parameter data from the storage address of the parameter data.
In a specific implementation, the generating of the multi-angle freeview image data may further include: a data combination storage address field is generated for indicating a storage address of the data combination, i.e. indicating a storage address of the first field and the second field of each of the plurality of images being synchronized. The device for processing data may acquire the synchronized pixel data and depth data of the plurality of images from the storage space corresponding to the storage address of the data combination, and from this point of view, the data combination includes the synchronized pixel data and depth data of the plurality of images.
It is understood that the multi-angle freeview image data may include specific data such as the pixel data, depth data, and parameter data of the images, as well as indicative data such as the aforementioned association relation field, parameter data storage address field, and data combination storage address field. This indicative data may be stored in a header file to direct the device performing data processing to obtain the data combination, the parameter data, and the like.
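A hypothetical header-file sketch gathering the indicative fields named above, assuming a JSON encoding; the key names and values are illustrative only.

```python
import json

# Hypothetical header content; a device reads these indicative fields first
# and then fetches the data combination and parameter data they point to.
header = {
    "association_relation": {"mode": 1},
    "parameter_data_address": "params/cameras.bin",
    "data_combination_address": "frames/stitched_000001.png",
    "camera_count": 8,
}
header_bytes = json.dumps(header).encode("utf-8")
parsed = json.loads(header_bytes)
```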
In particular, the noun explanations, specific implementations, and advantageous effects referred to in the embodiments of generating multi-angle freeview data may refer to other embodiments, and various particular implementations in a multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview data may be multi-angle freeview video data, and is described further below with particular reference to a method of generating multi-angle freeview video data.
Referring to fig. 37 in combination, the multi-angle freeview video data generating method may include the steps of:
step S371, acquiring a plurality of videos with frame synchronization, the plurality of videos having different shooting angles;
step S372, analyzing each video to obtain image combinations of a plurality of frame moments, wherein the image combinations comprise a plurality of frame images with synchronous frames;
step S373, determining depth data of each frame image in the image combination based on the image combination of each frame time in the plurality of frame times;
step S374, generating a spliced image corresponding to each frame time, wherein the spliced image comprises a first field for storing the pixel data of each frame image in the image combination and a second field for storing the depth data of each frame image in the image combination;
in step S375, video data is generated based on the plurality of stitched images.
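A condensed sketch of steps S371 to S375, assuming OpenCV for decoding and encoding; `estimate_depth` is a placeholder for whatever depth determination is used, and the layout helper follows the fig. 27 style described earlier.

```python
import cv2
import numpy as np

def estimate_depth(images, index):
    # Placeholder for the depth determination of step S373; no particular
    # multi-view depth algorithm is prescribed here.
    h, w = images[index].shape[:2]
    return np.zeros((h, w), dtype=np.uint8)

def stitch_frame(images, depths):
    # Fig. 27-style layout: frame images in the upper half (first fields),
    # single-channel depth maps replicated to 3 channels below (second fields).
    top = np.hstack(images)
    bottom = np.hstack([np.repeat(d[..., None], 3, axis=2) for d in depths])
    return np.vstack([top, bottom])

def generate_video(paths, out_path, fps=25.0):
    caps = [cv2.VideoCapture(p) for p in paths]          # S371: synchronized videos
    writer = None
    while True:
        reads = [c.read() for c in caps]                 # S372: one image combination
        if not all(ok for ok, _ in reads):
            break
        images = [frame for _, frame in reads]
        depths = [estimate_depth(images, i) for i in range(len(images))]  # S373
        stitched = stitch_frame(images, depths)          # S374: stitched image
        if writer is None:
            h, w = stitched.shape[:2]
            writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                                     fps, (w, h))
        writer.write(stitched)                           # S375: encode and encapsulate
    for c in caps:
        c.release()
    if writer is not None:
        writer.release()
```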
In this embodiment, the capturing device may be a video camera, and a plurality of videos with frame synchronization may be acquired by a plurality of video cameras. Each video includes frame images at a plurality of frame times, and a plurality of image combinations may respectively correspond to different frame times, each image combination including a plurality of frame images that are frame-synchronized.
In a particular implementation, depth data for each frame image in the image combination is determined based on the image combination for each of the plurality of frame time instances.
Following the foregoing embodiments, if the frame images in the original videos have a resolution of 1080P, i.e. 1920 × 1080 pixels, in progressive format, the original depth map may also occupy 1920 × 1080 pixels as a single channel. The pixel data amount of one original frame image is 1920 × 1080 × 8 × 3 bits and the data amount of one original depth map is 1920 × 1080 × 8 bits; with 30 cameras, the data volume of the stitched image is 30 × (1920 × 1080 × 8 × 3 + 1920 × 1080 × 8) bits, about 237 MB. If the stitched image is not compressed, it occupies considerable system resources with large delay; with a small bandwidth of about 1 MB/s, an uncompressed stitched image needs roughly 237 s to transmit, and transmitting such stitched images at the original frame rate makes real-time video playback impractical.
The data volume of the stitched image can be reduced by one or more of the following: storing the data in a regular pattern to obtain a higher compression rate when compressing in a video format, reducing the resolution of the original images and using the reduced-resolution pixel data as the image pixel data, down-sampling one or more of the original depth maps, or raising the video compression ratio.
For example, if the resolution of the frame image in the original video, that is, the acquired videos, is 4K, that is, 4096 × 2160 pixel resolution, and the down-sampling is 540P, that is, 960 × 540 pixel resolution, the number of pixels in the stitched image is about one sixteenth of the number before the down-sampling. The amount of data may be made smaller in combination with any one or more of the other ways of reducing the amount of data described above.
It can be understood that if the bandwidth is supported and the decoding capability of the device performing data processing can support a stitched image with a higher resolution, a stitched image with a higher resolution can also be generated to improve the image quality.
In a specific implementation, the video data is generated based on a plurality of the stitched images, which may be based on all or part of the stitched images, and specifically may be determined according to a frame rate of a video to be generated and a frame rate of an acquired video, or may also be determined according to a bandwidth for performing communication with a data processing device.
In a specific implementation, the video data is generated based on a plurality of the stitched images, and the video data may be generated by encoding and encapsulating the plurality of stitched images in the order of frame time.
Specifically, the encapsulation format may be any one of AVI, QuickTime File Format, MPEG, WMV, RealVideo, Flash Video, Matroska, or another encapsulation format, and the encoding format may be H.261, H.263, H.264, H.265, MPEG, AVS, or another encoding format.
In a specific implementation, the generating of the multi-angle freeview video data may further include generating an association relation field, which may indicate the association relation of the first field with the at least one second field. The first field stores pixel data of one of the synchronized frame images, and the second field stores the corresponding depth data; the two correspond to the same shooting angle, i.e., the same angle of view.
In a specific implementation, the generating of the multi-angle freeview video data may further include: based on the synchronized plurality of frame images, parameter data of each frame image is calculated and stored, the parameter data including shooting position and shooting angle data of the frame image.
In a specific implementation, the frame-synchronized plurality of frame images in the image combination at different time instants in the synchronized plurality of videos may correspond to the same parameter data, and the parameter data may be calculated in any set of image combinations.
In a specific implementation, the generating of the multi-angle freeview video data may further include: generating a parameter data storage address field, which indicates the storage address of the parameter data. The device performing data processing may acquire the parameter data from that storage address.
In a specific implementation, the generating of the multi-angle freeview video data may further include: generating a video data storage address field for indicating the storage address of the generated video data.
It is understood that the multi-angle freeview video data may include the generated video data as well as indicative data such as the aforementioned association relation field, parameter data storage address field, and video data storage address field. This indicative data may be stored in a header file to direct the device performing data processing to acquire the video data, the parameter data, and the like.
The noun explanations, concrete implementations, and advantageous effects involved in the various embodiments of generating multi-angle freeview video data may refer to other embodiments, and various concrete implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The following is further described with particular reference to multi-angle freeview data processing.
FIG. 38 is a flowchart of a multi-angle freeview data processing method according to an embodiment of the present invention, which may specifically include the following steps:
step S381, acquiring a data header file;
step S382, determining the definition format of the data file according to the analysis result of the data header file;
step S383, based on the defined format, reading a data combination from a data file, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, the synchronous images have different visual angles to regions to be watched, and the pixel data and the depth data of each image in the synchronous images have an association relation;
and step S384, reconstructing an image or video of a virtual viewpoint according to the read data combination, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting virtual viewpoint switching and watching of the region to be viewed.
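A minimal sketch of this processing flow, with hypothetical helper functions and header keys standing in for the actual defined format:

```python
import json

def load_combination(address, fmt, association):
    """Hypothetical helper: decode the picture or video at `address`
    according to `fmt` and split it, per the association relation,
    into per-view pixel data and depth data."""
    raise NotImplementedError

def reconstruct_view(combination, viewpoint):
    """Hypothetical helper: image reconstruction as in fig. 39."""
    raise NotImplementedError

def process(header_path, virtual_viewpoint):
    with open(header_path, encoding="utf-8") as f:
        header = json.load(f)                                # S381: acquire header file
    fmt = header["storage_format"]                           # S382: defined format
    combo = load_combination(header["data_combination_address"],
                             fmt, header["association_relation"])  # S383: data combination
    return reconstruct_view(combo, virtual_viewpoint)        # S384: reconstruction
```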
The multi-angle free view data in the embodiment of the invention is data that can support reconstructing images or videos of virtual viewpoints within a multi-angle free view range, and may include a header file as well as data files. The header file may indicate the defined format of the data files, so that the device processing the multi-angle free view data can parse the desired data from the data files according to the header file, as described further below.
Referring to fig. 3 in combination, the device performing data processing may be a device located in the CDN, or a device 33 performing display, and may also be a device performing data processing. The data file and the header file may be stored in the server 32 at the cloud end, or in some application scenarios, the header file may also be stored in a device for data processing, and the header file is obtained locally.
In a specific implementation, the stitched image in each of the foregoing embodiments may be used as a data file in the embodiments of the present invention. In an application scenario with limited bandwidth, the stitched image may be divided into multiple portions for transmission multiple times. Correspondingly, the header file may include a segmentation mode, and the device for processing data may combine the segmented multiple portions according to the indication in the header file to obtain a stitched image.
In a specific implementation, the defined format may include a storage format, and the header file may include a field indicating the storage format of the data combination, where the field may indicate the storage format using a number, or may directly write to the storage format. Accordingly, the parsing result may be the number of the storage format, or the storage format.
Accordingly, the device performing data processing may determine the storage format according to the parsing result. For example, a specific storage format may be determined based on the number, and the stored supporting data; or the storage format may be directly obtained from a field indicating the storage format of the data combination. In other embodiments, if the storage format can be fixed in advance, the fixed storage format may also be recorded in the device for data processing.
In a specific implementation, the storage format may be a picture format or a video format. As mentioned above, the picture format may be any one of BMP, PNG, JPEG, WebP, or other image formats; the video format may include an encapsulation format and an encoding format, where the encapsulation format may be any one of AVI, QuickTime File Format, MPEG, WMV, RealVideo, Flash Video, Matroska, or another encapsulation format, and the encoding format may be H.261, H.263, H.264, H.265, MPEG, AVS, or another encoding format.
The storage format may also be a picture format or other formats besides a video format, which is not limited herein. Various storage formats capable of indicating through a header file or enabling a device performing data processing to acquire required data through stored supporting data so as to perform subsequent image or video reconstruction of a virtual viewpoint are within the protection scope of the present invention.
In a specific implementation, when the storage format of the data combination is a video format, the number of the data combinations may be multiple, and each data combination may be a data combination corresponding to a different frame time after decapsulating and decoding a video.
In a specific implementation, the defined format may include a content storage rule of the data combination, and the header file may include a field indicating the content storage rule of the data combination. Through the content storage rule, the device performing data processing can determine the association between the pixel data and the depth data in each image. The field indicating the content storage rule of the data combination may also be referred to as an association field, and the field may indicate the content storage rule of the data combination by a number or may be directly written to the rule.
Accordingly, the device performing data processing may determine the content storage rule of the data combination according to the analysis result. For example, a specific content storage rule may be determined based on the number, and the stored supporting data; or the content storage rule of the data combination may be directly obtained from a field indicating the content storage rule of the data combination.
In other embodiments, if the content storage rule may be fixed in advance, the content storage rule of the fixed data combination may be recorded in the device for data processing. The following further describes a specific implementation manner of the content storage rule of the data combination and the device for processing data to obtain the data combination in combination with the indication of the header file.
In a specific implementation, the storage rule of the synchronized pixel data of the multiple images and the depth data may specifically be a storage rule of the synchronized pixel data of the multiple images and the depth data in the stitched image.
As mentioned above, the storage format of the data combination may be a picture format or a video format, and accordingly, the data combination may be a picture format or a frame image in a video. The image or frame image stores pixel data and depth data of each of a plurality of images synchronized with each other, and from this viewpoint, an image or frame image obtained by decoding according to a picture format or a video format may also be referred to as a stitched image. The storage rule of the pixel data and the depth data of the synchronized plurality of images may be a storage location in the stitched image, which may be varied. For the multiple storage modes of the synchronized pixel data of the multiple images and the depth data in the stitched image, reference may be made to the foregoing description, which is not described herein again.
In a specific implementation, the content storage rule of the data combination may be used to indicate to a device performing data processing a plurality of storage manners of the pixel data and the depth data of the synchronized plurality of images in the stitched image, or may indicate, for each image, the storage manners of the first field and the second field in other storage manners, that is, the storage rule of the pixel data and the depth data of the synchronized plurality of images.
As described above, the header file may include a field indicating a content storage rule of a data combination, and the field may indicate the content storage rule of the data combination by a number, or the rule may be directly written in the header file, or the content storage rule of the fixed data combination may be recorded in the device for processing data.
The content storage rule may correspond to any one of the storage manners, and the device for processing data may analyze the storage manner according to the content storage rule, further analyze the data combination, and determine an association relationship between the pixel data and the depth data of each of the plurality of images.
In a specific implementation, the content storage rule may be indicated by a distribution of the image region and the depth map region by pixel data of each of the synchronized plurality of images and a storage location of the depth data in the stitched image.
The indication may be a pattern number, for example, if the pattern number is 1, the content storage rule may be parsed as: the spliced image is equally divided into an upper area and a lower area, the upper half area is an image area, the lower half area is a depth map area, and the image at a certain position of the upper half area is associated with the depth map stored at the corresponding position of the lower half area. The device performing the data processing may further determine a specific storage manner based on the rule. For example, the storage mode shown in fig. 27 or fig. 28, or other storage modes, may be further determined by combining the number of the multiple images to be synchronized, the storage order of the pixel data and the depth data, and the proportional relationship between the depth data and the pixel points occupied by the pixel data.
In a specific implementation, the content storage rule may also be indicated by the distribution of the image sub-regions and the depth map sub-regions through the pixel data of each of the synchronized multiple images and the storage location of the depth data in the stitched image. The pixel data of each image in the plurality of synchronized images is stored in the image sub-area, and the depth data of each image in the plurality of synchronized images is stored in the depth map sub-area.
For example, the content storage rule may be that the image sub-region and the depth map sub-region are arranged in a column-by-column crossing manner, and similar to the previous example, the device performing data processing may further determine a specific storage manner based on the rule. For example, the storage mode shown in fig. 31 or other storage modes may be further determined by combining the number of the synchronized multiple images, the storage order of the pixel data and the depth data, and the proportional relationship between the depth data and the pixel points occupied by the depth data.
As mentioned above, the first field for storing pixel data and the second field for storing depth data may be pixel fields in the stitched image, or may be fields stored in other forms. It will be appreciated by those skilled in the art that the content storage rules may be instructions tailored to the particular storage mode, so that the device performing the data processing may be informed of the corresponding storage mode.
In a specific implementation, the content storage rule may further include more information for supporting a device performing data processing to parse out a storage manner of the data combination. For example, all or part of the aforementioned image sub-region and the depth map sub-region may be subjected to edge protection, and the manner of edge protection. The content storage rule may include a resolution relationship between the pixel data of the image and the depth data.
The device for processing data may determine a specific storage manner based on the stored information or information obtained from other fields of the header file. For example, the number of the aforementioned synchronized multiple images may also be obtained through a header file, and specifically may be obtained through a definition format of a data file analyzed in the header file.
After determining the specific storage manner, the data processing device may analyze the pixel data of the synchronized multiple images and the depth data corresponding to the synchronized multiple images.
In a specific implementation, the resolution of the pixel data and the depth data may be the same, and the pixel data and the corresponding depth value of each pixel point of each image may be further determined.
As mentioned above, the depth data may also be downsampled data, and a defined format in the header file may have a corresponding field for indication, and the device performing data processing may perform corresponding upsampling to determine pixel data and a corresponding depth value of each pixel point of each image.
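For example, if the header indicated that the depth map was stored at half resolution in each direction (quarter down-sampling), a device could restore one depth value per pixel by nearest-neighbor up-sampling, sketched here with NumPy:

```python
import numpy as np

depth_small = np.arange(4, dtype=np.uint16).reshape(2, 2)   # stored at half resolution
depth_full = np.repeat(np.repeat(depth_small, 2, axis=0), 2, axis=1)
# depth_full is 4x4: one depth value per pixel of the original image.
```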
Correspondingly, rendering and displaying are performed according to the read data combination, and rendering and displaying can be performed after reconstruction of the image is performed according to the pixel data and the corresponding depth value of each pixel point of each image and the position of the virtual viewpoint to be displayed. For a video, the reconstructed image may be a frame image, the frame image is displayed according to the sequence of the frame time, and the video may be played for a user to complete video reconstruction. That is, video reconstruction may include reconstruction of frame images in a video, and the specific implementation of frame image reconstruction is the same as or similar to image reconstruction.
In a specific implementation, referring to fig. 39, performing image reconstruction of the virtual viewpoint may include the following steps:
step S391, determining parameter data of each image in the plurality of synchronized images, wherein the parameter data comprises shooting position and shooting angle data of the images;
step S392, determining the parameter data of the virtual viewpoint, wherein the parameter data of the virtual viewpoint comprises a virtual viewing position and a virtual viewing angle;
step S393, determining a plurality of target images among the synchronized plurality of images;
step S394, for each target image, mapping the depth data to the virtual viewpoint according to a relationship between the parameter data of the virtual viewpoint and the parameter data of the image;
step S395, a reconstructed image is generated based on the depth data mapped to the virtual viewpoint and the pixel data of the target image.
Generating the reconstructed image may further comprise determining the pixel value of each pixel point of the reconstructed image. Specifically, for each pixel point, if the pixel data mapped to the virtual viewpoint is 0, hole filling may be performed using surrounding pixel data from one or more target images; if multiple non-zero values are mapped to the virtual viewpoint for a pixel point, a weight can be determined for each value and the final value of the pixel point computed from them.
In an embodiment of the present invention, when generating a reconstructed image, forward mapping may be performed first: using the depth information, the texture maps of the corresponding groups in the image combination of the video frame are projected into three-dimensional Euclidean space, that is, the depth maps of the corresponding groups are each mapped, according to the spatial geometry, to the virtual viewpoint position at the user interaction moment to form virtual-viewpoint depth maps. Reverse mapping is then performed, projecting the three-dimensional points onto the imaging plane of the virtual camera, that is, according to the mapped depth maps, pixels of the corresponding groups' texture maps are copied into the virtual texture maps generated for the virtual viewpoint position. Finally, the virtual texture maps of the corresponding groups are fused to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment. Reconstructing the image in this way can improve the sampling precision of the reconstructed image.
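A condensed, illustrative sketch of this warp, assuming a pinhole model with intrinsics K and extrinsics [R|t] and combining the forward mapping of depth with the reverse copying of texture in one pass; the preprocessing, median filtering, occlusion handling, hole dilation, and fusion described below are omitted.

```python
import numpy as np

def warp_to_virtual(texture, depth, K_src, R_src, t_src, K_vir, R_vir, t_vir):
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])     # homogeneous pixels

    # Forward mapping: lift source pixels to 3D with their depth values...
    rays = np.linalg.inv(K_src) @ pix
    points_cam = rays * depth.ravel()                          # source camera frame
    points_world = R_src.T @ (points_cam - t_src.reshape(3, 1))

    # ...and project them into the virtual camera to obtain its depth map.
    proj = K_vir @ (R_vir @ points_world + t_vir.reshape(3, 1))
    z = proj[2]
    uu = np.round(np.divide(proj[0], z, out=np.zeros_like(z), where=z > 0)).astype(int)
    vv = np.round(np.divide(proj[1], z, out=np.zeros_like(z), where=z > 0)).astype(int)

    virt_depth = np.full((h, w), np.inf)
    virt_tex = np.zeros_like(texture)
    ok = (z > 0) & (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
    for su, sv, du, dv, dz in zip(u.ravel()[ok], v.ravel()[ok], uu[ok], vv[ok], z[ok]):
        if dz < virt_depth[dv, du]:                            # keep the nearest surface
            virt_depth[dv, du] = dz
            # Reverse mapping: copy the texture of the winning source pixel.
            virt_tex[dv, du] = texture[sv, su]
    return virt_tex, virt_depth
```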
Preprocessing may be performed prior to performing the forward mapping. Specifically, the depth values of the forward map and the homography matrix of the texture reverse map may be calculated first according to the corresponding sets of corresponding parameter data in the image combination of the video frame. In a particular implementation, the depth level may be converted to a depth value using a Z-transform.
During forward mapping of the depth maps, each depth map of the corresponding set may be mapped by formula to the depth map of the virtual viewpoint position, and the depth value of the corresponding position is then copied. In addition, the depth maps of the corresponding set may contain noise, and sampling artifacts may be introduced during mapping, so the generated depth map of the virtual viewpoint position may contain small noise holes. Median filtering may be employed to remove this noise.
In a specific implementation, the virtual viewpoint position depth map obtained after the forward mapping may be further post-processed according to requirements, so as to further improve the quality of the generated reconstructed image. In an embodiment of the present invention, before performing reverse mapping, a foreground and background occlusion relationship is processed on a depth map of a virtual viewpoint position obtained by forward mapping, so that a generated depth map can more truly reflect a position relationship of an object in a scene seen by the virtual viewpoint position.
For the reverse mapping, specifically, the positions of the corresponding set of texture maps in the virtual texture map may be calculated according to the depth map of the virtual viewpoint positions obtained by the forward mapping, and then, the texture values of the corresponding pixel positions are copied, wherein the hole in the depth map may be marked as 0 or marked as having no texture value in the virtual texture map. Hole dilation can be done for the regions marked as holes to avoid synthesis artifacts.
And then, fusing the generated virtual texture maps of the corresponding groups to obtain a reconstructed image of the virtual viewpoint position at the user interaction moment. In particular embodiments, the fusion may be performed in various ways, and the following two examples are given as examples.
In an embodiment of the present invention, the weighting process is performed first, and then the hole filling is performed. Specifically, the method comprises the following steps: and carrying out weighting processing on the pixels at the corresponding positions in the virtual texture maps corresponding to the corresponding groups in the image combination of the video frames at the user interaction time to obtain the pixel values at the corresponding positions in the reconstructed image at the virtual viewpoint position at the user interaction time. And then, for the position with the zero pixel value in the reconstructed image of the virtual viewpoint position at the user interaction moment, filling up the hole by using the pixels around the pixel in the reconstructed image to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
In another embodiment of the present invention, the hole filling is performed first, and then the weighting process is performed. Specifically, the method comprises the following steps: and for the position where the pixel value in the virtual texture map corresponding to each corresponding group in the image combination of the video frame at the user interaction moment is zero, respectively filling a hole by using the surrounding pixel values, and then weighting the pixel value at the corresponding position in the virtual texture map corresponding to each corresponding group after filling the hole to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
The weighting processing in the above embodiments may specifically adopt a weighted average, and may also adopt different weighting coefficients according to the parameter data or the positional relationship between each acquisition device and the virtual viewpoint. In an embodiment of the present invention, the weight is taken as the reciprocal of the distance between the virtual viewpoint position and each acquisition device position, that is: the closer an acquisition device is to the virtual viewpoint position, the greater its weight.
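A minimal sketch of this reciprocal-distance weighting, assuming NumPy and hypothetical camera positions:

```python
import numpy as np

def fusion_weights(camera_positions, virtual_position, eps=1e-6):
    d = np.linalg.norm(np.asarray(camera_positions, dtype=float)
                       - np.asarray(virtual_position, dtype=float), axis=1)
    w = 1.0 / (d + eps)            # reciprocal of distance to the virtual viewpoint
    return w / w.sum()             # normalized so the weights sum to 1

weights = fusion_weights([[0, 0, 0], [2, 0, 0]], [0.5, 0, 0])  # nearer camera dominates
```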
In specific implementation, the hole filling may be performed by using a preset hole filling algorithm according to needs, which is not described herein again.
In a specific implementation, the shooting position and shooting angle data of the image may be referred to as external parameter data, and the parameter data may also include internal parameter data, that is, attribute data of a shooting device of the image. Distortion parameters and the like can be embodied through internal parameter data, and the mapping relation is more accurate by combining the internal parameters.
In a specific implementation, the parameter data may be obtained from a data file, and specifically, may be obtained from a corresponding storage space according to a storage address of the parameter data in the header file.
In a specific implementation, the target images may be determined according to the 6-degrees-of-freedom coordinates of the virtual viewpoint and the 6-degrees-of-freedom coordinates of each image capture viewpoint, selecting the several images whose capture viewpoints are closest to the virtual viewpoint.
In a specific implementation, all of the plurality of images synchronized may be the target image. More images are selected as target images, so that the quality of the reconstructed images is higher, and the selection of the target images can be determined according to requirements without limitation.
As described above, the depth data may be a set of depth values in one-to-one correspondence with the pixels of an image, and the depth data mapped to the virtual viewpoint likewise corresponds to pixels one-to-one. To generate the reconstructed image, for each pixel position, data of the corresponding position is acquired from the pixel data of the target images according to the depth data. When data for one pixel position is acquired from multiple target images, the multiple values can be weighted to improve the quality of the reconstructed image.
It will be understood by those skilled in the art that the process of reconstructing an image from a virtual viewpoint based on multi-angle freeview image data in the embodiments of the present invention may be various, and is not limited herein.
An embodiment of the present invention further provides a multi-angle free view processing apparatus, a schematic structural diagram of which is shown in fig. 40, and the apparatus may include:
a header file obtaining unit 401 adapted to obtain a header file;
a header file parsing unit 402, adapted to parse the header file, and determine a defined format of the data file according to a parsing result;
a data combination reading unit 403, adapted to read a data combination from a data file based on the defined format, where the data combination includes pixel data and depth data of a plurality of synchronized images, the plurality of synchronized images differ from each other in view angle of a region to be viewed, and the pixel data and the depth data of each of the plurality of synchronized images have an association relationship;
a reconstructing unit 404, adapted to perform image or video reconstruction of a virtual viewpoint according to the read data combination, where the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting switching viewing of a virtual viewpoint for the region to be viewed.
In a specific implementation, the reconstruction unit 404 is further adapted to reconstruct an image or video of the virtual viewpoint based on the data combination according to the relation between the virtual viewpoint and the parametric data.
The noun explanation, the principle, the specific implementation and the beneficial effects related to the multi-angle free visual angle data processing device in the embodiment of the present invention can be referred to the multi-angle free visual angle data processing method in the embodiment of the present invention, and are not described herein again.
The embodiment of the invention also provides a computer readable storage medium, which stores computer instructions, and the computer instructions execute the steps of the multi-angle free visual angle data processing method when running.
The computer readable storage medium may be various suitable media such as an optical disc, a mechanical hard disk, a solid state hard disk, and the like.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with computer instructions capable of being run on the processor, and the processor executes the steps of the multi-angle free view data processing method when running the computer instructions.
The terminal may be an edge computing node or a device for displaying; for example, a mobile phone or a tablet computer may serve as the display device. The edge computing node may be a node that performs near-field communication with the display device showing the reconstructed image while maintaining a high-bandwidth, low-delay connection, for example over Wi-Fi or 5G. Specifically, the edge computing node may be a base station, a mobile device, a vehicle-mounted device, or a home router with sufficient computing power. Referring collectively to fig. 3, the edge computing node may be a device located on the CDN.
An embodiment of the present invention further provides a mobile device, including a communication component, a processor, and a display component: the communication component is used for receiving multi-angle free visual angle data, and the multi-angle free visual angle data comprises a data header file and the data file; the processor is used for rendering based on the multi-angle free visual angle data to obtain display content, and a virtual viewpoint of the display content is selected from the multi-angle free visual angle range; the display component is used for displaying the display content. The mobile device can be a mobile phone, a tablet computer and the like.
Term definitions, specific implementations, and beneficial effects of the multi-angle free-view data processing method are described in other embodiments, and its various specific implementations may be combined with those embodiments.
The foregoing multi-angle freeview data may be multi-angle freeview image data, and is described further below with particular reference to multi-angle freeview image data processing.
FIG. 41 is a flowchart of a multi-angle freeview image data processing method according to an embodiment of the present invention, which may specifically include the following steps:
step S411, acquiring a data combination stored in a picture format, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, and the plurality of synchronous images have different visual angles of regions to be watched;
step S412, based on the data combination, performing image reconstruction of a virtual viewpoint, where the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting switching viewing of the virtual viewpoint for the region to be viewed.
In the implementation described in the foregoing embodiments, the data combination in picture format may be obtained by parsing the data header file and then reading the data combination from the data file. The manner of performing image reconstruction of the virtual viewpoint may likewise be as described above.
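Purely as an illustration of what such reconstruction involves, the following naive depth-image-based-rendering sketch forward-warps a single view into the virtual camera. The pinhole model with shared intrinsics K, and the omission of multi-view fusion and hole filling, are simplifying assumptions of this sketch, not the patent's prescription.

```python
import numpy as np

def warp_to_virtual_view(src: np.ndarray, depth: np.ndarray,
                         K: np.ndarray, R: np.ndarray,
                         t: np.ndarray) -> np.ndarray:
    """Forward-warp one source view (H x W x 3 texture + H x W depth)
    into a virtual camera with intrinsics K and relative pose (R, t).

    Deliberately naive: no z-buffering, no hole filling, no fusion of
    the other synchronized views.
    """
    h, w = depth.shape
    out = np.zeros_like(src)
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous pixels
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()            # 3-D points, source cam
    cam = R @ pts + t[:, None]                                # into the virtual cam
    proj = K @ cam                                            # re-project
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    ok = (proj[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = src[ys.ravel()[ok], xs.ravel()[ok]]   # splat colors
    return out
```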
In a specific implementation, acquiring the data combination stored in a picture format and performing image reconstruction of the virtual viewpoint may be done by an edge computing node. As described above, the edge computing node may be a node that communicates in close proximity with the display device showing the reconstructed image, maintaining a high-bandwidth, low-latency connection, for example via Wi-Fi or 5G. In particular, the edge computing node may be a base station, a mobile device, a vehicle-mounted device, or a home router with sufficient computing power. Referring to fig. 3 in combination, the edge computing node may also be a device located on the CDN.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image can be sent to a display device.
Reconstructing the image at the edge computing node lowers the requirements on the display device: even a device with little computing power can receive user instructions and still offer the multi-angle free-view experience.
For example, in a 5G scenario, communication between user equipment (UE) and a base station, especially the base station of the current serving cell, is fast. A user may determine the parameter data of the virtual viewpoint by instructing the UE, and the base station of the current serving cell, acting as the edge computing node, computes the reconstructed image. The device for displaying may then receive the reconstructed image and provide the multi-angle free-view service to the user.
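A minimal sketch of this division of labor, assuming `reconstruct` and `send_to_display` are callables supplied by the surrounding system (hypothetical names, not from the patent):

```python
from queue import Queue

def edge_node_loop(requests: Queue, frames, reconstruct, send_to_display) -> None:
    """Edge-node sketch: receive the parameter data of the virtual viewpoint
    from the display device, reconstruct the image, and send it back."""
    while True:
        viewpoint = requests.get()   # parameter data of the virtual viewpoint
        if viewpoint is None:        # shutdown sentinel
            break
        image = reconstruct(frames, viewpoint)  # heavy computation stays at the edge
        send_to_display(image)       # the display device only has to show it
```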
It is to be understood that, in a specific implementation, the device for performing image reconstruction and the device for performing display may be the same device. The device may receive a user indication and determine the virtual viewpoint from it in real time. After the image of the virtual viewpoint is reconstructed, the reconstructed image may be displayed.
In particular implementations, there may be various ways of receiving a user indication and generating a virtual viewpoint according to it, the virtual viewpoint being a viewpoint within the free viewing angle range. The embodiment of the invention can thus support the user in freely switching the virtual viewpoint within a multi-angle free-view range.
It is understood that the noun explanations, specific implementation manners and advantageous effects involved in the multi-angle freeview picture data processing method can be found in other embodiments, and various specific implementations in the multi-angle freeview interaction method can be implemented in combination with other embodiments.
The multi-angle freeview data described above may also be multi-angle freeview video data, and is described further below with particular reference to multi-angle freeview video data processing.
FIG. 42 is a flowchart of a multi-angle freeview video data processing method according to an embodiment of the present invention, which may include the following steps:
step S421, analyzing the acquired video data to obtain data combinations at different frame times, where the data combinations include pixel data and depth data of multiple synchronized images, and the multiple synchronized images have different viewing angles of regions to be viewed;
step S422, for each frame time, based on the data combination, image reconstruction of a virtual viewpoint is carried out, the virtual viewpoint is selected from a multi-angle free visual angle range, the multi-angle free visual angle range is a range supporting switching and watching of the virtual viewpoint of an area to be watched, and the reconstructed image is used for video playing.
In a specific implementation, the obtained video data may come in various formats. The video data may be parsed, then decapsulated and decoded according to its video format, to obtain frame images at different frame times, and the data combination may be extracted from each frame image; that is, each frame image stores the pixel data and depth data of the plurality of synchronized images. From this perspective, the frame image may also be referred to as a stitched image.
The video data may be obtained from the data file according to the data header file, and the specific manner of obtaining the data combination may be as described above; the same holds for the image reconstruction of the virtual viewpoint. After the reconstructed image at each frame time is obtained, video playing may be performed in the order of the frame times.
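For instance, assuming OpenCV is available for decapsulating and decoding, and reusing hypothetical `split` and `reconstruct` helpers like those sketched earlier, the per-frame flow of steps S421-S422 could look like this:

```python
import cv2  # assumed available for decapsulating/decoding the video data

def play_free_view_video(path: str, fmt: dict, viewpoint,
                         split, reconstruct, show) -> None:
    """Decode each stitched frame (one data combination per frame time),
    split it into the synchronized pixel/depth data, reconstruct the
    virtual viewpoint, and hand frames to playback in frame-time order."""
    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, stitched = cap.read()
            if not ok:                   # end of stream
                break
            combo = split(stitched, fmt)  # pixel + depth of all views
            show(reconstruct(combo, viewpoint))
    finally:
        cap.release()
```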
In a specific implementation, acquiring data combinations at different frame times and performing image reconstruction of a virtual viewpoint may be performed by an edge computing node.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image at each frame time can be sent to a display device.
It is to be understood that, in the embodiment, the device for performing image reconstruction and display may be the same device.
It is to be understood that the noun explanations, specific implementations and advantageous effects involved in the multi-angle freeview video data processing method can be found in other embodiments, and various specific implementations in the multi-angle freeview interaction method can be implemented in combination with other embodiments.
The following is further described with particular reference to multi-angle freeview interaction methods.
FIG. 43 is a flowchart of a multi-angle free-view interaction method in an embodiment of the present invention, which may specifically include the following steps:
step S431, receiving a user instruction;
step S432, determining a virtual viewpoint according to the user instruction, wherein the virtual viewpoint is selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting the switching and watching of the virtual viewpoint of an area to be watched;
step S433, displaying display content for viewing the region to be viewed based on the virtual viewpoint, where the display content is generated based on a data combination and the virtual viewpoint, the data combination includes pixel data and depth data of a plurality of synchronized images, the pixel data and the depth data of each image have an association relationship, and the plurality of synchronized images differ in viewing angle of the region to be viewed.
In an embodiment of the present invention, the virtual viewpoint may be a viewpoint within the multi-angle free-view range, and the specific multi-angle free-view range may be associated with the data combination.
In a particular implementation, a user indication may be received, and a virtual viewpoint may be determined within a free viewing angle range based on the user indication. The user indication and the manner in which the virtual viewpoint is determined according to the user indication may be varied, as further illustrated below.
In a specific implementation, determining a virtual viewpoint according to the user indication may include: determining a base viewpoint for viewing the region to be viewed, the base viewpoint including a position and a viewing angle. At least one of the position and the viewing angle of the virtual viewpoint may change relative to the base viewpoint, and the user indication may be associated with the manner of that change. The virtual viewpoint is then determined with the base viewpoint as a reference, according to the user indication, the base viewpoint, and the association relationship.
The base viewpoint may include the position and viewing angle from which the user views the region to be viewed. Further, the base viewpoint may be the position and viewing angle corresponding to the picture displayed by the display device when the user indication is received. For example, if the image displayed by the device upon receiving a user indication is as shown in fig. 4, then, referring to fig. 2 in combination, the position of the base viewpoint may be VP1 as shown in fig. 2. It is understood that the position and viewing angle of the base viewpoint may be preset, or the base viewpoint may be a virtual viewpoint determined earlier according to a user indication; the base viewpoint may also be expressed in 6DoF coordinates. The association between the user indication and the manner in which the virtual viewpoint changes relative to the base viewpoint may be a preset association relationship.
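A minimal sketch of this derivation, assuming 6DoF coordinates and using a hypothetical delta mapping to stand in for the preset association relation (the patent does not fix its concrete form):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Viewpoint6DoF:
    x: float; y: float; z: float           # position
    yaw: float; pitch: float; roll: float  # viewing angle

def derive_virtual_viewpoint(base: Viewpoint6DoF,
                             delta: dict) -> Viewpoint6DoF:
    """Apply the change implied by a user indication to the base viewpoint.

    The delta mapping (field name -> increment) is an assumed encoding of
    the preset association relation."""
    return replace(base, **{k: getattr(base, k) + d for k, d in delta.items()})

# e.g. an indication associated with "move toward the region to be viewed":
base = Viewpoint6DoF(0.0, 1.6, -10.0, 0.0, 0.0, 0.0)
virtual = derive_virtual_viewpoint(base, {"z": 2.5})
```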
In specific implementations, the manner of receiving the user instruction may be various, and each of the manners will be described below.
In one particular implementation, a path of the point of contact on the touch-sensitive screen may be detected, and the path may include a start point, an end point, and a direction of movement of the point of contact, with the path as the user indication.
Accordingly, the association relationship between the path and the virtual viewpoint based on the change manner of the base viewpoint may be various.
For example, there may be 2 paths; if at least one contact point in the 2 paths moves in a direction away from the other, the position of the virtual viewpoint moves in a direction closer to the region to be viewed.
Referring to fig. 44 and fig. 11 in combination, vectors F1 and F2 in fig. 44 may respectively illustrate the 2 paths. Under such an indication, if the base viewpoint is B2 in fig. 11, the virtual viewpoint may be B3. That is, for the user, the region to be viewed is enlarged.
It is understood that fig. 44 is only an illustration, and in a specific application scenario, the starting point, the ending point, and the direction of the 2 paths may be various, and at least one contact point in the 2 paths may move in a direction away from the other. One of the 2 paths may be a path of an unmoved contact point, and include only the starting point.
In an embodiment of the present invention, the display image before enlargement may be as shown in fig. 4, and the enlarged image may be as shown in fig. 45.
In a specific implementation, the center point of the magnification may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is magnified about that center point. The magnification ratio, that is, the amplitude of the virtual viewpoint's movement, may be associated with the amplitude by which the contact points in the 2 paths move apart, and this association relationship may be preset.
In a specific implementation, if at least one contact point in the 2 paths moves in a direction close to the other, the position of the virtual viewpoint may move in a direction away from the region to be viewed.
Referring to fig. 46 and fig. 11 in combination, vectors F3 and F4 in fig. 46 may respectively illustrate the 2 paths. Under such an indication, if the base viewpoint is B3 in fig. 11, the virtual viewpoint may be B2. That is, for the user, the region to be viewed is reduced.
It is understood that fig. 46 is only an illustration; in a specific application scenario, the starting point, ending point, and direction of the 2 paths may vary, and at least one contact point in the 2 paths may move in a direction approaching the other. One of the 2 paths may be a path of an unmoved contact point, containing only its starting point.
In an embodiment of the present invention, the display image before being reduced may be as shown in fig. 45, and the image after being reduced may be as shown in fig. 4.
In a specific implementation, the center point of the reduction may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is reduced about that center point. The reduction ratio, that is, the amplitude of the virtual viewpoint's movement, may be associated with the amplitude by which the contact points in the 2 paths approach each other, and this association relationship may be preset.
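The two-path gestures above might be interpreted as follows; the linear gain relating gesture amplitude to movement amplitude is an assumed preset, as are all identifier names:

```python
import math

def interpret_two_paths(p1_start, p1_end, p2_start, p2_end, gain=0.01):
    """Map two touch-point paths to a move of the virtual viewpoint along
    the viewing direction: spreading moves it toward the region to be
    viewed (enlarging), pinching moves it away (reducing)."""
    d0 = math.dist(p1_start, p2_start)  # contact-point distance at the start
    d1 = math.dist(p1_end, p2_end)      # ... and at the end of the paths
    return gain * (d1 - d0)             # >0: toward the region, <0: away

# one finger may stay still (a path holding only its starting point):
step = interpret_two_paths((100, 500), (60, 500), (300, 500), (300, 500))
```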
In a specific implementation, the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint may also include: the number of the paths is 1, the moving distance of the contact point is associated with the change amplitude of the visual angle, and the moving direction of the contact point is associated with the change direction of the visual angle.
For example, referring to fig. 5 and fig. 13 in combination, if the received user indication is 1 path, illustrated by vector D52 in fig. 5, and the base viewpoint is point C2 in fig. 13, the virtual viewpoint may be point C1.
In an embodiment of the present invention, the display before the switching of the viewing angle may be as shown in fig. 5, and the display after the switching of the viewing angle may be as shown in fig. 6.
If the received user indication is 1 path illustrated by vector D81 in fig. 8, and the base viewpoint is point C2 in fig. 13, the virtual viewpoint may be point C3.
In an embodiment of the present invention, the display before the switching of the viewing angle may be as shown in fig. 8, and the display after the switching of the viewing angle may be as shown in fig. 9.
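A single path could be mapped to a viewing-angle change in the same spirit; the degrees-per-pixel gain and the sign conventions are assumed presets of this sketch:

```python
def interpret_one_path(start, end, degrees_per_pixel=0.1):
    """Map a single touch path to a viewing-angle change: the moving
    distance sets the magnitude of the change, the moving direction its
    direction."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return (-dx * degrees_per_pixel,   # horizontal drag -> yaw change
            -dy * degrees_per_pixel)   # vertical drag   -> pitch change

d_yaw, d_pitch = interpret_one_path((800, 400), (500, 400))  # drag left
```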
It will be appreciated by those skilled in the art that the various embodiments described above are merely illustrative in nature and are not limiting of the association between the user indication and the virtual viewpoint.
In particular implementations, the user indication may include voice control instructions, which may be in a natural-language format, such as "zoom in", "zoom out", or "view left". Correspondingly, to determine the virtual viewpoint according to the user indication, speech recognition may be performed on the instruction, and the virtual viewpoint determined with the base viewpoint as a reference, according to a preset association relationship between instructions and the manner in which the virtual viewpoint changes relative to the base viewpoint.
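For example, a recognized phrase could be looked up in a preset table of viewpoint changes. The phrases and deltas below are illustrative only, and the sketch assumes a `derive_virtual_viewpoint` helper like the one shown earlier:

```python
# Assumed association between recognized phrases and viewpoint changes;
# the patent leaves both the recognizer and the association relation open.
VOICE_COMMANDS = {
    "zoom in":   {"z": +2.0},     # move toward the region to be viewed
    "zoom out":  {"z": -2.0},     # move away from it
    "view left": {"yaw": -15.0},  # rotate the viewing angle
}

def handle_voice(transcript: str, base, derive_virtual_viewpoint):
    delta = VOICE_COMMANDS.get(transcript.strip().lower())
    return derive_virtual_viewpoint(base, delta) if delta else base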
In particular implementations, the user indication may also include a selection of a preset viewpoint from which to view the region to be viewed. The preset viewpoints may vary with the region to be viewed. For example, if the region to be viewed is a basketball court, a preset viewpoint may be that of a virtual spectator seated courtside under the backboard, or a virtual coach's viewing angle. Accordingly, the preset viewpoint may be used as the virtual viewpoint.
In particular implementations, the user indication may also include a selection of a particular object in the region to be viewed. The particular object may be determined by image recognition techniques. For example, in a basketball game, each player in the game scene may be identified by face recognition, the user may be offered options for the relevant players, and according to the user's selection of a specific player, a virtual viewpoint may be determined and the picture under that virtual viewpoint provided to the user.
In a specific implementation, the user indication may further include at least one of a position and a viewing angle of the virtual viewpoint, for example, the 6DoF coordinates of the virtual viewpoint may be directly input.
In specific implementations, the user indication may be received in various manners, for example by detecting signals of contact points on a touch-sensitive screen, signals of an acoustic-electric sensor, or signals of sensors that characterize the device's posture, such as a gyroscope or a gravity sensor. The corresponding user indication may be a contact-point path on the touch-sensitive screen, a voice control instruction, a gesture operation, and so on. The content of the indication may also vary: it may indicate the manner in which the virtual viewpoint changes relative to the base viewpoint, indicate a preset viewpoint, indicate a specific viewing object, or directly indicate at least one of the position and viewing angle of the virtual viewpoint. The specific implementation of determining the virtual viewpoint according to the user indication may vary accordingly.
Specifically, in combination with the manner of receiving the user indication, the various sensing devices may be polled at preset time intervals, the interval corresponding to a detection frequency; for example, detection may be performed 25 times per second to obtain the user indication.
It is to be understood that the manner of receiving the user indication, the content of the user indication, and the manner of determining the virtual viewpoint according to the user indication may be combined or replaced, and are not limited herein.
In a specific implementation, the user indication may be received in response to a trigger instruction, which helps avoid accidental operation by the user. The trigger instruction may be a click on a preset button in the screen area, a voice control signal, any of the manners listed above for user indications, or another manner.
In particular implementations, the user indication may be received during video playing or image presentation. If received while an image is presented, the data combination may be the one corresponding to that image; if received during video playing, the data combination may be the one corresponding to a frame image in the video. The display content for viewing the region to be viewed based on the virtual viewpoint may be an image reconstructed based on the virtual viewpoint.
In the process of playing the video, after a user indication is received and a virtual viewpoint is determined from it, the display content for viewing the region to be viewed based on that virtual viewpoint may be the frame images of a plurality of reconstructed frames generated based on it. That is, playback may continue while the virtual viewpoint is switched: before the virtual viewpoint is re-determined according to the user indication, the video keeps playing from the original virtual viewpoint; once the new virtual viewpoint is determined, reconstructed frame images based on it are generated and played at the position and viewing angle of the switched viewpoint. Alternatively, video playing may be paused while the virtual viewpoint is switched.
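One way to realize the continue-playing behavior (with pausing left as the alternative) is a per-frame loop that keeps the old viewpoint until a new one is determined; `render`, `poll_indication`, and `determine` are hypothetical helpers of this sketch:

```python
def playback_with_switching(frames, base_viewpoint,
                            poll_indication, determine, show) -> None:
    """Continue playback while switching viewpoints: frames keep rendering
    from the current viewpoint, and as soon as a user indication arrives
    the viewpoint is re-determined and later frames use it."""
    viewpoint = base_viewpoint
    for combo in frames:                 # one data combination per frame time
        indication = poll_indication()   # e.g. sampled 25 times per second
        if indication is not None:
            viewpoint = determine(viewpoint, indication)
        show(combo.render(viewpoint))    # reconstructed frame image
```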
With combined reference to fig. 4 and fig. 6, during the image presentation, a user instruction may be received, a virtual viewpoint may be generated according to the user instruction to switch viewing, and the display content may be changed from the image shown in fig. 4 to the image shown in fig. 6.
When the video plays to the frame image shown in fig. 4 and the virtual viewpoint is switched, the frame image shown in fig. 6 may be presented. Before a new user indication is received, frame images based on that virtual viewpoint may continue to be presented for video playing; for example, if a new user indication is received while playing the frame image shown in fig. 47, the virtual viewpoint may be switched according to the new indication and video playing continues.
It is to be understood that the noun explanations, specific implementations, and advantageous effects referred to in the multi-angle freeview interaction method may refer to other embodiments, and various specific implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (23)
1. A multi-angle freeview data processing method, comprising:
acquiring a data header file, wherein the data header file is used for indicating a defined format of a data file so as to analyze required data from the data file according to the data header file;
determining a defined format of the data file according to the analysis result of the data header file;
reading a data combination from a data file based on the defined format, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, the synchronous images have different visual angles of regions to be watched, and the pixel data and the depth data of each image in the synchronous images have an association relation;
and reconstructing an image or a video of a virtual viewpoint according to the read data combination, wherein the virtual viewpoint is selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting the switching and watching of the virtual viewpoint on the area to be watched.
2. The method of claim 1, wherein the defined format comprises a storage format of the data combination.
3. The method of claim 2, wherein the storage format of the data assembly is a video format.
4. The method of claim 3, wherein the number of data combinations is plural, and each data combination corresponds to a different frame time.
5. The method of claim 2, wherein the storage format of the data combination is a picture format.
6. The method according to claim 4 or 5, wherein said defined format comprises content storage rules for said data combination, said content storage rules comprising storage rules for pixel data and depth data for said synchronized plurality of images.
7. The method of claim 6, wherein the storage rule for the pixel data and the depth data of the synchronized multiple images comprises: storage locations of pixel data and depth data of the synchronized plurality of images in a stitched image, the stitched image being an image or frame image read from the data file according to a picture format or a video format of the defined formats.
8. The method of claim 7, wherein the pixel data of each of the plurality of synchronized images is stored in an image area, the depth data is stored in a depth map area, and the storage locations of the pixel data and the depth data of each of the plurality of synchronized images in the stitched image are indicated by the distribution of the image area and the depth map area.
9. The method of claim 7, wherein the pixel data of each of the synchronized plurality of images is stored in an image sub-region, the depth data of each image is stored in a depth map sub-region, and the storage locations of the pixel data and the depth data of each image in the stitched image are indicated by the distribution of the image sub-regions and the depth map sub-regions.
10. The multi-angle freeview data processing method according to claim 9, wherein the storage rule for the pixel data and the depth data of the synchronized plurality of images further comprises: all or part of the image sub-regions and the depth map sub-regions are edge-protected.
11. The method of claim 6, wherein the storage rules for the pixel data and the depth data of the synchronized plurality of images further comprise: the pixel data and the depth data of each image have the same resolution.
12. The method of claim 1, wherein the data combination comprises, for each of the synchronized plurality of images, a first field storing the pixel data of the image and a second field, associated with the first field, storing the depth data of the image.
13. The method of claim 12, wherein the defined format comprises an association relationship between the first field and the second field.
14. The method of claim 1, wherein the data file further comprises parameter data of each of the plurality of images, the parameter data including photographing position and photographing angle data of the image.
15. The multi-angle freeview data processing method as claimed in claim 14, wherein the parameter data further includes internal parameter data, the internal parameter data including attribute data of the apparatus that photographed the image.
16. The method of claim 14 or 15, wherein the defined format further includes a storage address of the parameter data.
17. The method as claimed in claim 14 or 15, wherein the reconstructing the image or video from the virtual viewpoint according to the read data combination comprises:
and reconstructing an image or a video of the virtual viewpoint based on the data combination according to the relation between the virtual viewpoint and the parameter data.
18. The method of claim 1, wherein the defined format further comprises storage addresses of pixel data and depth data of the synchronized plurality of images.
19. The method of claim 1, wherein the defined format of the data file includes a number of the synchronized multiple pictures.
20. A multi-angle freeview data processing apparatus, comprising:
the data header file acquisition unit is suitable for acquiring a data header file, and the data header file is used for indicating the definition format of the data file so as to analyze the required data from the data file according to the data header file;
the data header file analysis unit is suitable for determining the definition format of the data file according to the analysis result of the data header file;
the data combination reading unit is suitable for reading a data combination from a data file based on the defined format, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, the synchronous images have different visual angles of regions to be watched, and the pixel data and the depth data of each image in the synchronous images have an association relation;
and the reconstruction unit is suitable for reconstructing images or videos of virtual viewpoints according to the read data combination, the virtual viewpoints are selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting the switching and watching of the virtual viewpoints of the region to be watched.
21. A computer readable storage medium having stored thereon computer instructions, wherein the computer instructions are executable to perform the steps of the multi-angle freeview data processing method according to any one of claims 1 to 19.
22. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor executes the computer instructions to perform the steps of the multi-angle freeview data processing method according to any one of claims 1 to 19.
23. A mobile device comprising a communication component, a processor, and a display component, wherein: the communication component is configured to receive multi-angle freeview data, the multi-angle freeview data comprising the data header file and the data file in the multi-angle freeview data processing method of any one of claims 1 to 19;
the processor is configured to render based on the multi-angle freeview data to obtain display content, where a virtual viewpoint of the display content is selected from the multi-angle freeview range in the multi-angle freeview data processing method according to any one of claims 1 to 19;
the display component is used for displaying the display content.