CN111669518A - Multi-angle free visual angle interaction method and device, medium, terminal and equipment - Google Patents


Info

Publication number
CN111669518A
Authority
CN
China
Prior art keywords
image
data
angle
virtual viewpoint
viewpoint
Prior art date
Legal status
Pending
Application number
CN201910173415.1A
Other languages
Chinese (zh)
Inventor
盛骁杰
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910173415.1A priority Critical patent/CN111669518A/en
Priority to PCT/US2020/021195 priority patent/WO2020181088A1/en
Priority to PCT/US2020/021220 priority patent/WO2020181104A1/en
Priority to US16/810,352 priority patent/US20200288097A1/en
Priority to PCT/US2020/021187 priority patent/WO2020181084A1/en
Priority to PCT/US2020/021252 priority patent/WO2020181128A1/en
Priority to PCT/US2020/021167 priority patent/WO2020181074A1/en
Priority to PCT/US2020/021247 priority patent/WO2020181125A1/en
Priority to PCT/US2020/021164 priority patent/WO2020181073A1/en
Priority to PCT/US2020/021197 priority patent/WO2020181090A1/en
Priority to US16/810,362 priority patent/US20200288108A1/en
Priority to PCT/US2020/021231 priority patent/WO2020181112A1/en
Priority to US16/810,237 priority patent/US11037365B2/en
Priority to US16/810,565 priority patent/US11055901B2/en
Priority to PCT/US2020/021241 priority patent/WO2020181119A1/en
Priority to PCT/US2020/021141 priority patent/WO2020181065A1/en
Priority to US16/810,614 priority patent/US20200288099A1/en
Priority to US16/810,695 priority patent/US11257283B2/en
Priority to US16/810,480 priority patent/US20200288098A1/en
Priority to US16/810,681 priority patent/US20200288112A1/en
Priority to US16/810,586 priority patent/US20200286279A1/en
Priority to US16/810,464 priority patent/US11521347B2/en
Priority to US16/810,634 priority patent/US11341715B2/en
Publication of CN111669518A publication Critical patent/CN111669518A/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268Signal distribution or switching
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects for obtaining an image which is composed of whole input images, e.g. splitscreen

Abstract

The embodiments of the invention disclose a multi-angle free-view interaction method and apparatus, a medium, a terminal and a device. The multi-angle free-view interaction method comprises: receiving a user instruction during video playback; determining a virtual viewpoint according to the user instruction, the virtual viewpoint being selected from a multi-angle free-view range, where the multi-angle free-view range is a range that supports switching the viewpoint from which an area to be viewed is watched; and displaying, at a specified frame time, an image of the area to be viewed as seen from the virtual viewpoint, where the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination comprises pixel data and depth data of a plurality of images synchronized at the specified frame time, and the synchronized images have different viewing angles of the area to be viewed. The technical solution in the embodiments of the invention can improve the user experience.

Description

Multi-angle free visual angle interaction method and device, medium, terminal and equipment
Technical Field
The invention relates to the field of data processing, and in particular to a multi-angle free-view interaction method and apparatus, a medium, a terminal and a device.
Background
In the field of image and video processing, video data can be received and the video can be played back to a user. Such playback is typically from a fixed viewpoint, so the user experience still needs to be improved.
Disclosure of Invention
The technical problem addressed by the embodiments of the invention is how to provide a method for multi-angle free-view interaction.
To solve the above technical problem, an embodiment of the invention provides a multi-angle free-view interaction method, comprising: receiving a user instruction during video playback; determining a virtual viewpoint according to the user instruction, the virtual viewpoint being selected from a multi-angle free-view range, where the multi-angle free-view range is a range that supports switching the viewpoint from which an area to be viewed is watched; and displaying, at a specified frame time, an image of the area to be viewed as seen from the virtual viewpoint, where the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination comprises pixel data and depth data of a plurality of images synchronized at the specified frame time, an association exists between the pixel data and the depth data of each image, and the plurality of synchronized images have different viewing angles of the area to be viewed.
Optionally, the frame time when the user instruction is received is used as the specified frame time.
Optionally, the specified frame time is determined according to time indication information corresponding to the user indication.
Optionally, the method further comprises: receiving a continue-playing instruction; and in response to the continue-playing instruction, continuing, from the specified frame time, to play the video of the area to be viewed as seen from the virtual viewpoint.
Optionally, the method further comprises: receiving a continue-playing instruction; and in response to the continue-playing instruction, continuing to play the video using the virtual viewpoint from which the area to be viewed was watched before the user instruction was received.
Optionally, determining a virtual viewpoint according to the user instruction comprises: determining a base viewpoint for observing the area to be viewed, the base viewpoint comprising a position and a viewing angle; and determining the virtual viewpoint with the base viewpoint as a reference, according to the user instruction and an association between the user instruction and the manner in which the virtual viewpoint changes relative to the base viewpoint.
Optionally, receiving the user instruction comprises: detecting a path of a touch point on a touch-sensitive screen, the path comprising at least one of a starting point, an end point and a moving direction of the touch point, and taking the path as the user instruction.
Optionally, the association between the path and the manner in which the virtual viewpoint changes relative to the base viewpoint comprises: the number of paths is 2, and if at least one touch point of the 2 paths moves away from the other, the position of the virtual viewpoint moves closer to the area to be viewed.
Optionally, the association between the path and the manner in which the virtual viewpoint changes relative to the base viewpoint comprises: the number of paths is 2, and if at least one touch point of the 2 paths moves toward the other, the position of the virtual viewpoint moves away from the area to be viewed.
Optionally, the association between the path and the manner in which the virtual viewpoint changes relative to the base viewpoint comprises: the number of paths is 1, the moving distance of the touch point is associated with the magnitude of the change of the viewing angle, and the moving direction of the touch point is associated with the direction of the change of the viewing angle, as illustrated by the sketch below.
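Purely as a non-limiting illustration of the association relationships described in the preceding optional features, the following Python sketch interprets one or two touch paths as a viewpoint change; the function name, step sizes and return format are assumptions for illustration and are not part of the claimed method.

```python
import math

def interpret_touch_paths(paths, zoom_step=0.1, angle_gain=0.01):
    """Interpret 1 or 2 touch paths as a change of the virtual viewpoint.

    Each path is ((start_x, start_y), (end_x, end_y)) in screen coordinates.
    Returns a dict with 'move_toward_area' (signed step along the viewing
    direction) and 'angle_delta' (change of the viewing angle).
    """
    move, d_yaw, d_pitch = 0.0, 0.0, 0.0
    if len(paths) == 2:
        (s0, e0), (s1, e1) = paths
        before = math.dist(s0, s1)
        after = math.dist(e0, e1)
        if after > before:       # contacts move apart: viewpoint approaches the area
            move = +zoom_step
        elif after < before:     # contacts move together: viewpoint moves away
            move = -zoom_step
    elif len(paths) == 1:
        (sx, sy), (ex, ey) = paths[0]
        d_yaw = angle_gain * (ex - sx)    # moving distance sets the magnitude,
        d_pitch = angle_gain * (ey - sy)  # moving direction sets the direction
    return {"move_toward_area": move, "angle_delta": (d_yaw, d_pitch)}
```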
Optionally, the user indication includes a voice control instruction.
Optionally, the user instruction includes a selection of a preset viewpoint for viewing the region to be viewed.
Optionally, the preset viewpoint is used as the virtual viewpoint.
Optionally, the user indication includes: the user selects a particular object in the area to be viewed.
Optionally, before receiving the user instruction, the method further includes: determining a specific object of the region to be watched through an image recognition technology; providing a selection option for the particular object.
Optionally, the user indication includes at least one of a position and a perspective of the virtual viewpoint.
Optionally, the user indication includes a voice control instruction.
Optionally, the user indication includes: attitude change information from at least one of a gyroscope or a gravity sensor.
Optionally, a trigger instruction is received, and a user instruction is received in response to the trigger instruction.
An embodiment of the invention also provides a multi-angle free-view interaction apparatus, comprising: an instruction receiving unit adapted to receive a user instruction during video playback; a virtual viewpoint determining unit adapted to determine a virtual viewpoint according to the user instruction, the virtual viewpoint being selected from a multi-angle free-view range, where the multi-angle free-view range is a range that supports switching the viewpoint from which an area to be viewed is watched; and a display unit adapted to display, at a specified frame time, an image of the area to be viewed as seen from the virtual viewpoint, where the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination comprises pixel data and depth data of a plurality of images synchronized at the specified frame time, an association exists between the pixel data and the depth data of each image, and the synchronized images have different viewing angles of the area to be viewed.
An embodiment of the invention also provides a computer-readable storage medium having computer instructions stored thereon, where the steps of the multi-angle free-view interaction method are performed when the computer instructions are run.
An embodiment of the invention also provides a terminal comprising a memory and a processor, the memory storing computer instructions that can be run on the processor, where the processor performs the multi-angle free-view interaction method when running the computer instructions.
An embodiment of the invention also provides a mobile device comprising a communication component, a processor and a display component: the communication component is configured to receive multi-angle free-view data, the multi-angle free-view data comprising a data combination; the processor is configured to perform rendering based on a virtual viewpoint according to the multi-angle free-view data, to obtain an image of the area to be viewed as seen from the virtual viewpoint; and the display component is configured to display the image of the area to be viewed as seen from the virtual viewpoint.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
in the embodiments of the invention, a user instruction is received during video playback, a virtual viewpoint is determined according to the user instruction, and an image of the area to be viewed as seen from that virtual viewpoint is displayed at a specified frame time determined according to the user instruction. The user can thus freely view the content of interest in the video from multiple angles while watching, which can improve the user experience.
Drawings
FIG. 1 is a schematic diagram of a region to be viewed according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an arrangement of a collecting apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-angle free-view display system according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of a device display in an embodiment of the invention;
FIG. 5 is a schematic diagram of an apparatus for controlling according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of the present invention;
FIG. 7 is a schematic diagram of another arrangement of the collecting apparatus in the embodiment of the present invention;
FIG. 8 is a schematic illustration of another manipulation of the apparatus in an embodiment of the present invention;
FIG. 9 is a schematic illustration of a display of another apparatus in an embodiment of the invention;
FIG. 10 is a flow chart of a method for setting up a collection device according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a multi-angle free viewing range according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 13 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 14 is a diagram illustrating another multi-angle free view range according to an embodiment of the present invention;
FIG. 15 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;
FIG. 16 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;
FIG. 17 is a schematic diagram of another arrangement of the collecting apparatus in the embodiment of the present invention;
FIG. 18 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;
FIG. 19 is a flowchart of a multi-angle freeview data generating method according to an embodiment of the present invention;
FIG. 20 is a diagram illustrating distribution positions of pixel data and depth data of a single image according to an embodiment of the present invention;
FIG. 21 is a diagram illustrating distribution positions of pixel data and depth data of another single image according to an embodiment of the present invention;
FIG. 22 is a diagram illustrating distribution positions of pixel data and depth data of an image according to an embodiment of the present invention;
FIG. 23 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 24 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 25 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;
FIG. 26 is a schematic illustration of image region stitching according to an embodiment of the present invention;
FIG. 27 is a schematic structural diagram of a stitched image in an embodiment of the present invention;
FIG. 28 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 29 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 30 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 31 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 32 is a schematic structural diagram of another stitched image in an embodiment of the present invention;
FIG. 33 is a diagram illustrating a pixel data distribution of an image according to an embodiment of the present invention;
FIG. 34 is a schematic diagram of a pixel data distribution of another image in an embodiment of the invention;
FIG. 35 is a diagram illustrating data storage in a stitched image, in accordance with an embodiment of the present invention;
FIG. 36 is a schematic illustration of data storage in another stitched image in an embodiment of the present invention;
FIG. 37 is a flowchart illustrating a multi-angle freeview video data generating method according to an embodiment of the present invention;
FIG. 38 is a flowchart illustrating a multi-angle freeview data processing method according to an embodiment of the present invention;
FIG. 39 is a flowchart illustrating a method for reconstructing a virtual viewpoint image according to an embodiment of the present invention;
FIG. 40 is a flowchart illustrating a multi-angle freeview image data processing method according to an embodiment of the present invention;
FIG. 41 is a flowchart illustrating a multi-angle freeview video data processing method according to an embodiment of the present invention;
FIG. 42 is a flowchart of a multi-angle free-view interaction method according to an embodiment of the present invention;
FIG. 43 is a schematic view of another embodiment of the present invention;
FIG. 44 is a schematic illustration of a display of another apparatus in an embodiment of the invention;
FIG. 45 is a schematic view of another embodiment of the present invention;
FIG. 46 is a schematic illustration of a display of another apparatus in an embodiment of the invention;
FIG. 47 is a flowchart illustrating another multi-angle free-view interaction method according to an embodiment of the present invention;
FIG. 48 is a schematic structural diagram of a multi-angle freeview interaction apparatus according to an embodiment of the present invention;
fig. 49 is a schematic structural diagram of a virtual viewpoint determining unit in the embodiment of the present invention;
FIG. 50 is a diagram illustrating a multi-angle freeview data generation process according to an embodiment of the present invention;
FIG. 51 is a schematic diagram of a multi-camera 6DoF acquisition system in an embodiment of the present invention;
FIG. 52 is a diagram illustrating the generation and processing of 6DoF video data according to an embodiment of the present invention;
FIG. 53 is a diagram illustrating a structure of a header file according to an embodiment of the present invention;
FIG. 54 is a diagram illustrating a user-side processing of 6DoF video data according to an embodiment of the present invention;
FIG. 55 is a schematic diagram of the inputs and outputs of a reference software in an embodiment of the invention;
FIG. 56 is a diagram of an algorithm architecture of a reference software according to an embodiment of the present invention.
Detailed Description
As described in the background, in the field of image and video processing, video data can be received and the video can be played back to a user. Such playback is typically from a fixed viewpoint, so the user experience still needs to be improved.
In the embodiments of the invention, a user instruction is received during video playback, a virtual viewpoint is determined according to the user instruction, and an image of the area to be viewed as seen from that virtual viewpoint is displayed at a specified frame time determined according to the user instruction. The user can thus freely view the content of interest in the video from multiple angles while watching, which can improve the user experience.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below.
As an example embodiment of the invention, the applicant discloses the following. The first step is acquisition and depth map calculation, which comprises three main sub-steps: multi-camera video capturing, estimation of the intrinsic and extrinsic camera parameters (Camera Parameter Estimation), and depth map calculation (Depth Map Calculation). For multi-camera acquisition, it is desirable that the video acquired by the various cameras be frame-level aligned. Referring to FIG. 50, texture images (Texture Images), i.e., the synchronized images described later, may be obtained by video capture with multiple cameras; the camera parameters (Camera Parameters), including the intrinsic parameter data and extrinsic parameter data described later, can be obtained by estimating the intrinsic and extrinsic parameters of each camera; and a depth map (Depth Map) can be obtained through the depth map calculation.
In this scheme, no special camera, such as a light field camera, is required for video acquisition. Likewise, complicated camera calibration prior to acquisition is not required. Multiple cameras can be laid out and arranged to better capture objects or scenes to be photographed. Referring to fig. 51 in combination, a plurality of capturing devices, for example, cameras 1 to N, may be provided in the area to be viewed.
After the above three steps are processed, the texture map, all the camera parameters and the depth map of each camera acquired from the multiple cameras are obtained. These three portions of data may be referred to as data files in multi-angle freeview video data, and may also be referred to as 6-degree-of-freedom video data (6DoF video data). With the data, the user end can generate a virtual viewpoint according to a virtual 6 Degree of Freedom (DoF) position, thereby providing a video experience of 6 DoF.
Referring to FIG. 52, the 6DoF video data and the indicative data may be compressed and transmitted to the user side, and the user side may obtain the 6DoF representation from the received data, that is, from the 6DoF video data and the metadata. The indicative data may also be referred to as metadata (Metadata).
Referring to FIG. 53, the metadata may be used to describe the data schema of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern Metadata), indicating the storage rules for the pixel data and depth data of the plurality of images in the stitched image; edge protection metadata (Padding Pattern Metadata), which may be used to indicate how edge protection is performed in the stitched image; and other metadata (Other Metadata). The metadata may be stored in a header file; the specific order of storage may be as shown in FIG. 53, or another order may be used.
Referring to FIG. 54, the user side obtains the 6DoF video data, which includes the camera parameters, texture maps and depth maps together with the descriptive metadata, and in addition the interactive behavior data of the user side. With these data, the user side may perform 6DoF rendering by Depth Image-Based Rendering (DIBR), so as to generate an image of a virtual viewpoint at the specific 6DoF position generated according to the user behavior; that is, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined according to the user instruction.
In one embodiment used for testing, each test case contained 20 seconds of video data at 30 frames/second and 1920 x 1080 resolution; for each of the 30 cameras there are therefore 600 frames in total. The main folder contains a texture map folder and a depth map folder. Under the texture map folder there are secondary directories numbered 0 to 599, which represent the 600 frames of the 20-second video. Each secondary directory contains the texture maps acquired by the 30 cameras, named 0.yuv to 29.yuv, in yuv420 format. Under the depth map folder, each secondary directory contains the 30 depth maps calculated by a depth estimation algorithm, each depth map sharing its name with the corresponding texture map. The texture maps and corresponding depth maps of the multiple cameras all belong to the same frame instant within the 20 seconds of video.
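As an illustration of the directory layout just described, the following sketch locates the texture map and depth map of a given frame and camera; the folder names "texture" and "depth" and the helper itself are assumptions, since only the general layout is specified above.

```python
import os

NUM_FRAMES = 600      # 20 s x 30 frames/s
NUM_CAMERAS = 30

def frame_paths(root, frame_idx, camera_idx):
    """Return (texture_path, depth_path) for one frame of one camera.

    Assumed layout:
      <root>/texture/<frame_idx>/<camera_idx>.yuv   (yuv420, 1920x1080)
      <root>/depth/<frame_idx>/<camera_idx>.yuv     (same name as its texture map)
    """
    assert 0 <= frame_idx < NUM_FRAMES and 0 <= camera_idx < NUM_CAMERAS
    texture = os.path.join(root, "texture", str(frame_idx), f"{camera_idx}.yuv")
    depth = os.path.join(root, "depth", str(frame_idx), f"{camera_idx}.yuv")
    return texture, depth
```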
All the depth maps in the test case are generated by a preset depth estimation algorithm. In tests, these depth maps may provide good virtual viewpoint reconstruction quality at virtual 6DoF positions. In one case, the reconstructed image of the virtual viewpoint can be generated directly from the given depth map. Alternatively, the depth map may be generated or refined by a depth calculation algorithm based on the original texture map.
In addition to the depth maps and texture maps, the test case contains an sfm file, which holds the parameters describing all 30 cameras. The data of this file is written in binary format; the specific data format is described below. To accommodate different cameras, a fisheye camera model with distortion parameters is adopted in the test. The DIBR reference software we provide shows how to read and use the camera parameter data from this file. The camera parameter data contains the following fields:
(1) krt_R is the rotation matrix of the camera;
(2) krt_cc is the optical center position of the camera;
(3) krt_world_position is the three-dimensional spatial coordinate of the camera;
(4) krt_kc is the distortion coefficient of the camera;
(5) src_width is the width of the calibration image;
(6) src_height is the height of the calibration image;
(7) fisheye_radius and lens_fov are parameters of the fisheye camera.
In the technical solution of the invention, the user can see in detail how the corresponding parameters are read from the sfm file in the preset parameter reading function (the set_sfm_parameters function).
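For illustration, the fields listed above can be grouped per camera as in the sketch below; the exact binary layout and precision of the sfm file are defined by the reference software's set_sfm_parameters function, not by this sketch.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParams:
    """Per-camera parameters listed above (field grouping only; the binary
    layout of the sfm file is defined by the reference software)."""
    krt_R: np.ndarray               # 3x3 rotation matrix of the camera
    krt_cc: np.ndarray              # optical center position
    krt_world_position: np.ndarray  # 3D spatial coordinates of the camera
    krt_kc: np.ndarray              # distortion coefficients
    src_width: int                  # width of the calibration image
    src_height: int                 # height of the calibration image
    fisheye_radius: float           # fisheye camera parameter
    lens_fov: float                 # fisheye camera field of view
```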
In DIBR reference software, camera parameters, texture maps, depth maps, and the 6DoF position of the virtual camera are received as inputs, while the generated texture maps and depth maps at the virtual 6DoF position are output. The 6DoF position of the virtual camera is the aforementioned 6DoF position determined from the user behavior. The DIBR reference software may be software implementing virtual viewpoint-based image reconstruction in an embodiment of the present invention.
Referring to fig. 55 in conjunction, in the reference software, the camera parameters, texture map, depth map, and 6DoF position of the virtual camera are received as inputs, while the generated texture map at the virtual 6DoF position and the generated depth map are output.
Referring collectively to FIG. 56, the software may include the following processing steps: camera selection (Camera Selection), forward mapping of the depth maps (Forward Projection of Depth Map), depth map post-processing (Postprocessing), backward mapping of the texture maps (Backward Projection of Texture Map), fusion of the texture maps mapped from multiple cameras (Texture Fusion), and hole filling of the image (Inpainting).
In the reference software, the two cameras closest to the virtual 6DoF position may be selected by default for virtual viewpoint generation.
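A minimal sketch of this default camera-selection step, assuming each camera's position is available as a 3D world coordinate (e.g., krt_world_position):

```python
import numpy as np

def select_nearest_cameras(camera_positions, virtual_position, k=2):
    """Select the k cameras closest to the virtual 6DoF position.

    camera_positions: (N, 3) array of camera world positions.
    virtual_position: (3,) array, the spatial part (x, y, z) of the virtual pose.
    Returns the indices of the k nearest cameras.
    """
    cams = np.asarray(camera_positions, dtype=float)
    vp = np.asarray(virtual_position, dtype=float)
    distances = np.linalg.norm(cams - vp, axis=1)
    return np.argsort(distances)[:k]

# Example: 30 cameras on a circle around the court, virtual viewpoint near camera 0.
angles = np.linspace(0, 2 * np.pi, 30, endpoint=False)
cameras = np.stack([20 * np.cos(angles), 20 * np.sin(angles), np.full(30, 3.0)], axis=1)
print(select_nearest_cameras(cameras, [20.0, 1.0, 3.0]))  # indices of the two nearest cameras
```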
In the step of post-processing of the depth map, the quality of the depth map may be improved by various methods, such as foreground edge protection, filtering at the pixel level, etc.
For the output generated image, a method of fusing the texture maps mapped from the two cameras is used. The fusion weight is a global weight determined by the distance between the position of the virtual viewpoint and each reference camera position. If a pixel of the output virtual viewpoint image is mapped by only one camera, that mapped pixel may be adopted directly as the value of the output pixel.
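A minimal sketch of such distance-based fusion; the inverse-distance form of the global weight is an assumption, since the description only states that the weight is determined by the distance between the virtual viewpoint and the reference camera positions.

```python
import numpy as np

def fusion_weights(virtual_position, ref_positions, eps=1e-6):
    """Global per-camera fusion weights from the distance of the virtual
    viewpoint to each reference camera (closer camera -> larger weight)."""
    d = np.linalg.norm(np.asarray(ref_positions, float) - np.asarray(virtual_position, float), axis=1)
    w = 1.0 / (d + eps)
    return w / w.sum()

def fuse_pixels(pix_a, pix_b, valid_a, valid_b, w):
    """Fuse two mapped texture pixels. If only one camera maps the pixel,
    that pixel is used directly as the output value."""
    if valid_a and valid_b:
        return w[0] * pix_a + w[1] * pix_b
    if valid_a:
        return pix_a
    if valid_b:
        return pix_b
    return None  # hole: to be filled by inpainting afterwards
```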
After the fusion step, any remaining hole pixels that were not mapped by either camera can be filled using an image inpainting method.
For the output depth map, for convenience of error checking and analysis, the depth map mapped from one of the cameras to the virtual viewpoint position may be used as the output.
It is to be understood that the above examples are only illustrative and not restrictive of the specific embodiments, and the technical solutions of the present invention will be further described below.
Referring to FIG. 1, the area to be viewed may be a basketball court, and a plurality of acquisition devices may be arranged to acquire data of the area to be viewed.
For example, with combined reference to FIG. 2, several acquisition devices may be arranged along a certain path at a height HLK above the basket; for example, 6 acquisition devices CJ1 to CJ6 may be arranged along an arc. It is understood that the arrangement position, number and supporting manner of the acquisition devices can vary, and are not limited herein.
The acquisition devices may be cameras or video cameras capable of synchronized shooting, for example, may be cameras or video cameras capable of synchronized shooting through a hardware synchronization line. Data acquisition is carried out on the region to be watched through a plurality of acquisition devices, and a plurality of synchronous images or video streams can be obtained. According to the video streams collected by the plurality of collecting devices, a plurality of synchronous frame images can be obtained as a plurality of synchronous images. It will be appreciated that synchronisation is ideally intended to correspond to the same time instant, but that errors and deviations may also be tolerated.
With reference to fig. 3 in combination, in the embodiment of the present invention, data acquisition may be performed on an area to be viewed by an acquisition system 31 including a plurality of acquisition devices; the acquired synchronized images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle free view data that can support the display device 33 to perform virtual viewpoint switching. The displaying device 33 may display a reconstructed image generated based on the multi-angle free view data, the reconstructed image corresponding to a virtual viewpoint, display reconstructed images corresponding to different virtual viewpoints according to a user instruction, and switch viewing positions and viewing angles.
In a specific implementation, the process of reconstructing the image to obtain the reconstructed image may be implemented by the device 33 for displaying, or may be implemented by a device located in a Content Delivery Network (CDN) in an edge computing manner. It is to be understood that fig. 3 is an example only and is not limiting of the acquisition system, the server, the device performing the display, and the specific implementation. The process of image reconstruction based on multi-angle freeview data will be described in detail later with reference to fig. 38 to 41, and will not be described herein again.
With reference to FIG. 4, and following the previous example, the user may watch the area to be viewed through the display device; in this embodiment, the area to be viewed is a basketball court. As described above, the viewing position and the viewing angle are switchable.
For example, the user may slide on the screen to switch the virtual viewpoint. In an embodiment of the invention, with reference to FIG. 5, when the user's finger slides the screen to the right, the virtual viewpoint used for viewing can be switched. Continuing with reference to FIG. 2, the position of the virtual viewpoint before sliding may be VP1; after the sliding gesture switches the virtual viewpoint, the position of the virtual viewpoint may be VP2. Referring to FIG. 6, after sliding the screen, the reconstructed image presented on the screen may be as shown in FIG. 6. The reconstructed image may be obtained by image reconstruction based on multi-angle free-view data generated from data acquired by a plurality of acquisition devices in an actual acquisition setting.
It is to be understood that the image viewed before switching may be a reconstructed image. The reconstructed image may be a frame image in a video stream. In addition, the manner of switching the virtual viewpoint according to the user instruction may be various, and is not limited herein.
In a specific implementation, the virtual viewpoint may be represented by coordinates with 6 Degrees of Freedom (DoF), where the spatial position of the virtual viewpoint may be represented as (x, y, z) and the viewing angle may be represented as three rotational directions (θ, φ, γ).
The virtual viewpoint is a three-dimensional concept, and three-dimensional information is required for generating a reconstructed image. In a specific implementation, the multi-angle freeview data may include depth data for providing third-dimensional information outside the planar image. The amount of data for depth data is small compared to other implementations, such as providing three-dimensional information through point cloud data. The specific implementation of generating the multi-angle freeview data will be described in detail later with reference to fig. 19 to 37, and will not be described in detail here.
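For illustration, a 6DoF virtual viewpoint as described above can be represented as follows; the rotation-angle names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint6DoF:
    """A virtual viewpoint in 6 degrees of freedom: a spatial position
    (x, y, z) plus a viewing angle given by three rotational directions
    (named pitch/yaw/roll here purely for illustration)."""
    x: float
    y: float
    z: float
    pitch: float
    yaw: float
    roll: float

vp = VirtualViewpoint6DoF(x=0.0, y=-15.0, z=6.0, pitch=-0.2, yaw=0.0, roll=0.0)
```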
In the embodiment of the invention, the switching of the virtual viewpoint can be performed within a certain range, namely a multi-angle free visual angle range. That is, in the multi-angle free view range, the virtual view position and the view angle can be switched at will.
The multi-angle free-view range is related to the arrangement of the acquisition devices: the wider the shooting coverage of the acquisition devices, the larger the multi-angle free-view range. The quality of the picture shown by the display device is related to the number of acquisition devices: generally, the larger the number of acquisition devices, the smaller the hole areas in the displayed picture.
Referring to FIG. 7, if two rows of acquisition devices at different heights are installed around the basketball court, that is, an upper row of acquisition devices CJ1 to CJ6 and a corresponding lower row of acquisition devices, the multi-angle free-view range is larger than when only one row of acquisition devices is arranged.
Referring to fig. 8 in combination, the user's finger can slide upward, switching the virtual viewpoint from which to view. Referring collectively to fig. 9, after sliding the screen, the image of the screen presentation may be as shown in fig. 9.
In a specific implementation, if only one row of acquisition devices is arranged, a certain degree of freedom in the up-down direction can still be obtained when reconstructing images, but the multi-angle free-view range is smaller than when two rows of acquisition devices are arranged one above the other.
It can be understood by those skilled in the art that the foregoing embodiments and the corresponding drawings are only illustrative; they limit neither the arrangement of the acquisition devices and its relationship to the multi-angle free-view range, nor the operation manner and display effect of the display device. A specific implementation of switching the virtual viewpoint for viewing the area to be viewed according to the user instruction will be described in further detail with reference to FIGS. 43 to 47, and is not repeated here.
The following further elaborations are made in particular with regard to the method of setting up the acquisition device.
Fig. 10 is a flowchart of a method for setting acquisition equipment in an embodiment of the present invention, which may specifically include the following steps:
Step S101: determining a multi-angle free-view range, within which switching of the virtual viewpoint for viewing an area to be viewed is supported;
Step S102: determining setting positions of acquisition devices at least according to the multi-angle free-view range, the setting positions being suitable for arranging the acquisition devices and acquiring data of the area to be viewed.
It will be understood by those skilled in the art that a fully free viewing angle may refer to a 6-degree-of-freedom viewing angle, i.e., the user can freely switch the spatial position and viewing angle of the virtual viewpoint at the device performing the display. The spatial position of the virtual viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotational directions (θ, φ, γ). There are 6 degree-of-freedom directions in total, hence the name 6-degree-of-freedom viewing angle.
As described above, in the embodiment of the present invention, the switching of the virtual viewpoint may be performed within a certain range, which is a multi-angle free view range. That is, within the multi-angle free view range, the virtual viewpoint position and the view can be arbitrarily switched.
The multi-angle free visual angle range can be determined according to the requirements of application scenes. For example, in some scenarios, the area to be viewed may have a core viewpoint, such as the center of a stage, or the center point of a basketball court, or the basket of a basketball court, etc. In these scenes, the multi-angle freeview range may include a plane or stereoscopic region that includes the core viewpoint. It is understood that the region to be viewed may be a point, a plane or a stereoscopic region, and is not limited thereto.
As mentioned above, the multi-angle free viewing angle range may be a variety of regions, which will be further exemplified below with reference to fig. 11 to 15.
Referring to FIG. 11, the core viewpoint is represented by point O, and the multi-angle free-view range may be a sector area centered on the core viewpoint and located in the same plane as the core viewpoint, such as sector area A1OA2 or sector area B1OB2, or a circular surface centered on point O.
Taking sector area A1OA2 as the multi-angle free-view range as an example, the position of the virtual viewpoint may be switched continuously within the area, e.g., from A1 along the arc segment A1A2 continuously to A2; alternatively, switching may be performed along the arc segment L1L2, or the position may be switched in other ways within the multi-angle free-view range. Accordingly, the viewing angle of the virtual viewpoint may also change within the region.
With further reference to FIG. 12, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be a sector area centered on the central point E and located in the same plane as it, such as sector area F121EF122. The central point E of the basketball court may be located on the court floor, or may be at a certain height above the ground. The arc end points F121 and F122 may be at the same height, e.g., height H121 in the figure.
Referring to FIG. 13, where the core viewpoint is represented by point O, the multi-angle free-view range may be part of a sphere centered at the core viewpoint; for example, region C1C2C3C4 indicates a partial region of a sphere, and the multi-angle free-view range may be the stereoscopic range formed by region C1C2C3C4 and point O. Any point within the range can be used as the position of the virtual viewpoint.
With further reference to FIG. 14, the core viewpoint may be the center point E of the basketball court, and the multi-angle free-view range may be part of a sphere centered at the center point E; for example, region F131F132F133F134 indicates a partial area of a sphere, and the multi-angle free-view range may be the stereoscopic range formed by region F131F132F133F134 and the center point E.
In a scene with a core viewpoint, the position of the core viewpoint may be various, and the multi-angle free viewing angle range may also be various, which is not listed here. It is to be understood that the above embodiments are only examples and are not limiting on the multi-angle free view range, and the shapes shown therein are not limiting on actual scenes and applications.
In specific implementation, the core viewpoint may be determined according to a scene, in one shooting scene, there may also be multiple core viewpoints, and the multi-angle free view range may be a superposition of multiple sub-ranges.
In other application scenarios, the multi-angle free view range may also be coreless, for example, in some application scenarios, it is desirable to provide multi-angle free view viewing of historic buildings or of paintings. Accordingly, the multi-angle free view range can be determined according to the needs of the scenes.
It is understood that the shape of the free view range may be arbitrary, and any point within the multi-angle free view range may be used as a position.
Referring to FIG. 15, the multi-angle free-view range may be a cube D1D2D3D4D5D6D7D8, and the region to be viewed is the surface D1D2D3D4. Any point in the cube D1D2D3D4D5D6D7D8 can then be used as the position of the virtual viewpoint, and the viewing angle of the virtual viewpoint can be various. For example, a position E6 may be selected on the surface D5D6D7D8 and the region viewed along the direction E6D1, or along the direction E6D9, where D9 is a point selected from the region to be viewed.
In a specific implementation, after the multi-angle free view range is determined, the position of the acquisition equipment can be determined according to the multi-angle free view range.
Specifically, the setting position of the capturing device may be selected within the multi-angle free view range, for example, the setting position of the capturing device may be determined in a boundary point of the multi-angle free view range.
Referring to FIG. 16, the core viewpoint may be the central point E of the basketball court, and the multi-angle free-view range may be a sector area centered on the central point E and located in the same plane as it, such as sector area F61EF62. The acquisition devices may be arranged within the multi-angle free-view range, e.g., along arc F65F66; image reconstruction can then be performed using an algorithm in the area not covered by the acquisition devices. In a specific implementation, acquisition devices may also be arranged along arc F61F62, with acquisition devices set at the end points of the arc, to improve the quality of the reconstructed image. Each acquisition device may be oriented toward the center point E of the basketball court. The position of an acquisition device may be represented by spatial position coordinates, and its orientation by three rotational directions.
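Purely to illustrate how such setting positions might be computed, the sketch below places acquisition devices evenly along an arc around the core viewpoint E and orients each one toward it; the radius, height, angular span and device count are example values, and the orientation is expressed as a unit direction vector rather than three rotation angles for brevity.

```python
import numpy as np

def arc_positions(center, radius, height, start_deg, end_deg, count):
    """Place `count` acquisition devices evenly along an arc of given radius
    and height around the core viewpoint `center`, each oriented toward it."""
    center = np.asarray(center, dtype=float)
    angles = np.radians(np.linspace(start_deg, end_deg, count))
    positions = np.stack([center[0] + radius * np.cos(angles),
                          center[1] + radius * np.sin(angles),
                          np.full(count, height)], axis=1)
    directions = center - positions
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return positions, directions   # setting positions and viewing directions

# Example: 6 devices along a 120-degree arc around the court center E at (0, 0, 0).
pos, look = arc_positions(center=(0, 0, 0), radius=20.0, height=6.0,
                          start_deg=30, end_deg=150, count=6)
```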
In specific implementation, the number of the settable setting positions can be 2 or more, and correspondingly, 2 or more acquisition devices can be set. The number of acquisition devices can be determined according to the quality requirement of the reconstructed image or video. In scenes with higher picture quality requirements for the reconstructed image or video, the number of capture devices may be greater, while in scenes with lower picture quality requirements for the reconstructed image or video, the number of capture devices may be less.
With continued reference to FIG. 16, it can be appreciated that, to reduce holes in the reconstructed picture when pursuing higher reconstructed image or video quality, a greater number of acquisition devices can be arranged along arc F61F62; for example, 40 cameras may be provided.
Referring to FIG. 17, the core viewpoint may be the center point E of the basketball court, and the multi-angle free-view range may be part of a sphere centered at the center point E; for example, region F61F62F63F64 indicates a partial area of a sphere, and the multi-angle free-view range may be the stereoscopic range formed by region F61F62F63F64 and the center point E. The acquisition devices may be arranged within the multi-angle free-view range, e.g., along arc F65F66 and arc F67F68. Similar to the previous example, image reconstruction may be performed using an algorithm in the area not covered by the acquisition devices. In a specific implementation, acquisition devices may also be arranged along arc F61F62 and arc F63F64, with acquisition devices set at the end points of the arcs, to improve the quality of the reconstructed image.
Each acquisition device may be oriented toward the center point E of the basketball court. It will be appreciated that, although not shown, a greater number of acquisition devices may be arranged along arc F61F62 and arc F63F64.
As mentioned before, in some application scenarios the region to be viewed may comprise a core viewpoint and, correspondingly, the multi-angle free-view range comprises a region whose viewing angles point toward the core viewpoint. In such a scenario, the setting positions of the acquisition devices may be selected from an arc-shaped region whose depression direction points toward the core viewpoint.
When the region to be viewed comprises a core viewpoint, selecting the setting positions in an arc-shaped region whose depression direction points toward the core viewpoint arranges the acquisition devices along an arc. Because the region to be viewed comprises the core viewpoint and the viewing angles point toward it, in this scenario the arc-shaped arrangement allows a larger multi-angle free-view range to be covered with fewer acquisition devices.
In a specific implementation, the setting position of the acquisition device can be determined by combining the view angle range and the boundary shape of the region to be watched. For example, the setting positions of the capturing devices may be determined at preset intervals along the boundary of the region to be viewed within the viewing angle range.
Referring to FIG. 18, the multi-angle free-view range may have no core viewpoint; for example, the virtual viewpoint position may be selected from the hexahedron F81F82F83F84F85F86F87F88, and the region to be viewed is watched from that virtual viewpoint position. The boundary of the region to be viewed may be the ground boundary line of the court. The acquisition devices may be arranged along the intersection line B89B94 of the ground boundary line and the region to be viewed; for example, 6 acquisition devices may be arranged from position B89 to position B94. The degree of freedom in the up-down direction can be provided by an algorithm, or another row of acquisition devices whose horizontal projection falls on the intersection line B89B94 may be arranged.
In a specific implementation, the multi-angle free-view range can also support viewing the region to be viewed from its upper side, i.e., from the direction away from the horizontal plane.
Correspondingly, an acquisition device may be carried by an unmanned aerial vehicle and arranged on the upper side of the region to be viewed, or acquisition devices may be arranged at the top of the building in which the region to be viewed is located, the top being the part of the building in the direction away from the horizontal plane.
For example, acquisition devices can be arranged at the top of a basketball arena, or carried by an unmanned aerial vehicle hovering above the court. Acquisition devices can likewise be arranged at the top of the venue where a stage is located, or carried by an unmanned aerial vehicle.
By arranging acquisition devices on the upper side of the region to be viewed, the multi-angle free-view range can include viewing angles from above the region to be viewed.
In a specific implementation, the capturing device may be a camera or a video camera, and the captured data may be picture or video data.
It is understood that the manner of disposing the collecting device at the disposing position may be various, for example, the collecting device may be supported at the disposing position by the supporting frame, or other disposing manners may be possible.
In addition, it is to be understood that the above embodiments are only for illustration and are not limiting on the setting mode of the acquisition device. In various application scenes, the specific implementation modes of determining the setting position of the acquisition equipment and setting the acquisition equipment for acquisition according to the multi-angle free visual angle range are all within the protection scope of the invention.
The following is further described with particular reference to a method of generating multi-angle freeview data.
As previously described, with continued reference to fig. 3, the acquired synchronized multiple images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle freeview data capable of supporting virtual view switching by the displaying device 33, where the multi-angle freeview data may indicate third-dimensional information outside the two-dimensional images by depth data.
Specifically, referring to fig. 19 in combination, the generating of the multi-angle freeview data may include the steps of:
step S191, a plurality of synchronous images are obtained, and shooting angles of the images are different;
step S192 of determining depth data of each image based on the plurality of images;
step S193, for each of the images, storing pixel data of each image in a first field, and storing the depth data in at least one second field associated with the first field.
The synchronized plurality of images may be images captured by a camera or frame images in video data captured by a video camera. In generating the multi-angle freeview data, depth data for each image may be determined based on the plurality of images.
Wherein the depth data may comprise depth values corresponding to pixels of the image. The distance of the acquisition device to each point in the area to be viewed may be taken as the above-mentioned depth value, which may directly reflect the geometry of the visible surface in the area to be viewed. The depth value may be a distance of each point in the area to be viewed along the optical axis of the camera to the optical center, and the origin of the camera coordinate system may be the optical center. It will be appreciated by those skilled in the art that the distance may be a relative value, with multiple images on the same basis.
Further, the depth data may include depth values corresponding one-to-one to the pixels of the image, or may be partial values selected from a set of depth values corresponding one-to-one to the pixels of the image.
It will be understood by those skilled in the art that the set of depth values may be stored in the form of a depth map. In a specific implementation, the depth data may be obtained by down-sampling an original depth map, while the set of depth values corresponding one-to-one to the pixels of the image is stored in the form of an image arranged according to the pixel points of the image.
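A minimal sketch of obtaining depth data by down-sampling an original depth map; the factor-of-2 nearest-neighbour sampling is only one possible choice.

```python
import numpy as np

def downsample_depth(depth_map, factor=2):
    """Down-sample an original depth map by keeping every `factor`-th depth
    value in each direction, reducing the amount of depth data to transmit."""
    depth = np.asarray(depth_map)
    return depth[::factor, ::factor]

original = np.random.randint(0, 256, size=(1080, 1920), dtype=np.uint8)  # example depth map
reduced = downsample_depth(original)   # 540 x 960 depth values
```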
In a specific implementation, the pixel data of the image stored in the first field may be raw image data, such as data acquired from an acquisition device, or may be data obtained by reducing resolution of the raw image data. Further, the pixel data of the image may be original pixel data of the image, or pixel data after resolution reduction. The pixel data of the image may be YUV data or RGB data, or may be other data capable of expressing the image.
In a specific implementation, the number of depth values stored in the second field may be the same as or different from the number of pixels corresponding to the pixel data of the image stored in the first field. The number can be determined according to the bandwidth limitations for data transmission with the device that processes the multi-angle free-view image data; if the bandwidth is small, the data volume can be reduced by down-sampling, resolution reduction or similar means.
In a specific implementation, for each of the images, the pixel data of the image may be sequentially stored in a plurality of fields according to a preset sequence, and the fields may be consecutive or may be spaced apart from the second field. A field storing pixel data of an image may be the first field. The following examples are given for illustration.
Referring to fig. 20, pixel data of an image, which is illustrated by pixels 1 to 6 in the figure and other pixels not shown, may be stored in a predetermined order into a plurality of consecutive fields, which may be used as a first field; the depth data corresponding to the image, indicated by depth values 1 to 6 in the image and other depth values not shown, may be stored in a plurality of consecutive fields in a predetermined order, and these consecutive fields may be used as the second field. The preset sequence may be sequentially stored line by line according to the distribution positions of the image pixels, or may be other sequences.
Referring to fig. 21, pixel data of one image and corresponding depth values may be alternately stored in a plurality of fields. A plurality of fields storing pixel data may be used as a first field, a plurality of fields storing depth values may be used as a second field.
In a particular implementation, the depth data may be stored in the same order as the pixel data of the image, so that each field in the first field is associated with a corresponding field in the second field and the depth value corresponding to each pixel is represented.
In particular implementations, pixel data for multiple images and depth data may be stored in a variety of ways. The following examples are further described below.
Referring collectively to fig. 22, the individual pixels of image 1, illustrated as image 1 pixel 1, image 1 pixel 2, and other pixels not shown, may be stored in a continuous field, which may serve as the first field. The depth data of image 1, illustrated as image 1 depth value 1, image 1 depth value 2 shown in the figure, and other depth data not shown, may be stored in fields adjacent to the first field, which may serve as the second field. Similarly, for pixel data of image 2, it may be stored in a first field and depth data of image 2 may be stored in an adjacent second field.
It can be understood that each image in the image stream continuously acquired by one acquisition device of the synchronized multiple acquisition devices, or each frame image in the video stream, may be respectively used as the image 1; similarly, among the plurality of acquisition devices synchronized, an image acquired in synchronization with the image 1 may be the image 2. The acquisition device may be an acquisition device as in fig. 2, or an acquisition device in other scenarios.
Referring to fig. 23 in combination, the pixel data of image 1 and the pixel data of image 2 may be stored in a plurality of adjacent first fields, and the depth data of image 1 and the depth data of image 2 may be stored in a plurality of adjacent second fields.
Referring to fig. 24 in combination, the pixel data of each of the plurality of images may be stored in a plurality of fields, respectively, which may serve as first fields. The fields storing pixel data may be interleaved with the fields storing depth values.
Referring to fig. 25 in conjunction, the pixel data and depth values of different images may also be interleaved; for example, image 1 pixel 1, image 1 depth value 1, image 2 pixel 1, and image 2 depth value 1 may be stored in sequence until the pixel data and depth value corresponding to the first pixel of each of the plurality of images have been stored, and the adjacent fields then store image 1 pixel 2, image 1 depth value 2, image 2 pixel 2, and image 2 depth value 2, and so on, until the pixel data and depth data of every image have been stored.
In summary, the field storing the pixel data of each image may be used as the first field, and the field storing the depth data of the image may be used as the second field. For each image, a first field and a second field associated with the first field may be stored.
It will be appreciated by those skilled in the art that the various embodiments described above are merely examples and are not specific limitations on the types, sizes, and arrangements of fields.
Referring to fig. 3 in combination, the multi-angle freeview data including the first field and the second field may be stored in the server 32 at the cloud, and transmitted to the CDN or the display device 33 for image reconstruction.
In a specific implementation, the first field and the second field may both be pixel fields in a stitched image, and the stitched image is used to store pixel data of the plurality of images and the depth data. By adopting the image format for data storage, the data volume can be reduced, the data transmission duration can be reduced, and the resource occupation can be reduced.
The stitched image may be an image in a variety of formats, such as BMP format, JPEG format, PNG format, and the like. These image formats may be compressed formats or may be uncompressed formats. Those skilled in the art will appreciate that images of various formats may include fields, referred to as pixel fields, corresponding to individual pixels. The size of the stitched image, that is, parameters such as the number of pixels contained in the stitched image, the aspect ratio, and the like, may be determined as needed, and specifically may be determined according to the number of the synchronized multiple images, the data amount to be stored in each image, the data amount of the depth data to be stored in each image, and other factors.
In a specific implementation, the number of bits used to store the depth data corresponding to the pixels of each image, and the pixel data of the synchronized images, may be associated with the format of the stitched image.
For example, when the format of the stitched image is the BMP format, the depth values may range from 0 to 255, that is, 8-bit data, and may be stored as gray values in the stitched image; alternatively, a depth value may be 16-bit data and may be stored as the gray values of two pixel positions in the stitched image, or in two channels of one pixel position.
When the format of the stitched image is the PNG format, the depth value may likewise be 8-bit or 16-bit data; in the PNG format, a 16-bit depth value may be stored as the gray value of a single pixel position in the stitched image.
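As an illustration of fitting a 16-bit depth value into 8-bit pixel fields, the following sketch splits each depth value into a high byte and a low byte so that it can occupy two channels of one pixel position, or equivalently two 8-bit pixel positions; the function names are hypothetical and the snippet is not tied to any particular picture format.

```python
import numpy as np

def split_depth16_to_two_channels(depth16: np.ndarray) -> np.ndarray:
    """Split a 16-bit depth map into two 8-bit planes (high byte, low byte) so it can
    occupy two channels of one pixel position in the stitched image."""
    high = (depth16 >> 8).astype(np.uint8)
    low = (depth16 & 0xFF).astype(np.uint8)
    return np.stack([high, low], axis=-1)

def merge_two_channels_to_depth16(planes: np.ndarray) -> np.ndarray:
    """Inverse operation, as the device performing data processing would apply it."""
    return (planes[..., 0].astype(np.uint16) << 8) | planes[..., 1].astype(np.uint16)

depth = np.array([[0, 300], [65535, 1024]], dtype=np.uint16)
packed = split_depth16_to_two_channels(depth)
assert np.array_equal(merge_two_channels_to_depth16(packed), depth)
```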
It is understood that the above embodiments are not limited to the storage manner or the number of data bits, and other data storage manners that can be realized by those skilled in the art fall within the scope of the present invention.
In a specific implementation, the stitched image may be divided into an image region and a depth map region, a pixel field of the image region stores pixel data of the plurality of images, and a pixel field of the depth map region stores depth data of the plurality of images; the image area stores a pixel field of pixel data of each image as the first field, and the depth map area stores a pixel field of depth data of each image as the second field.
In a specific implementation, the image region may be a continuous region, and the depth map region may also be a continuous region.
Further, in a specific implementation, the stitched image may be divided equally, and the two parts may serve as the image area and the depth map area respectively. Alternatively, the stitched image may be divided unequally according to the amount of pixel data and depth data to be stored.
For example, referring to fig. 26, each minimum square indicates one pixel, the image region may be a region 1 within a dashed line frame, that is, an upper half region of the stitched image after being divided into upper and lower halves, and a lower half region of the stitched image may be a depth map region.
It is to be understood that fig. 26 is merely illustrative, and the number of minimum squares is not a limitation on the number of pixels in the stitched image. In addition, the equal division may also split the stitched image into left and right halves.
In a specific implementation, the image region may include a plurality of image sub-regions, each image sub-region for storing one of the plurality of images, and a pixel field of each image sub-region may be used as the first field; accordingly, the depth map region may include a plurality of depth map sub-regions, each for storing depth data of one of the plurality of images, and a pixel field of each depth map sub-region may serve as the second field.
Wherein the number of image sub-regions and the number of depth map sub-regions may be equal, both being equal to the number of synchronized multiple images. In other words, it may be equal to the number of cameras described above.
The stitched image is further described with reference to fig. 27, taking as an example a stitched image divided equally into upper and lower halves. The upper half of the stitched image in fig. 27 is the image area, which is divided into 8 image sub-areas storing the pixel data of the 8 synchronized images; the shooting angles of these images differ, that is, their viewing angles differ. The lower half of the stitched image is the depth map area, which is divided into 8 depth map sub-areas storing the depth maps of the 8 images respectively.
As described above, the pixel data of the 8 synchronized images, that is, the view 1 image to the view 8 image, may be the original image acquired from the camera, or may be the image of the original image with reduced resolution. The depth data is stored in a partial region of the stitched image and may also be referred to as a depth map.
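A minimal sketch of assembling such a stitched image is given below, assuming 8 synchronized 960 × 540 views whose sub-regions are simply placed left to right within the upper (image) half and the lower (depth map) half; the actual sub-region arrangement of fig. 27 may differ, and the helper name build_stitched_image is an illustration rather than anything defined in the text.

```python
import numpy as np

def build_stitched_image(views, depth_maps):
    """Place the synchronized view images side by side in the upper half (image region)
    and their single-channel depth maps, replicated into gray pixels, side by side in
    the lower half (depth map region)."""
    h, w, _ = views[0].shape
    n = len(views)
    canvas = np.zeros((2 * h, n * w, 3), dtype=np.uint8)
    for i, (img, dep) in enumerate(zip(views, depth_maps)):
        canvas[:h, i * w:(i + 1) * w] = img              # image sub-region i (first field)
        canvas[h:, i * w:(i + 1) * w] = dep[..., None]   # depth map sub-region i (second field)
    return canvas

views = [np.full((540, 960, 3), v, dtype=np.uint8) for v in range(8)]
depths = [np.full((540, 960), v * 10, dtype=np.uint8) for v in range(8)]
stitched = build_stitched_image(views, depths)
print(stitched.shape)  # (1080, 7680, 3)
```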
As described above, in an implementation, the stitched image may also be divided unequally. For example, referring to fig. 28, the depth data may occupy fewer pixels than the pixel data of the images, so that the image region and the depth map region have different sizes. For instance, if the depth data is obtained by quarter down-sampling the depth map, a division as shown in fig. 28 may be adopted. The number of pixels occupied by the depth maps may also be greater than that occupied by the pixel data of the images.
It is to be understood that fig. 28 is merely one example of dividing the stitched image unequally; in a specific implementation, the number of pixels and the aspect ratio of the stitched image may vary, and so may the manner of division.
In a specific implementation, the image region or the depth map region may also include a plurality of regions. For example, as shown in fig. 29, the image region may be one continuous region, and the depth map region may include two continuous regions.
Alternatively, referring to fig. 30 and 31, the image region may include two continuous regions, and the depth map region may also include two continuous regions. The image region and the depth region may be arranged at intervals.
Still alternatively, referring to fig. 32, the image sub-regions included in the image region may be arranged at intervals from the depth map sub-regions included in the depth map region. The number of contiguous regions comprised by the image region may be equal to the number of image sub-regions, and the number of contiguous regions comprised by the depth map region may be equal to the number of depth map sub-regions.
In a specific implementation, for the pixel data of each image, the pixel data may be stored to the image sub-region according to the order of pixel arrangement. For the depth data of each image, the depth data can also be stored in the depth map sub-area according to the sequence of pixel point arrangement.
With combined reference to fig. 33-35, image 1 is illustrated with 9 pixels in fig. 33, image 2 is illustrated with 9 pixels in fig. 34, and image 1 and image 2 are two images at different angles in synchronization. According to the image 1 and the image 2, the depth data corresponding to the image 1, including the depth value 1 of the image 1 to the depth value 9 of the image 1, can be obtained, and the depth data corresponding to the image 2, including the depth value 1 of the image 2 to the depth value 9 of the image 2, can also be obtained.
Referring to fig. 35, when storing the image 1 in the image sub-region, the image 1 may be stored in the upper left image sub-region according to the order of the arrangement of the pixels, that is, in the image sub-region, the arrangement of the pixels may be the same as that of the image 1. The image 2 is stored to the image sub-area, also to the upper right image sub-area in this way.
Similarly, storing the depth data of the image 1 to the depth map sub-area may be in a similar manner, and in the case where the depth values correspond one-to-one to the pixel values of the image, may be in a manner as shown in fig. 35. If the depth value is obtained by downsampling the original depth map, the depth value can be stored in the sub-region of the depth map according to the sequence of pixel point arrangement of the depth map obtained by downsampling.
As will be understood by those skilled in the art, the compression rate achievable for an image is related to the correlation between its pixels: the stronger the correlation, the higher the compression rate. Because a captured image corresponds to the real world, the correlation between its pixels is strong; by storing the pixel data and depth data of an image in the order in which the pixels are arranged, a higher compression rate can be obtained when the stitched image is compressed, that is, for the same amount of data before compression, the amount of data after compression is smaller.
By dividing the stitched image into an image area and a depth map area, so that the image sub-areas are adjacent within the image area and the depth map sub-areas are adjacent within the depth map area, the data stored in the image area all come from images, or video frame images, shot from different angles of the region to be viewed, and the data stored in the depth map area are all depth maps; a higher compression rate can therefore be obtained when the stitched image is compressed.
In a specific implementation, edge protection may be applied to all or part of the image sub-regions and depth map sub-regions. Edge protection may take various forms; taking the view 1 depth map in fig. 31 as an example, redundant pixels may be placed around the original view 1 depth map; or the number of pixels allotted to the view 1 depth map may be kept unchanged, redundant pixels that store no actual data may be reserved around its border, and the view 1 depth map may be scaled down and stored in the remaining pixels; or other approaches may be used, so long as redundant pixels are finally left between the view 1 depth map and the neighbouring images around it.
Because the spliced image comprises a plurality of images and depth maps, the relevance of the adjacent boundaries of the images is poor, and the quality loss of the images and the depth maps in the spliced image can be reduced by performing edge protection when the spliced image is compressed.
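One simple way to realize such edge protection is to surround a sub-region with a guard band of replicated border pixels before it is written into the stitched image, as in the sketch below; the guard width of 2 pixels is an arbitrary choice for illustration, not a value from the text.

```python
import numpy as np

def pad_subregion(block: np.ndarray, guard: int = 2) -> np.ndarray:
    """Surround a sub-region with redundant border pixels (edge replication) so that,
    when the stitched image is compressed, unrelated neighbouring sub-regions bleed
    less into this sub-region's data."""
    return np.pad(block, ((guard, guard), (guard, guard), (0, 0)), mode="edge")

depth_view1 = np.random.randint(0, 256, (540, 960, 1), dtype=np.uint8)
protected = pad_subregion(depth_view1, guard=2)
print(protected.shape)  # (544, 964, 1)
```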
In implementations, the pixel field of the image sub-region may store three channels of data and the pixel field of the depth map sub-region may store single channel data. The pixel field of the image sub-region is used to store pixel data of any one of the plurality of synchronized images, the pixel data typically being three channel data, such as RGB data or YUV data.
The depth map sub-region is used for storing depth data of an image, if the depth value is 8-bit binary data, a single channel of the pixel field can be used for storing, and if the depth value is 16-bit binary data, a double channel of the pixel field can be used for storing. Alternatively, the depth values may be stored with a larger pixel area. For example, if the synchronized images are all 1920 × 1080 images and the depth value is 16-bit binary data, the depth value may be stored in 2 times of the 1920 × 1080 image area, and each image area may be stored as a single channel. The stitched image may also be divided in combination with the specific storage manner.
When each channel of each pixel occupies 8 bits, the uncompressed data volume of the stitched image can be calculated by the following formula: number of synchronized images × (data amount of the pixel data of one image + data amount of one depth map).
If the original image has a resolution of 1080P, that is, 1920 × 1080 pixels scanned progressively, the original depth map may also occupy 1920 × 1080 pixels as a single channel. The pixel data amount of one original image is 1920 × 1080 × 8 × 3 bits, and the data amount of one original depth map is 1920 × 1080 × 8 bits. With 30 cameras, the data volume of the stitched image is 30 × (1920 × 1080 × 8 × 3 + 1920 × 1080 × 8) bits, which is about 237 MB; if the image is not compressed, it occupies considerable system resources and the delay is large. In particular, when the bandwidth is small, for example on the order of 1 MB per second, transmitting one uncompressed stitched image takes about 237 s, the real-time performance is poor, and the user experience suffers.
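The estimate above can be reproduced with a few lines of arithmetic; the figures are the ones used in the example, and the "about 237M" value is computed here in mebibytes.

```python
# Back-of-the-envelope data volume of one uncompressed stitched image,
# 8 bits per channel, using the figures from the example above.
cameras = 30
pixels = 1920 * 1080
image_bits = pixels * 8 * 3            # three-channel pixel data of one original image
depth_bits = pixels * 8                # single-channel original depth map
total_bits = cameras * (image_bits + depth_bits)
total_mib = total_bits / 8 / 2**20
print(round(total_mib, 1))             # ~237.3, the "about 237M" figure (in MiB)
print(round(total_mib, 1), "s")        # ~237 s to transmit at roughly 1 MiB/s
```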
The data volume of the stitched image can be reduced by one or more of the following: storing the data regularly so that a higher compression rate can be obtained, reducing the resolution of the original images and using the reduced-resolution pixel data as the pixel data of the images, or down-sampling one or more of the original depth maps.
For example, if the original image has a resolution of 4K, i.e., a pixel resolution of 4096 × 2160, and is down-sampled to a resolution of 540P, i.e., a pixel resolution of 960 × 540, the number of pixels of the stitched image is about one sixteenth of the number before down-sampling. The amount of data may be made smaller in combination with any one or more of the other ways of reducing the amount of data described above.
It can be understood that if the bandwidth is supported and the decoding capability of the device performing data processing can support a stitched image with higher resolution, a stitched image with higher resolution can also be generated to improve the image quality.
It will be understood by those skilled in the art that the pixel data and the depth data of the synchronized images may be stored in other ways in different application scenarios, for example, in units of pixels in the stitched image. Referring to fig. 33, 34, and 36, for the image 1 and the image 2 shown in fig. 33 and 34, it may be stored to the stitched image in the manner of fig. 36.
In summary, the pixel data and the depth data of the image may be stored in the stitched image, and the stitched image may be divided into the image region and the depth map region in various ways, or may not be divided, and the pixel data and the depth data of the image are stored in a preset order.
In a specific implementation, the synchronized plurality of images may also be a synchronized plurality of frame images obtained by decoding a plurality of videos. The video may be acquired by a plurality of video cameras, and the settings may be the same as or similar to those of the cameras acquiring images in the foregoing.
In a specific implementation, the generating of the multi-angle freeview image data may further include generating an association relation field, and the association relation field may indicate an association relation of the first field with the at least one second field. The first field stores pixel data of one image in a plurality of synchronous images, and the second field stores depth data corresponding to the image, wherein the pixel data and the depth data correspond to the same shooting angle, namely the same visual angle. The association relationship between the two can be described by the association relationship field.
Taking fig. 27 as an example, in fig. 27, the area storing the view 1 image to the view 8 image is 8 first fields, the area storing the view 1 depth map to the view 8 depth map is 8 second fields, and for the first field storing the view 1 image, there is an association relationship with the second field storing the view 1 depth map, and similarly, there is an association relationship between the field storing the view 2 image and the field storing the view 2 depth map.
The association relation field may indicate an association relation between the first field and the second field of each of the synchronized multiple images in various ways, and specifically may be a content storage rule of the pixel data and the depth data of the synchronized multiple images, that is, by indicating the storage way described in the foregoing, the association relation between the first field and the second field is indicated.
In a specific implementation, the association relationship field may only contain different mode numbers, and the device performing data processing may obtain the storage manner of the pixel data and the depth data in the acquired multi-angle free-viewing angle image data according to the mode number of the field and the data stored in the device performing data processing. For example, if the received pattern number is 1, the storage manner is analyzed as follows: the spliced image is equally divided into an upper area and a lower area, the upper half area is an image area, the lower half area is a depth map area, and an image at a certain position of the upper half area is associated with a depth map stored at a corresponding position of the lower half area.
It can be understood that the storage modes for storing the stitched images in the foregoing embodiments, such as the storage modes illustrated in fig. 27 to fig. 36, may all have descriptions of corresponding association relationship fields, so that the device for processing data may obtain the associated images and depth data according to the association relationship fields.
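The following sketch shows one possible way for the device performing data processing to turn such a mode number into a concrete storage manner by looking it up in locally stored supporting data; the mode numbers, rule descriptions, and function name here are assumptions made for this example, not values defined by the text.

```python
# Illustrative lookup of content-storage rules by mode number.
STORAGE_MODES = {
    1: {"layout": "stitched_image", "split": "top_bottom_equal",
        "top": "image_region", "bottom": "depth_map_region"},
    2: {"layout": "sequential_fields",
        "order": "all_pixel_fields_then_all_depth_fields"},
}

def resolve_storage_rule(mode_number: int) -> dict:
    """Combine the mode number read from the association relation field with the
    supporting data stored on the device to recover the concrete storage manner."""
    try:
        return STORAGE_MODES[mode_number]
    except KeyError:
        raise ValueError(f"unknown association-relationship mode: {mode_number}")

print(resolve_storage_rule(1)["split"])  # top_bottom_equal
```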
As described above, the picture format of the stitched image may be any one of BMP, PNG, JPEG, Webp, and other image formats, or may be other image formats. The storage mode of the pixel data and the depth data in the multi-angle free visual angle image data is not limited to the mode of splicing images. The storage can be performed in various ways, and there can also be corresponding association relation field description.
Similarly, the storage mode may be indicated by a mode number. For example, for the storage manner shown in fig. 23, the association field may store a pattern number of 2; after reading this number, the device performing data processing can determine that the pixel data of the synchronized images are stored sequentially, together with the lengths of the first and second fields, and that once all the first fields have been stored, the depth data of each image follows in the same order as the images. The device performing data processing may accordingly determine the association between the pixel data and the depth data of each image from the association relation field.
It is understood that the storage manner of the pixel data and the depth data of the synchronized images may be various, and the expression manner of the association relation field may also be various. The content may be indicated by the mode number or may be directly indicated. The device performing data processing may determine the association relationship between the pixel data of the image and the depth data according to the content of the association relationship field, in combination with the stored data or other a priori knowledge, for example, the content corresponding to each pattern number, or the specific number of the synchronized multiple images.
In a specific implementation, the generating of the multi-angle freeview image data may further include: based on the synchronized plurality of images, parameter data of each image is calculated and stored, the parameter data including photographing position and photographing angle data of the image.
By combining the shooting position and shooting angle of each of the synchronized images, the device performing data processing can determine, according to the needs of the user, a virtual viewpoint in the same coordinate system as the capture devices, reconstruct an image from the multi-angle free-view image data, and present to the user the desired viewing position and viewing angle.
In a specific implementation, the parameter data may also include internal parameter data including attribute data of a photographing apparatus of the image. The aforementioned shooting position and shooting angle data of the image may also be referred to as external parameter data, and the internal parameter data and the external parameter data may be referred to as attitude data. By combining the internal parameter data and the external parameter data, the factors indicated by the internal parameter data such as lens distortion and the like can be considered during image reconstruction, and the image of the virtual viewpoint can be reconstructed more accurately.
In a specific implementation, the generating of the multi-angle freeview image data may further include: generating a parameter data storage address field, wherein the parameter data storage address field is used for indicating a storage address of the parameter data. The apparatus performing data processing may acquire the parameter data from the storage address of the parameter data.
In a specific implementation, the generating of the multi-angle freeview image data may further include: and generating a data combination storage address field for indicating the storage address of the data combination, namely indicating the storage address of the first field and the second field of each image in the plurality of images which are synchronized. The apparatus for data processing may acquire the pixel data and the depth data of the synchronized plurality of images from the storage space corresponding to the storage address of the data combination, and from this viewpoint, the data combination includes the pixel data and the depth data of the synchronized plurality of images.
It is understood that the multi-angle freeview image data may include specific data such as the pixel data, depth data, and parameter data of the images, as well as indicative data such as the aforementioned association relation field, parameter data storage address field, and data combination storage address field. The indicative data may be stored in a header file to instruct the device performing data processing how to obtain the data combination, the parameter data, and so on.
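Concretely, such a header file could resemble the following; every key name and value here is hypothetical and only meant to show how the indicative data (association relation, parameter data storage address, data combination storage address) might travel together.

```python
import json

# A hypothetical header file carrying the indicative data described above.
header = {
    "association_mode": 1,                                    # content storage rule of the data combination
    "camera_count": 8,                                        # number of synchronized images
    "parameter_data_address": "params/cams.json",             # where to fetch external/internal parameters
    "data_combination_address": "frames/stitched_0001.png",   # first and second fields of each image
}
print(json.dumps(header, indent=2))
```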
In particular, the noun explanations, specific implementations, and advantageous effects involved in the various embodiments of generating multi-angle freeview data may refer to other embodiments, and various specific implementations in a multi-angle freeview interaction method may be implemented in combination with other embodiments.
The multi-angle freeview data may be multi-angle freeview video data, and is described further below with particular reference to a method of generating multi-angle freeview video data.
Referring to fig. 37 in combination, the multi-angle freeview video data generating method may include the steps of:
step S371, acquiring a plurality of videos with frame synchronization, the plurality of videos having different shooting angles;
step S372, analyzing each video to obtain image combinations of a plurality of frame moments, wherein the image combinations comprise a plurality of frame images in frame synchronization;
step S373, determining depth data of each frame image in the image combination based on the image combination of each frame time in the plurality of frame times;
step S374, generating a spliced image corresponding to each frame time, wherein the spliced image comprises a first field for storing the pixel data of each frame image in the image combination and a second field for storing the depth data of each frame image in the image combination;
in step S375, video data is generated based on the plurality of stitched images.
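A compact sketch of steps S371 to S375 is given below; the depth estimation is stubbed out with a constant map, the stitched-frame layout follows the upper-half/lower-half convention described earlier, and the encoding and encapsulation step is only noted in a comment, so none of this should be read as the patent's actual implementation.

```python
import numpy as np

def estimate_depth(frames):
    """Stand-in for multi-view depth determination (step S373); a real system would use
    the geometry of the synchronized frames, here we return a flat depth map per frame."""
    return [np.full(f.shape[:2], 128, dtype=np.uint8) for f in frames]

def build_stitched_frame(frames, depths):
    """Step S374: first fields (pixel data) in the upper half, second fields (depth data)
    in the lower half, one sub-region per camera."""
    h, w, _ = frames[0].shape
    canvas = np.zeros((2 * h, len(frames) * w, 3), dtype=np.uint8)
    for i, (f, d) in enumerate(zip(frames, depths)):
        canvas[:h, i * w:(i + 1) * w] = f
        canvas[h:, i * w:(i + 1) * w] = d[..., None]
    return canvas

def generate_freeview_video(per_camera_streams):
    """Steps S371-S375 end to end: the returned stitched frames would then be encoded and
    encapsulated (for example H.264 in one of the containers listed later) as video data."""
    stitched_frames = []
    for frame_group in zip(*per_camera_streams):   # one frame-synchronized image combination
        depths = estimate_depth(frame_group)       # depth data of each frame image
        stitched_frames.append(build_stitched_frame(frame_group, depths))
    return stitched_frames

# Two toy cameras, three frame times, tiny 4x4 frames.
streams = [[np.full((4, 4, 3), c * 10 + t, dtype=np.uint8) for t in range(3)] for c in range(2)]
print(len(generate_freeview_video(streams)))  # 3 stitched frames, one per frame time
```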
In this embodiment, the capturing device may be a camera, and a plurality of videos with frame synchronization may be acquired by a plurality of cameras. Each video includes frame images at a plurality of frame times, and a plurality of image combinations may respectively correspond to different frame times, each image combination including a plurality of frame images that are frame-synchronized.
In a particular implementation, depth data for each frame image in the image combination is determined based on the image combination for each of the plurality of frame time instances.
Following the foregoing embodiments, if a frame image in the original video has a resolution of 1080P, that is, 1920 × 1080 pixels scanned progressively, the original depth map may also occupy 1920 × 1080 pixels as a single channel. The pixel data amount of one original image is 1920 × 1080 × 8 × 3 bits, and the data amount of one original depth map is 1920 × 1080 × 8 bits. With 30 cameras, the data volume of the stitched image is 30 × (1920 × 1080 × 8 × 3 + 1920 × 1080 × 8) bits, which is about 237 MB; without compression, this occupies considerable system resources and causes large delay. In particular, when the bandwidth is small, for example on the order of 1 MB per second, one uncompressed stitched image takes about 237 s to transmit, and it would be difficult to play the video in real time if the original stitched images were transmitted at the frame rate.
The data volume of the stitched image can be reduced by one or more of the following: storing the data regularly so that a higher compression rate is obtained when compressing in the video format, reducing the resolution of the original images and using the reduced-resolution pixel data as the pixel data of the images, down-sampling one or more of the original depth maps, or increasing the degree of video compression.
For example, if the resolution of a frame image in the original video, that is, the acquired videos is 4K, that is, 4096 × 2160 pixel resolution, and the down-sampling is 540P, that is, 960 × 540 pixel resolution, the number of pixels in the stitched image is about one sixteenth of the number before the down-sampling. The amount of data may be made smaller in combination with any one or more of the other ways of reducing the amount of data described above.
It can be understood that if the bandwidth is supported and the decoding capability of the device performing data processing can support a stitched image with higher resolution, a stitched image with higher resolution can also be generated to improve the image quality.
In a specific implementation, the video data is generated based on a plurality of the stitched images; it may be generated from all or only part of the stitched images, which may be determined according to the frame rate of the video to be generated and the frame rate of the acquired videos, or according to the bandwidth of the communication with the device performing data processing.
In a specific implementation, the video data is generated based on a plurality of the stitched images, and the video data may be generated by encoding and encapsulating the plurality of stitched images in the order of frame time.
Specifically, the packing Format may be any one of formats such as AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, or other packing formats, and the encoding Format may be an encoding Format such as h.261, h.263, h.264, h.265, MPEG, AVS, or other encoding formats.
In a specific implementation, the generating of the multi-angle freeview video data may further include generating an association relation field, and the association relation field may indicate an association relation of the first field with the at least one second field. The first field stores pixel data of one image in a plurality of synchronous images, and the second field stores depth data corresponding to the image, wherein the pixel data and the depth data correspond to the same shooting angle, namely the same visual angle.
In a specific implementation, the generating of the multi-angle freeview video data may further include: based on the synchronized plurality of frame images, parameter data of each frame image is calculated and stored, the parameter data including shooting position and shooting angle data of the frame image.
In a specific implementation, the frame-synchronized plurality of frame images in the image combination at different time instants in the synchronized plurality of videos may correspond to the same parameter data, and the parameter data may be calculated in any set of image combinations.
In a specific implementation, the generating of the multi-angle freeview video data may further include: generating a parameter data storage address field for indicating a storage address of the parameter data. The device performing data processing may acquire the parameter data from that storage address.
In a specific implementation, the generating of the multi-angle freeview video data may further include: generating a video data storage address field for indicating a storage address of the generated video data.
It is understood that the multi-angle freeview video data may include the generated video data as well as other indicative data, such as the aforementioned association relation field, parameter data storage address field, and video data storage address field. The indicative data may be stored in a header file to instruct the device performing data processing how to acquire the video data, the parameter data, and so on.
The noun explanations, concrete implementations, and advantageous effects involved in the various embodiments of generating multi-angle freeview video data may refer to other embodiments, and various concrete implementations in the multi-angle freeview interaction method may be implemented in combination with other embodiments.
The following is further described with particular reference to multi-angle freeview data processing.
FIG. 38 is a flowchart of a multi-angle freeview data processing method according to an embodiment of the present invention, which may specifically include the following steps:
step S381, acquiring a data header file;
step S382, determining the definition format of the data file according to the analysis result of the data header file;
step S383, based on the defined format, reading a data combination from a data file, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, the synchronous images have different visual angles of regions to be watched, and the pixel data and the depth data of each image in the synchronous images have an association relation;
and step S384, reconstructing an image or video of a virtual viewpoint according to the read data combination, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting virtual viewpoint switching and watching on the region to be watched.
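The steps above can be outlined in code as follows, assuming, purely for illustration, that the defined format resolved from the header file is the upper/lower-half stitched-image layout described earlier; parse_data_combination and the toy sizes are hypothetical, and the reconstruction of step S384 is covered by the mapping sketch given later.

```python
import numpy as np

def parse_data_combination(stitched: np.ndarray, camera_count: int):
    """Step S383 for an upper/lower-half layout: recover the pixel data (first fields)
    and depth data (second fields) of the synchronized images from one stitched image."""
    half = stitched.shape[0] // 2
    sub_w = stitched.shape[1] // camera_count
    images, depths = [], []
    for i in range(camera_count):
        images.append(stitched[:half, i * sub_w:(i + 1) * sub_w])     # image sub-region i
        depths.append(stitched[half:, i * sub_w:(i + 1) * sub_w, 0])  # depth map sub-region i
    return images, depths

# Toy stitched image holding two synchronized views of 4 x 8 pixels each.
stitched = np.zeros((8, 16, 3), dtype=np.uint8)
imgs, deps = parse_data_combination(stitched, camera_count=2)
print(len(imgs), imgs[0].shape, deps[0].shape)  # 2 (4, 8, 3) (4, 8)
```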
The multi-angle free visual angle data in the embodiment of the invention is data which can support the reconstruction of images or videos of virtual viewpoints in a multi-angle free visual angle range. May include header files as well as data files. The header file may indicate a defined format of the data file so that the apparatus for data processing multi-angle freeview data can parse desired data from the data file according to the header file, as described further below.
Referring to fig. 3 in combination, the device performing data processing may be a device located in the CDN, or a device 33 performing display, or may also be a device performing data processing. The data file and the header file may be stored in the server 32 at the cloud end, or in some application scenarios, the header file may also be stored in a device for data processing, and the header file is obtained locally.
In a specific implementation, the stitched image in the foregoing embodiments may be used as a data file in the embodiments of the present invention. In an application scenario with limited bandwidth, the stitched image may be divided into multiple portions for transmission multiple times. Correspondingly, the header file may include a segmentation mode, and the device for processing data may combine the segmented multiple portions according to the indication in the header file to obtain a stitched image.
In a specific implementation, the defined format may include a storage format, and the header file may include a field indicating the storage format of the data combination, where the field may indicate the storage format using a number, or may directly write to the storage format. Accordingly, the parsing result may be the number of the storage format, or the storage format.
Accordingly, the device performing data processing may determine the storage format according to the parsing result. For example, a specific storage format may be determined based on the number, and the stored supporting data; or the storage format may be directly obtained from a field indicating the storage format of the data combination. In other embodiments, if the storage format is fixed in advance, the fixed storage format may also be recorded in the device for processing data.
In a specific implementation, the storage format may be a picture format or a video format. As mentioned above, the picture format may be any one of BMP, PNG, JPEG, Webp, and other image formats, or may be other image formats; the Video Format may include a packing Format and an encoding Format, the packing Format may be any one of the formats of AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, etc., or may be other packing formats, and the encoding Format may be an encoding Format of h.261, h.263, h.264, h.265, MPEG, AVS, etc., or may be other encoding formats.
The storage format may also be a format other than a picture format or a video format, which is not limited herein. Any storage format that can be indicated through the header file, or that allows the device performing data processing to acquire the required data through stored supporting data so as to perform subsequent image or video reconstruction of a virtual viewpoint, falls within the protection scope of the present invention.
In a specific implementation, when the storage format of the data combination is a video format, the number of the data combinations may be multiple, and each data combination may be a data combination corresponding to a different frame time after decapsulating and decoding a video.
In a specific implementation, the defined format may include a content storage rule of the data combination, and the header file may include a field indicating the content storage rule of the data combination. With the content storage rule, the device performing data processing can determine the correlation between the pixel data and the depth data in each image. The field indicating the content storage rule of the data combination may also be referred to as an association field, and the field may indicate the content storage rule of the data combination with a number or may be written directly to the rule.
Accordingly, the device performing data processing may determine the content storage rule of the data combination according to the analysis result. For example, a specific content storage rule may be determined based on the number, and the stored supporting data; or the content storage rule of the data combination may be directly obtained from a field indicating the content storage rule of the data combination.
In other embodiments, if the content storage rule may be fixed in advance, the content storage rule of the fixed data combination may be recorded in the device for data processing. The following further describes a specific implementation manner of the content storage rule of the data combination and the device for processing data to obtain the data combination in combination with the indication of the header file.
In a specific implementation, the storage rule of the synchronized pixel data of the multiple images and the depth data may specifically be a storage rule of the synchronized pixel data of the multiple images and the depth data in the stitched image.
As mentioned above, the storage format of the data combination may be a picture format or a video format, and accordingly, the data combination may be a picture format or a frame image in a video. In this respect, an image or a frame image obtained by decoding the image or the frame image according to the picture format or the video format may be referred to as a stitched image. The storage rule of the pixel data and the depth data of the synchronized plurality of images may be a storage location in the stitched image, which may be varied. The foregoing description may refer to various storage manners of the synchronized pixel data and depth data of the multiple images in the stitched image, and are not repeated herein.
In a specific implementation, the content storage rule of the data combination may indicate to the device performing data processing which of the several storage manners of the pixel data and depth data of the synchronized images in the stitched image is used, or, for each image, the storage manner of the first field and the second field when some other storage form is used; that is, it indicates the storage rule of the pixel data and depth data of the synchronized images.
As described above, the header file may include a field indicating a content storage rule of a data combination, and the field may use a content storage rule whose number indicates the data combination, or may directly write the rule in the header file, or may record the content storage rule of the fixed data combination in a device that performs data processing.
The content storage rule may correspond to any one of the storage manners, and the device performing data processing may analyze the storage manner according to the content storage rule, further analyze the data combination, and determine an association relationship between the pixel data and the depth data of each of the plurality of images.
In a specific implementation, the content storage rule may indicate the storage locations of the pixel data and depth data of each of the synchronized images in the stitched image through the distribution of the image region and the depth map region.
The indication may be a pattern number, for example, if the pattern number is 1, the content storage rule may be parsed as: the spliced image is equally divided into an upper area and a lower area, the upper half area is an image area, the lower half area is a depth map area, and the image at a certain position of the upper half area is related to the depth map stored at the corresponding position of the lower half area. The device performing the data processing may further determine a specific storage manner based on the rule. For example, the storage mode shown in fig. 27 or fig. 28, or another storage mode, may be further determined by combining the number of the multiple images to be synchronized, the storage order of the pixel data and the depth data, and the proportional relationship between the depth data and the pixel data occupying the pixel points.
In a specific implementation, the content storage rule may also indicate the storage locations of the pixel data and depth data of each of the synchronized images in the stitched image through the distribution of the image sub-regions and the depth map sub-regions, the pixel data of each image being stored in an image sub-region and the depth data of each image being stored in a depth map sub-region.
For example, the content storage rule may be that the image sub-region and the depth map sub-region are arranged in a column-by-column cross manner, and similar to the previous example, the device performing data processing may further determine a specific storage manner based on the rule. For example, the storage mode shown in fig. 31 or other storage modes may be further determined by combining the number of the synchronized multiple images, the storage order of the pixel data and the depth data, and the proportional relationship between the depth data and the pixel points occupied by the depth data.
As mentioned above, the first field for storing pixel data and the second field for storing depth data may be pixel fields in the stitched image, or may be fields stored in other forms. It will be appreciated by those skilled in the art that the content storage rules may be instructions adapted to the particular storage means, so that the data processing apparatus can learn the corresponding storage means.
In particular implementations, the content storage rules may also include more information to support the way in which the data processing device resolves the data combinations. For example, all or part of the aforementioned image sub-region and the depth map sub-region may be subjected to edge protection, and the manner of edge protection. The content storage rule may include a resolution relationship between the pixel data of the image and the depth data.
The device for processing data may determine a specific storage manner based on the stored information or information obtained from other fields of the header file. For example, the number of the synchronized images may be obtained by a header file, specifically, by a definition format of a data file analyzed in the header file.
After determining the specific storage manner, the device performing data processing may analyze the pixel data of the synchronized multiple images and the depth data corresponding to the synchronized multiple images.
In a specific implementation, the resolution of the pixel data and the depth data may be the same, and the pixel data and the corresponding depth value of each pixel point of each image may be further determined.
As mentioned above, the depth data may also be downsampled data, and a corresponding field may be indicated in a definition format in the header file, and the data processing device may perform corresponding upsampling to determine pixel data and a corresponding depth value of each pixel point of each image.
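If the header file indicates that the depth data was down-sampled, the device performing data processing can up-sample it back to the image resolution, for example by nearest-neighbour repetition as in the sketch below; a real implementation might use a filtered interpolation instead, and the function name is illustrative.

```python
import numpy as np

def upsample_depth(depth_small: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour up-sampling used to restore a down-sampled depth map to the
    image resolution, so each pixel of the image can be paired with a depth value."""
    return np.repeat(np.repeat(depth_small, factor, axis=0), factor, axis=1)

small = np.array([[10, 20], [30, 40]], dtype=np.uint8)
print(upsample_depth(small, 2).shape)  # (4, 4)
```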
Correspondingly, rendering and display are performed according to the read data combination: the image is first reconstructed from the pixel data and corresponding depth values of each pixel of each image, together with the position of the virtual viewpoint to be displayed, and is then rendered and displayed. For video, the reconstructed images are frame images; displaying the frame images in the order of their frame times plays the video for the user and completes the video reconstruction. That is, video reconstruction comprises reconstruction of the frame images in the video, and the specific implementation of frame image reconstruction is the same as or similar to image reconstruction.
In a specific implementation, referring to fig. 39, performing image reconstruction of the virtual viewpoint may include the following steps:
step S391, determining parameter data of each image in the plurality of synchronized images, wherein the parameter data comprises shooting position and shooting angle data of the images;
step S392, determining the parameter data of the virtual viewpoint, wherein the parameter data of the virtual viewpoint comprises a virtual viewing position and a virtual viewing angle;
step S393, determining a plurality of target images among the synchronized plurality of images;
step S394, for each target image, mapping the depth data to the virtual viewpoint according to a relationship between the parameter data of the virtual viewpoint and the parameter data of the image;
step S395, a reconstructed image is generated based on the depth data mapped to the virtual viewpoint and the pixel data of the target image.
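A simplified, self-contained sketch of steps S394 and S395 for a single target image follows. It assumes pinhole cameras with known intrinsics and extrinsics and metric depth along the camera Z axis, folds the reverse texture lookup into the same loop as the forward depth splat, and leaves holes at zero; it illustrates the mapping idea under those assumptions, not the patent's actual implementation.

```python
import numpy as np

def reconstruct_view(src_img, src_depth, K_src, RT_src, K_vir, RT_vir):
    """Map one target image to the virtual viewpoint: unproject source pixels with their
    depth, reproject them into the virtual camera, keep the nearest depth per pixel
    (z-buffer), and copy the corresponding texture. Holes remain 0 and would be filled
    or fused with other target images in a full pipeline."""
    h, w = src_depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # 3 x N homogeneous pixels

    # Unproject source pixels to world coordinates using the source depth map.
    cam_pts = np.linalg.inv(K_src) @ pix * src_depth.reshape(1, -1)
    R_s, t_s = RT_src[:, :3], RT_src[:, 3:]
    world = R_s.T @ (cam_pts - t_s)

    # Forward mapping: project world points into the virtual camera.
    R_v, t_v = RT_vir[:, :3], RT_vir[:, 3:]
    cam_v = R_v @ world + t_v
    z_v = cam_v[2]
    proj = K_vir @ cam_v
    u = np.round(proj[0] / z_v).astype(int)
    v = np.round(proj[1] / z_v).astype(int)

    virt_img = np.zeros((h, w, 3), dtype=src_img.dtype)
    zbuf = np.full((h, w), np.inf)
    src_y, src_x = ys.reshape(-1), xs.reshape(-1)
    for i in range(u.size):
        if 0 <= u[i] < w and 0 <= v[i] < h and 0 < z_v[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z_v[i]
            # Reverse step: copy the texture of the source pixel that produced this depth.
            virt_img[v[i], u[i]] = src_img[src_y[i], src_x[i]]
    return virt_img

# Toy demo: identical intrinsics, virtual camera shifted slightly along the X axis.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
RT_src = np.hstack([np.eye(3), np.zeros((3, 1))])
RT_vir = np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.0]])])
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
depth = np.full((64, 64), 2.0)
print(reconstruct_view(img, depth, K, RT_src, K, RT_vir).shape)  # (64, 64, 3)
```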
Wherein generating the reconstructed image further may comprise: a pixel value for each pixel point of the reconstructed image is determined. Specifically, for each pixel point, if the pixel data mapped to the virtual viewpoint is 0, the hole filling may be performed by using the pixel data around one or more target images. For each pixel point, if the pixel data mapped to the virtual viewpoint are multiple non-zero data, the weight value of each data can be determined, and finally the value of the pixel point is determined.
In an embodiment of the present invention, when generating a reconstructed image, forward mapping may be performed first: using the depth information, the texture maps of the corresponding groups in the image combination of the video frame are projected into three-dimensional Euclidean space, that is, the depth maps of the corresponding groups are mapped, according to the spatial geometric relationship, to the virtual viewpoint position at the moment of user interaction to form depth maps at the virtual viewpoint position. Reverse mapping is then performed, projecting the three-dimensional space points onto the imaging plane of the virtual camera, that is, according to the mapped depth maps, the pixels in the texture maps of the corresponding groups are copied into the virtual texture maps corresponding to the generated virtual viewpoint position, forming the virtual texture map of each corresponding group. Finally, the virtual texture maps of the corresponding groups are fused to obtain the reconstructed image of the virtual viewpoint position at the moment of user interaction. Reconstructing the image in this way can improve the sampling precision of the reconstructed image.
Preprocessing may be performed prior to performing the forward mapping. Specifically, the depth value of the forward mapping and the homography matrix of the texture reverse mapping may be calculated according to the parameter data corresponding to the corresponding group in the image combination of the video frame. In a particular implementation, the depth level may be converted to a depth value using a Z-transform.
In the depth map forward mapping process, a depth map of a corresponding group may be mapped to a depth map of a virtual viewpoint position using a formula, and then a depth value of the corresponding position is copied. In addition, the depth map of the corresponding set may have noise, and some sampled signals may be included in the mapping process, so that the generated depth map of the virtual viewpoint position may have small noise holes. Median filtering can be used to remove noise for this problem.
In a specific implementation, the depth map of the virtual viewpoint position obtained after the forward mapping may be further post-processed according to requirements, so as to further improve the quality of the generated reconstructed image. In an embodiment of the present invention, before performing reverse mapping, a virtual viewpoint position depth map obtained by forward mapping is subjected to foreground and background occlusion relation processing, so that the generated depth map can more truly reflect the position relation of an object in a scene seen by the virtual viewpoint position.
For the reverse mapping, specifically, the positions of the corresponding texture maps in the virtual texture map may be calculated according to the depth map of the virtual viewpoint positions obtained by the forward mapping, and then the texture values of the corresponding pixel positions are copied, wherein the hole in the depth map may be marked as 0 or as having no texture value in the virtual texture map. Hole dilation can be done for the regions marked as holes to avoid synthesis artifacts.
Then, the generated virtual texture maps of the corresponding groups are fused to obtain the reconstructed image of the virtual viewpoint position at the moment of user interaction. In specific implementations, the fusion may be performed in a variety of ways, as illustrated by the following two embodiments.
In an embodiment of the present invention, the weighting process is performed first, and then the hole filling is performed. Specifically, the method comprises the following steps: and carrying out weighting processing on pixels at corresponding positions in the virtual texture map corresponding to each corresponding group in the image combination of the video frame at the user interaction moment to obtain pixel values at corresponding positions in the reconstructed image at the virtual viewpoint position at the user interaction moment. And then, for the position with the pixel value being zero in the reconstructed image at the virtual viewpoint position at the user interaction moment, filling up the hole by using the pixels around the pixel in the reconstructed image to obtain the reconstructed image at the virtual viewpoint position at the user interaction moment.
In another embodiment of the present invention, the hole filling is performed first, and then the weighting process is performed. Specifically, the method comprises the following steps: and for the position where the pixel value in the virtual texture map corresponding to each corresponding group in the image combination of the video frame at the user interaction moment is zero, respectively filling the hole by using the surrounding pixel values, and then weighting the pixel value at the corresponding position in the virtual texture map corresponding to each corresponding group after filling the hole to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
The weighting processing in the above embodiment may specifically adopt a weighted average manner, and may also adopt different weighting coefficients according to the parameter data, or the positional relationship between the photographing apparatus and the virtual viewpoint. In an embodiment of the present invention, weighting is performed according to the position of the virtual viewpoint and the reciprocal of the position distance of each acquisition device, that is: the closer the acquisition device is to the virtual viewpoint position, the greater the weight.
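A minimal sketch of that inverse-distance weighting is shown below; it assumes that hole pixels are marked with value 0 in the virtual texture maps and that the acquisition-device and virtual-viewpoint positions are available as 3D coordinates, and the function name is illustrative.

```python
import numpy as np

def fuse_virtual_textures(virtual_textures, cam_positions, virtual_position):
    """Fuse the per-group virtual texture maps with weights proportional to the reciprocal
    of the distance between each acquisition device and the virtual viewpoint; pixels that
    are holes (value 0) in a given texture map contribute nothing at that position."""
    weights = [1.0 / (np.linalg.norm(np.asarray(c, float) - np.asarray(virtual_position, float)) + 1e-6)
               for c in cam_positions]
    acc = np.zeros(virtual_textures[0].shape, dtype=np.float64)
    wsum = np.zeros(virtual_textures[0].shape[:2], dtype=np.float64)
    for tex, wgt in zip(virtual_textures, weights):
        valid = np.any(tex > 0, axis=-1)          # skip hole positions of this texture map
        acc[valid] += wgt * tex[valid]
        wsum[valid] += wgt
    fused = np.zeros_like(acc)
    nz = wsum > 0
    fused[nz] = acc[nz] / wsum[nz][:, None]
    return fused.astype(virtual_textures[0].dtype)

texA = np.full((2, 2, 3), 100, dtype=np.uint8); texA[0, 0] = 0   # hole at (0, 0)
texB = np.full((2, 2, 3), 200, dtype=np.uint8)
fused = fuse_virtual_textures([texA, texB], [(0, 0, 0), (4, 0, 0)], (1, 0, 0))
print(fused[0, 0], fused[1, 1])   # hole taken from texB only; elsewhere a weighted mix
```

The closer acquisition device receives the larger weight, matching the reciprocal-distance rule described above.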
In specific implementation, a preset hole filling algorithm may be adopted to fill the hole according to needs, which is not described herein again.
In a specific implementation, the shooting position and shooting angle data of the image may be referred to as external parameter data, and the parameter data may also include internal parameter data, that is, attribute data of a shooting device of the image. Distortion parameters and the like can be embodied through internal parameter data, and the mapping relation is more accurately determined by combining the internal parameters.
In a specific implementation, the parameter data may be obtained from a data file, and specifically, the parameter data may be obtained from a corresponding storage space according to a storage address of the parameter data in the header file.
In specific implementations, the target images may be determined according to the 6-degree-of-freedom coordinates of the virtual viewpoint and the 6-degree-of-freedom coordinates of the capture viewpoints, that is, the viewpoints at the image capture positions; several images whose capture viewpoints are closer to the virtual viewpoint may be selected.
In a specific implementation, all of the plurality of images synchronized may be used as the target image. More images are selected as target images, so that the quality of the reconstructed images is higher, and the selection of the target images can be determined according to requirements without limitation.
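For illustration, target-image selection by proximity could look like the following sketch, which compares only the positional part of the 6-degree-of-freedom coordinates; the data layout and function name are assumptions, and a fuller scheme could also weigh the angular difference.

```python
import numpy as np

def select_target_images(camera_poses, virtual_pose, k=4):
    """Pick the k capture viewpoints whose positions are closest to the virtual viewpoint."""
    dists = [np.linalg.norm(np.asarray(p["position"]) - np.asarray(virtual_pose["position"]))
             for p in camera_poses]
    return sorted(range(len(camera_poses)), key=lambda i: dists[i])[:k]

cams = [{"position": (i, 0.0, 5.0)} for i in range(8)]
print(select_target_images(cams, {"position": (3.2, 0.0, 5.0)}, k=3))  # [3, 4, 2]
```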
As described above, the depth data may be a set of depth values corresponding one-to-one to the pixels of an image, and the depth data mapped to the virtual viewpoint likewise corresponds one-to-one to pixels. To generate the reconstructed image, for each pixel position, data may be acquired from the corresponding position of the pixel data of a target image according to the depth data. When data for one pixel position is acquired from multiple target images, the multiple values can be weighted to improve the quality of the reconstructed image.
It can be understood by those skilled in the art that the process of performing image reconstruction of a virtual viewpoint based on multi-angle freeview image data in the embodiment of the present invention may be various, and is not limited herein.
The noun explanation, specific implementation and beneficial effects involved in the multi-angle freeview data processing method can be seen in other embodiments, and various specific implementations in the multi-angle freeview interaction method can be implemented in combination with other embodiments.
The multi-angle freeview data described above may be multi-angle freeview image data, and is described further below with particular reference to multi-angle freeview image data processing.
FIG. 40 is a flowchart of a multi-angle freeview image data processing method according to an embodiment of the present invention, which may specifically include the following steps:
step S401, acquiring a data combination stored in a picture format, wherein the data combination comprises pixel data and depth data of a plurality of synchronous images, and the plurality of synchronous images have different visual angles of regions to be watched;
step S402, based on the data combination, carrying out image reconstruction of a virtual viewpoint, wherein the virtual viewpoint is selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting switching and watching of the virtual viewpoint on the region to be watched.
The data combination in the picture format may be obtained by parsing the header file and reading the header file from the data file in the implementation manner in the foregoing embodiment. The method of reconstructing an image of a virtual viewpoint may be as described above.
In a specific implementation, acquiring the data combination stored in the picture format and performing image reconstruction of the virtual viewpoint may be performed by an edge computing node. As previously described, the edge computing node may be a node that communicates over a short distance with the display device showing the reconstructed image, maintaining a high-bandwidth, low-latency connection, for example through Wi-Fi or 5G. In particular, the edge computing node may be a base station, a mobile device, a vehicle-mounted device, or a home router with sufficient computing capability. Referring to fig. 3, the edge computing node may be a device located in the CDN.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image can be sent to a display device.
Reconstructing the image at the edge computing node reduces the requirements on the display device, so that even a device with low computing capability can receive user instructions and provide a multi-angle free-view experience to the user.
For example, in a 5G scenario, the communication speed between User Equipment (UE) and a base station, especially the base station of the current serving cell, is fast. The user may determine the parameter data of the virtual viewpoint through an indication on the UE, and the base station of the current serving cell may serve as the edge computing node that computes the reconstructed image. The display device can then receive the reconstructed image and provide a multi-angle free-view service to the user.
It is to be understood that, in a specific implementation, the device performing image reconstruction and the device performing display may be the same device. The device may receive a user indication and determine a virtual viewpoint based on the user indication in real time. After the image of the virtual viewpoint is reconstructed, the reconstructed image may be displayed.
In a specific implementation, there may be various ways of receiving a user indication and generating a virtual viewpoint according to the user indication, the virtual viewpoint being a viewpoint within the free-view range. Therefore, the embodiment of the present invention can support the user in freely switching the virtual viewpoint within the multi-angle free-view range.
It is to be understood that the noun explanations, specific implementations and advantageous effects involved in the multi-angle freeview picture data processing method may refer to other embodiments, and various specific implementations in the multi-angle freeview interaction method may be implemented in combination with the other embodiments.
The multi-angle freeview data described above may also be multi-angle freeview video data, and is described further below with particular reference to multi-angle freeview video data processing.
FIG. 41 is a flowchart of a multi-angle freeview video data processing method according to an embodiment of the present invention, which may include the following steps:
step S411, parsing the acquired video data to obtain data combinations at different frame times, where each data combination includes pixel data and depth data of a plurality of synchronized images, and the plurality of synchronized images have different viewing angles with respect to a region to be viewed;
step S412, for each frame time, performing image reconstruction of a virtual viewpoint based on the data combination, where the virtual viewpoint is selected from a multi-angle free-view range, the multi-angle free-view range is a range supporting switching the viewpoint for viewing the region to be viewed, and the reconstructed images are used for video playing.
In a specific implementation, the acquired video data may be in various formats. The video data may be parsed, and decapsulation and decoding may be performed according to the video format to obtain frame images at different frame times; the data combination may then be obtained from each frame image, that is, the frame image stores the pixel data and the depth data of the plurality of synchronized images. From this perspective, the frame image may also be referred to as a stitched image.
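Since the frame image (stitched image) carries the pixel data and depth data of the synchronized images, extracting the data combination from one decoded frame could be sketched as below. The tiled layout assumed here (texture images in the top half, depth maps in the same order in the bottom half) is purely illustrative; the actual layout would be described by the header information discussed in the other embodiments.

```python
import numpy as np

def split_stitched_frame(frame, camera_count):
    """Split one decoded frame image into per-camera pixel data and depth data.

    frame: HxWx3 array; assumed layout (illustrative only): the top half holds
    camera_count texture images side by side, the bottom half holds the
    corresponding depth maps in the same order, encoded in one channel."""
    height, width, _ = frame.shape
    half_height = height // 2
    tile_width = width // camera_count
    pixel_data, depth_data = [], []
    for i in range(camera_count):
        left, right = i * tile_width, (i + 1) * tile_width
        pixel_data.append(frame[:half_height, left:right, :])
        depth_data.append(frame[half_height:, left:right, 0])
    return pixel_data, depth_data

# Example: a dummy 4-camera stitched frame of size 360x1280.
dummy = np.zeros((360, 1280, 3), dtype=np.uint8)
textures, depths = split_stitched_frame(dummy, camera_count=4)
print(len(textures), textures[0].shape, depths[0].shape)
```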
The video data may be obtained from the data file according to the header file, and the specific implementation manner of obtaining the data combination may be as described above. The detailed implementation of image reconstruction of the virtual viewpoint can also be referred to the foregoing. After the reconstructed image at each frame time is obtained, video playing can be performed according to the sequence of the frame times.
In a specific implementation, acquiring data combinations at different frame times and performing image reconstruction of a virtual viewpoint may be performed by an edge computing node.
Correspondingly, before the image of the virtual viewpoint is reconstructed, the parameter data of the virtual viewpoint can be received, and after the image of the virtual viewpoint is reconstructed, the reconstructed image of each frame time can be sent to a display device.
It is to be understood that in the specific implementation, the device for performing image reconstruction and the device for performing display may be the same device.
It is to be understood that the noun explanations, specific implementations and advantageous effects involved in the multi-angle freeview video data processing method can be found in other embodiments, and various specific implementations in the multi-angle freeview interaction method can be implemented in combination with the other embodiments.
The following is further described with particular reference to multi-angle freeview interaction methods.
FIG. 42 is a flowchart of a multi-angle free-view interaction method in an embodiment of the present invention, which may specifically include the following steps:
step S421, receiving a user instruction;
step S422, determining a virtual viewpoint according to the user instruction, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting switching and watching of the virtual viewpoint in an area to be watched;
step S423, displaying display content for viewing the to-be-viewed region based on the virtual viewpoint, where the display content is generated based on a data combination and the virtual viewpoint, the data combination includes pixel data and depth data of a plurality of synchronized images, an association relationship exists between the image data and the depth data of each image, and the plurality of synchronized images have different viewing angles with respect to the to-be-viewed region.
In the embodiment of the present invention, the virtual viewpoint may be a viewpoint within the multi-angle free-view range, and the specific multi-angle free-view range may be associated with the data combination.
In a particular implementation, a user indication may be received, and a virtual viewpoint may be determined within a free viewing angle range based on the user indication. The user indication and the manner in which the virtual viewpoint is determined according to the user indication may be varied, as further exemplified below.
In a specific implementation, determining a virtual viewpoint according to the user indication may include: determining a base viewpoint for viewing the region to be viewed, where the base viewpoint includes a position and a view angle. At least one of the position and the view angle of the virtual viewpoint may be changed relative to the base viewpoint, and the user indication may be associated with the manner of this change. The virtual viewpoint is then determined, with the base viewpoint as a reference, according to the user indication, the base viewpoint, and the association relationship.
The base viewpoint may include the position and view angle from which the user views the region to be viewed. Further, the base viewpoint may be the position and view angle corresponding to the picture displayed by the display device when the user indication is received; for example, if the device displays the image shown in FIG. 4 when the user indication is received, then, referring to FIG. 2 in combination, the position of the base viewpoint may be VP1 shown in FIG. 2. It is understood that the position and view angle of the base viewpoint may be preset, or the base viewpoint may be a virtual viewpoint previously determined according to a user indication; the base viewpoint may also be expressed by 6-DoF coordinates. The association relationship between the user indication and the manner in which the virtual viewpoint changes relative to the base viewpoint may be a preset association relationship.
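One non-limiting way to read the association described above is as a preset mapping from a user indication to a 6-DoF offset applied to the base viewpoint. The sketch below assumes such a mapping; the indication names and the additive update are illustrative assumptions rather than the claimed method.

```python
def determine_virtual_viewpoint(base_viewpoint, user_indication, association):
    """Apply the 6-DoF change associated with a user indication to the base
    viewpoint.

    base_viewpoint: (x, y, z, yaw, pitch, roll); association: dict mapping an
    indication name to a 6-DoF delta. Both the indication names and the
    additive update are illustrative assumptions."""
    delta = association.get(user_indication, (0, 0, 0, 0, 0, 0))
    return tuple(b + d for b, d in zip(base_viewpoint, delta))

# Hypothetical preset association between indications and viewpoint changes.
preset_association = {
    "pinch_out": (0.0, 0.0, -0.5, 0.0, 0.0, 0.0),   # move toward the region to be viewed
    "pinch_in": (0.0, 0.0, 0.5, 0.0, 0.0, 0.0),     # move away from the region to be viewed
    "swipe_left": (0.0, 0.0, 0.0, -10.0, 0.0, 0.0), # rotate the view angle
}
print(determine_virtual_viewpoint((0, 1.6, 5.0, 0, 0, 0), "pinch_out", preset_association))
```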
In specific implementations, the manner of receiving the user instruction may be various, and each of the manners will be described below.
In one particular implementation, a path of the point of contact on the touch-sensitive screen may be detected, and the path may include a start point, an end point, and a direction of movement of the point of contact, with the path as the user indication.
Accordingly, the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint may be various.
For example, the number of paths may be 2, and if at least one contact point in the 2 paths moves in a direction away from the other, the position of the virtual viewpoint moves in a direction closer to the region to be viewed.
Referring to FIG. 43 and FIG. 11 in combination, vectors F1 and F2 in FIG. 43 may respectively illustrate the 2 paths; in this case, if the base viewpoint is B2 in FIG. 11, the virtual viewpoint may be B3. That is, for the user, the region to be viewed is enlarged.
It can be understood that fig. 43 is only an illustration, and in a specific application scenario, the starting point, the ending point, and the direction of the 2 paths may be various, and it is sufficient that at least one contact point in the 2 paths moves in a direction away from the other. One of the 2 paths may be a path of an unmoved contact point, and include only the starting point.
In an embodiment of the present invention, the display image before enlargement may be as shown in fig. 4, and the enlarged image may be as shown in fig. 44.
In a specific implementation, the center point of the magnification may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is magnified about that center point. The magnification ratio, that is, the amplitude of the movement of the virtual viewpoint, may be associated with the amplitude by which the contact points in the 2 paths move apart, and this association relationship may be preset.
In a specific implementation, if at least one contact point in the 2 paths moves in a direction close to the other, the position of the virtual viewpoint may move in a direction away from the region to be viewed.
Referring to FIG. 45 and FIG. 11 in combination, vectors F3 and F4 in FIG. 45 may respectively illustrate the 2 paths; in this case, if the base viewpoint is B3 in FIG. 11, the virtual viewpoint may be B2. That is, for the user, the region to be viewed is reduced.
It can be understood that fig. 45 is only an illustration, and in a specific application scenario, the starting point, the ending point, and the direction of the 2 paths may be various, and it is sufficient that at least one contact point in the 2 paths moves in a direction approaching to another. One of the 2 paths may be a path of an unmoved contact point, and include only the starting point.
In an embodiment of the present invention, the display image before being reduced may be as shown in fig. 44, and the image after being reduced may be as shown in fig. 4.
In a specific implementation, the center point of the reduction may be determined according to the positions of the contact points, or a preset point may be used as the center point, and the image is reduced about that center point. The zoom-out ratio, that is, the amplitude of the virtual viewpoint movement, may be associated with the amplitude by which the contact points in the 2 paths approach each other, and this association relationship may be preset.
In a specific implementation, the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint may also include: the number of the paths is 1, the moving distance of the contact point is associated with the change amplitude of the visual angle, and the moving direction of the contact point is associated with the change direction of the visual angle.
For example, referring to FIG. 5 and FIG. 13 in combination, if the received user indication is 1 path, illustrated by vector D2 in FIG. 5, and the base viewpoint is point C2 in FIG. 13, then the virtual viewpoint may be point C1.
In an embodiment of the present invention, the display before the switching of the viewing angle may be as shown in fig. 5, and the display after the switching of the viewing angle may be as shown in fig. 6.
If the received user indication is 1 path, for example the one illustrated by vector D1 in FIG. 8, and the base viewpoint is point C2 in FIG. 13, then the virtual viewpoint may be point C3.
In an embodiment of the present invention, the display before the switching of the viewing angle may be as shown in fig. 8, and the display after the switching of the viewing angle may be as shown in fig. 9.
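The three path-based associations described above (two paths moving apart, two paths moving closer, and a single path) can be summarized in a small classifier. This is only an illustrative sketch: it assumes each path is given as a start point and an end point in screen coordinates, and the returned labels are hypothetical names for the corresponding viewpoint changes.

```python
import math

def classify_paths(paths):
    """Map touch paths to a virtual-viewpoint change.

    paths: list of ((start_x, start_y), (end_x, end_y)) tuples; a path whose
    start equals its end is an unmoved contact point. The labels and the
    distance-based rule are illustrative assumptions."""
    if len(paths) == 1:
        (sx, sy), (ex, ey) = paths[0]
        return ("rotate_view", ex - sx, ey - sy)  # direction and amplitude of the swipe
    if len(paths) == 2:
        (s1, e1), (s2, e2) = paths
        start_gap = math.dist(s1, s2)
        end_gap = math.dist(e1, e2)
        if end_gap > start_gap:
            return ("move_closer", end_gap - start_gap)  # contact points move apart
        if end_gap < start_gap:
            return ("move_away", start_gap - end_gap)    # contact points move closer
    return ("no_change",)

print(classify_paths([((100, 300), (60, 300)), ((200, 300), (240, 300))]))  # pinch out
print(classify_paths([((120, 300), (260, 300))]))                           # single swipe
```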
It will be appreciated by those skilled in the art that the various embodiments described above are merely illustrative in nature and are not limiting of the association between the user indication and the virtual viewpoint.
In a specific implementation, the user indication may include voice control instructions, which may be in a natural-language format, such as "zoom in", "zoom out", "view left", and the like. Correspondingly, to determine the virtual viewpoint according to the user indication, voice recognition may be performed on the instruction, and the virtual viewpoint may be determined with the base viewpoint as a reference according to a preset association relationship between the instruction and the manner in which the virtual viewpoint changes relative to the base viewpoint.
In a specific implementation, the user indication may also include a selection of a preset viewpoint from which to view the region to be viewed. The preset viewpoints may vary with the region to be viewed, and each may include a position and a view angle. For example, if the region to be viewed is a basketball court, a preset viewpoint may be located under the backboard, or may give the user the viewing angle of a courtside spectator or a coach's viewing angle. Accordingly, the selected preset viewpoint may be used as the virtual viewpoint.
In particular implementations, the user indication may also include a selection of a particular object in the area to be viewed. The particular object may be determined by image recognition techniques. For example, in a basketball game, each player in the game scene may be identified according to face recognition technology, the user may be provided with options of the relevant player, and according to the selection of a specific player by the user, a virtual viewpoint may be determined, and the user may be provided with a picture under the virtual viewpoint.
In a specific implementation, the user indication may further include at least one of a position and a viewing angle of the virtual viewpoint, for example, the 6DoF coordinates of the virtual viewpoint may be directly input.
In a specific implementation, the user indication may be received in various ways, for example, by detecting the signal of a contact point on the touch-sensitive screen, the signal of an acoustoelectric sensor, or the signal of a sensor capable of representing the posture of the device, such as a gyroscope or a gravity sensor; the corresponding user indication may be a path of the touch point on the touch-sensitive screen, a voice control instruction, a gesture operation, and the like. The content of the indication may also be various, for example, indicating how the virtual viewpoint changes relative to the base viewpoint, indicating a preset viewpoint, indicating a specific object to be viewed, or directly indicating at least one of the position and view angle of the virtual viewpoint. The specific implementation of determining the virtual viewpoint according to the user indication may likewise be various.
Specifically, depending on the manner of receiving the user indication, the various sensing devices may be polled at preset time intervals, the time interval corresponding to a detection frequency; for example, detection may be performed at a frequency of 25 times per second to obtain the user indication.
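A minimal sketch of sampling the sensing devices at a preset frequency (25 times per second, as in the example above) might look like the following; the sensor-reading and handler callables are hypothetical placeholders.

```python
import time

def poll_user_indications(read_sensors, handle_indication, frequency_hz=25, duration_s=1.0):
    """Poll a sensor-reading callback at a fixed frequency and forward any
    detected indication to a handler. Both callbacks are placeholders."""
    interval = 1.0 / frequency_hz
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        indication = read_sensors()  # e.g. touch path, gyroscope delta, voice command
        if indication is not None:
            handle_indication(indication)
        time.sleep(interval)

# Example with dummy callbacks: nothing is detected, nothing is handled.
poll_user_indications(lambda: None, print, frequency_hz=25, duration_s=0.1)
```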
It is to be understood that the manner of receiving the user indication, the content of the user indication, and the manner of determining the virtual viewpoint according to the user indication may be combined or replaced, and are not limited herein.
In a specific implementation, after a trigger instruction is received, the user indication may be received in response to the trigger instruction, which can avoid misoperation by the user. The trigger instruction may be a click on a preset button in the screen area, a voice control signal, any of the foregoing forms that a user indication may take, or another manner.
In particular implementations, the user indication may be received during the playing of a video or presentation of an image. In the process of displaying the image, a user indication is received, and the data combination can be the data combination corresponding to the image. In the process of playing the video, a user indication is received, and the data combination can be a data combination corresponding to a frame image in the video. The display content for viewing the region to be viewed based on the virtual viewpoint may be an image reconstructed based on the virtual viewpoint.
During video playing, after a user indication is received and a virtual viewpoint is generated accordingly, the display content for viewing the region to be viewed based on the virtual viewpoint may be multiple reconstructed frame images generated based on that virtual viewpoint. That is, the video may keep playing while the virtual viewpoint is being switched: before the virtual viewpoint is re-determined according to the user indication, the video is played from the original virtual viewpoint, and after it is re-determined, reconstructed frame images based on the new virtual viewpoint are generated and played from the position and view angle of the switched viewpoint.
Further, during video playing, the display content may likewise be multiple reconstructed frame images generated based on the virtual viewpoint, with the video played in its original configuration before the virtual viewpoint is determined and, after it is determined, played from the position and view angle of the switched viewpoint using the reconstructed frame images. Alternatively, video playing may be paused while the virtual viewpoint is switched.
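The behaviour described above, where playback continues and each frame is reconstructed at whichever virtual viewpoint is current, can be sketched as a simple loop. Frame decoding, reconstruction, display, and input handling are placeholder callables here, not part of the claimed method.

```python
def play_with_free_viewpoint(frames, reconstruct, display, poll_indication, update_viewpoint,
                             initial_viewpoint):
    """Play a sequence of frames, reconstructing each one at the virtual
    viewpoint that is current when the frame is shown.

    frames: iterable of per-frame data combinations (pixel + depth data);
    reconstruct(data_combination, viewpoint) -> image; the other callables
    stand in for the device's input and display paths."""
    viewpoint = initial_viewpoint
    for data_combination in frames:
        indication = poll_indication()
        if indication is not None:
            # Switch the virtual viewpoint without interrupting playback.
            viewpoint = update_viewpoint(viewpoint, indication)
        display(reconstruct(data_combination, viewpoint))

# Tiny usage with dummy callables: three "frames", no indications received.
play_with_free_viewpoint(
    frames=[{"frame": i} for i in range(3)],
    reconstruct=lambda data, vp: (data["frame"], vp),
    display=print,
    poll_indication=lambda: None,
    update_viewpoint=lambda vp, ind: vp,
    initial_viewpoint=(0, 0, 0, 0, 0, 0),
)
```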
With combined reference to fig. 4 and fig. 6, during the image presentation, a user instruction may be received, a virtual viewpoint may be generated according to the user instruction to switch viewing, and the display content may be changed from the image shown in fig. 4 to the image shown in fig. 6.
When the video is played to the frame image as shown in fig. 4, the virtual viewpoint is switched to show the frame image as shown in fig. 6. Before receiving a new user instruction, the frame image based on the virtual viewpoint may be continuously shown for video playing, for example, when a new user instruction is received while playing to the frame image shown in fig. 46, the virtual viewpoint may be switched to continue video playing according to the new user instruction.
It is to be understood that the noun explanations, specific implementation manners, and advantageous effects involved in the multi-angle freeview interaction method may refer to other embodiments, and various specific implementations of the multi-angle freeview interaction method may be implemented in combination with the other embodiments.
The following description is further directed to another multi-angle freeview interaction method.
FIG. 47 is a flowchart illustrating another multi-angle free-view interaction method according to an embodiment of the present invention, which may specifically include the following steps:
step S471, receiving a user instruction during video playing;
Step S472, determining a virtual viewpoint according to the user instruction, wherein the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting viewpoint switching and watching of an area to be watched;
step S473, showing an image for viewing the to-be-viewed region based on the virtual viewpoint at a specified frame time, where the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination includes pixel data and depth data of a plurality of images synchronized at the specified frame time, there is a correlation between the image data and the depth data of each image, and the synchronized images have different viewing angles for the to-be-viewed region.
In a specific implementation, a specified frame time may be determined according to a user instruction, and an image that is viewed on the to-be-viewed region based on the virtual viewpoint at the specified frame time is displayed. The image viewed from the region to be viewed based on the virtual viewpoint may be an image obtained by reconstructing an image based on a data combination and the virtual viewpoint, and the reconstruction manner may be various, which may be as described above. The manner of determining the designated frame time according to the user instruction may be various, and is further described below.
In one specific implementation, the frame time at which the user indication is received is used as the specified frame time, and the image for viewing the region to be viewed based on the virtual viewpoint is presented in response to the user indication. From the user's perspective, the video being played is paused, and one or both of the viewpoint position and the view angle are switched for the picture shown at the time of the pause.
For example, with reference to FIG. 4 and FIG. 6, if the frame image displayed by the display device at the frame time when the user indication is received is as shown in FIG. 4, and that frame time is taken as the specified frame time, the user may switch the virtual viewpoint, for example to view an image as shown in FIG. 6.
In another specific implementation, the specified frame time may be determined according to time indication information included in the user indication. In combination with the foregoing embodiments, the user indication may take various forms, and the manner of receiving it may also vary. The time indication information may be generated by detecting, through the touch-sensitive screen, the user clicking on the playback progress bar, may be time information input by the user, or may be information generated by the user operating through voice control, and the like, which is not limited herein.
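The two ways of fixing the specified frame time described above (the frame being shown when the indication arrives, or a time carried in the indication, e.g. from a progress-bar click) can be sketched as follows; the dictionary shape of the indication is an assumption made only for illustration.

```python
def resolve_specified_frame_time(current_frame_time, user_indication):
    """Return the frame time at which the region is to be viewed from the
    virtual viewpoint.

    user_indication: dict that may carry a 'time' entry (e.g. produced by a
    progress-bar click or typed in by the user); otherwise the frame time at
    which the indication was received is used."""
    return user_indication.get("time", current_frame_time)

print(resolve_specified_frame_time(12.48, {"type": "swipe"}))               # pause-and-switch case
print(resolve_specified_frame_time(12.48, {"type": "seek", "time": 30.0}))  # time-indication case
```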
By selecting the frame time, the user can perform virtual viewpoint switching on a picture of interest within the multi-angle free-view range; the user may select a picture that has already been played or, in a recorded-playback scenario, a subsequent picture that has not yet been played, which improves the user experience.
As illustrated with reference to FIG. 4, FIG. 6, and FIG. 46, when viewing a video the user may see pictures such as those shown in FIG. 6 and FIG. 46; through a user indication, the user may select the picture shown in FIG. 6 and change its view angle, for example switching the viewpoint to view the picture shown in FIG. 4.
It can be understood that the user indication may be received multiple times; that is, when switching the viewpoint for the region to be viewed at the specified frame time, the user may switch the viewpoint several times, so that there may be multiple different virtual viewpoints.
In a specific implementation, after performing one or more virtual viewpoint switches on the region to be viewed at the specified frame time, the user may further choose to continue playing the video. Correspondingly, the multi-angle free-view interaction method in the embodiment of the present invention may further include receiving a continue-playing instruction and, in response to it, continuing to play the video from the specified frame time.
Further, the continued playback may be based on the virtual viewpoint last determined before the user instructed playback to continue, or based on the viewpoint from which the region to be viewed was watched before the user began switching viewpoints.
In a specific implementation, when the specified frame time does not coincide with the time when the user instruction is received, the playing may be continued from the specified frame time, or the playing may be continued from the time when the user instruction is received, which is not limited herein.
It can be understood that continuing to play the video may mean generating, for each frame image of the video, a reconstructed image corresponding to the virtual viewpoint according to the method in the embodiment of the present invention, and playing those reconstructed images.
Therefore, in the embodiment of the invention, the user can watch the video in more ways, and the user experience is better.
It is understood that the noun explanation, specific implementation and beneficial effects involved in the multi-angle free-view interaction method can be seen in other embodiments, and various specific implementations in the multi-angle free-view interaction method can be implemented in combination with other embodiments.
The embodiment of the present invention further provides a multi-angle free-view interaction apparatus, referring to fig. 48, which may specifically include:
an indication receiving unit 481 adapted to receive a user indication during video playing;
a virtual viewpoint determining unit 482, adapted to determine a virtual viewpoint according to the user instruction, where the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting switching viewing of viewpoints for an area to be viewed;
a display unit 483, adapted to display, at a specified frame time, an image for viewing the region to be viewed based on the virtual viewpoint, where the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination includes pixel data and depth data of a plurality of images synchronized at the specified frame time, an association relationship exists between the image data and the depth data of each image, and the plurality of synchronized images have different viewing angles with respect to the region to be viewed.
In a specific implementation, the multi-angle freeview interaction apparatus may further include:
a first play instruction receiving unit 484 adapted to receive a play continuation instruction;
a first playing unit 485, adapted to, in response to the continue-playing instruction, continue playing, from the specified frame time, the video in which the region to be viewed is watched from the virtual viewpoint.
In a specific implementation of the present invention, the multi-angle freeview interaction apparatus may further include:
a second play instruction receiving unit 486 adapted to receive a continue play instruction;
a second playing unit 487, adapted to, in response to the continue-playing instruction, play the video based on the virtual viewpoint from which the region to be viewed was watched before the viewpoint indication was received.
Referring to fig. 49, in a specific implementation, the virtual viewpoint determining unit 482 may include:
a basic viewpoint determining subunit 491 adapted to determine a basic viewpoint for viewing the region to be viewed, where the basic viewpoint includes a basic viewpoint position and a basic viewpoint perspective;
the virtual viewpoint determining subunit 492 is adapted to determine the virtual viewpoint based on the base viewpoint according to the user instruction and the association relationship between the user instruction and the change manner of the virtual viewpoint based on the base viewpoint.
With continued reference to fig. 48, in a specific implementation, the indication receiving unit 481 is further adapted to detect a path of the touch point on the touch sensitive screen, the path including at least one of a start point, an end point and a moving direction of the touch point, with the path as the user indication.
In a specific implementation, the user indication may be a user selection of a particular object in the area to be viewed.
In a specific implementation, the multi-angle freeview interaction apparatus may further include:
a specific object determination unit 488 adapted to determine a specific object of the region to be viewed through an image recognition technique before receiving the user instruction;
an option providing unit 489 adapted to provide selection options of the specific object.
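Mapping the units above onto code, a bare skeleton of the interaction apparatus might look like the following. All class and method names are illustrative assumptions; the reconstruction and recognition details are left to the other embodiments.

```python
class MultiAngleFreeViewInteractionApparatus:
    """Skeleton mirroring the units described above: indication receiving,
    virtual viewpoint determination, display, and continue-play handling."""

    def __init__(self, viewpoint_determiner, image_reconstructor, display):
        self.viewpoint_determiner = viewpoint_determiner  # virtual viewpoint determining unit
        self.image_reconstructor = image_reconstructor    # backs the display unit
        self.display = display
        self.current_viewpoint = None

    def receive_indication(self, user_indication, data_combination, frame_time):
        # Indication receiving unit + virtual viewpoint determining unit.
        self.current_viewpoint = self.viewpoint_determiner(user_indication)
        image = self.image_reconstructor(data_combination, self.current_viewpoint)
        # Display unit: show the reconstructed image at the specified frame time.
        self.display(image, frame_time)

    def receive_continue_play(self, frames_from_specified_time):
        # Play-instruction receiving unit + playing unit: resume from the
        # specified frame time using the last determined virtual viewpoint.
        for frame_time, data_combination in frames_from_specified_time:
            image = self.image_reconstructor(data_combination, self.current_viewpoint)
            self.display(image, frame_time)

# Minimal wiring with dummy callables.
apparatus = MultiAngleFreeViewInteractionApparatus(
    viewpoint_determiner=lambda indication: (0, 0, 0, 0, 0, 0),
    image_reconstructor=lambda data, viewpoint: ("image", viewpoint),
    display=lambda image, t: print(t, image),
)
apparatus.receive_indication({"type": "swipe"}, data_combination={}, frame_time=12.0)
```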
The noun explanation, principle, specific implementation and beneficial effects related to the multi-angle free-viewing angle interaction device in the embodiment of the present invention can be referred to the multi-angle free-viewing angle interaction method in the embodiment of the present invention, and are not described herein in detail.
The embodiment of the invention also provides a computer readable storage medium, wherein computer instructions are stored on the computer readable storage medium, and the computer instructions execute the steps of the multi-angle free visual angle interaction method when running.
The computer readable storage medium may be an optical disc, a mechanical hard disk, a solid state hard disk, etc.
The embodiment of the invention also provides a terminal which comprises a memory and a processor, wherein the memory is stored with a computer instruction capable of running on the processor, and the processor executes the multi-angle free visual angle interaction method when running the computer instruction.
The terminal can be various appropriate terminals such as a smart phone and a tablet computer.
An embodiment of the present invention further provides a mobile device, including a communication component, a processor, and a display component: the communication component is used for receiving multi-angle free view data, and the multi-angle free view data comprises a data combination; the processor is used for rendering based on the virtual viewpoint according to the multi-angle free view data to obtain an image for watching the to-be-watched area based on the virtual viewpoint; the display component is used for displaying the image which is viewed on the region to be viewed based on the virtual viewpoint. The mobile device may be a smartphone, tablet computer, or any other suitable device.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (23)

1. A multi-angle free-view interaction method, comprising:
receiving a user instruction in the video playing process;
determining a virtual viewpoint according to the user instruction, wherein the virtual viewpoint is selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting the switching and watching of viewpoints of an area to be watched;
displaying an image which is watched on the region to be watched based on the virtual viewpoint at a specified frame time, wherein the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination comprises pixel data and depth data of a plurality of images which are synchronized at the specified frame time, an association relation exists between the image data and the depth data of each image, and the plurality of synchronized images have different visual angles to the region to be watched.
2. The multi-angle freeview interaction method recited in claim 1, wherein a frame time at which the user indication is received is used as the designated frame time.
3. The multi-angle freeview interaction method as claimed in claim 1, wherein the specified frame time is determined according to time indication information corresponding to a user indication.
4. The multi-angle freeview interaction method recited in claim 1, further comprising: receiving a continuous playing instruction;
and responding to the continuous playing instruction, and continuously playing the video watched in the region to be watched from the virtual viewpoint from the specified frame time.
5. The multi-angle freeview interaction method recited in claim 1, further comprising:
receiving a continuous playing instruction;
and responding to the continue-playing instruction by playing the video based on the virtual viewpoint from which the to-be-watched area was watched before the viewpoint indication was received.
6. The multi-angle freeview interaction method of claim 1, wherein determining a virtual viewpoint according to the user indication comprises:
determining a basic viewpoint for watching a region to be watched, wherein the basic viewpoint comprises a basic viewpoint position and a basic viewpoint visual angle;
and determining the virtual viewpoint by taking the basic viewpoint as a reference according to the user instruction and the incidence relation between the user instruction and the change mode of the virtual viewpoint based on the basic viewpoint.
7. The multi-angle freeview interaction method of claim 6, wherein receiving the user indication comprises: detecting a path of the touch point on the touch sensitive screen, wherein the path comprises at least one of a starting point, an end point and a moving direction of the touch point, and the path is taken as the user indication.
8. The multi-angle freeview interaction method of claim 7, wherein the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint comprises: the number of the paths is 2, and if at least one contact point in the 2 paths moves in a direction away from the other, the position of the virtual viewpoint moves in a direction close to the region to be viewed.
9. The multi-angle freeview interaction method of claim 7, wherein the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint comprises:
the number of the paths is 2, and if at least one contact point in the 2 paths moves in a direction close to the opposite side, the position of the virtual viewpoint moves in a direction far away from the region to be watched.
10. The multi-angle freeview interaction method of claim 7, wherein the association relationship between the path and the change manner of the virtual viewpoint based on the base viewpoint comprises:
the number of the paths is 1, the moving distance of the contact point is associated with the change amplitude of the visual angle, and the moving direction of the contact point is associated with the change direction of the visual angle.
11. The method of claim 1, wherein the user indication comprises a voice control command.
12. The multi-angle freeview interaction method of claim 1, wherein the user indication includes a selection of a preset viewpoint for viewing an area to be viewed.
13. The multi-angle freeview interaction method of claim 12, wherein the preset viewpoint is taken as the virtual viewpoint.
14. The multi-angle freeview interaction method recited in claim 1, wherein the user indication comprises: the user selects a particular object in the area to be viewed.
15. The multi-angle freeview interaction method of claim 14, wherein receiving the user indication further comprises:
determining a specific object of the region to be watched through an image recognition technology;
providing a selection option for the particular object.
16. The multi-angle freeview interaction method recited in claim 1, wherein the user indication comprises at least one of a position of the virtual viewpoint and a view.
17. The method of claim 1, wherein the user indication comprises a voice control command.
18. The multi-angle freeview interaction method recited in claim 1, wherein the user indication comprises: attitude change information from at least one of a gyroscope or a gravity sensor.
19. The multi-angle freeview interaction method recited in claim 1, wherein a trigger instruction is received, and wherein a user indication is received in response to the trigger instruction.
20. A multi-angle freeview interaction apparatus, comprising:
the instruction receiving unit is suitable for receiving user instructions in the video playing process;
a virtual viewpoint determining unit, adapted to determine a virtual viewpoint according to the user instruction, where the virtual viewpoint is selected from a multi-angle free view range, and the multi-angle free view range is a range supporting switching viewing of viewpoints of an area to be viewed;
the display unit is suitable for displaying an image for watching the to-be-watched area based on the virtual viewpoint at a specified frame time, the specified frame time is determined according to the user instruction, the image is generated based on a data combination and the virtual viewpoint, the data combination comprises pixel data and depth data of a plurality of images which are synchronized at the specified frame time, an association relationship exists between the image data and the depth data of each image, and the plurality of synchronized images have different visual angles of the to-be-watched area.
21. A computer readable storage medium having stored thereon computer instructions, wherein the computer instructions are executable to perform the steps of the multi-angle freeview interaction method as claimed in any one of claims 1 to 19.
22. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor executes the computer instructions to perform the steps of the multi-angle freeview interaction method of any one of claims 1 to 19.
23. A mobile device comprising a communication component, a processor, and a display component, characterized in that: the communication component for receiving multi-angle freeview data comprising a combination of data as recited in the multi-angle freeview interaction method of any one of claims 1 to 19;
the processor, configured to perform rendering based on the virtual viewpoint in the multi-angle free-view interaction method according to any one of claims 1 to 19 according to the multi-angle free-view data, so as to obtain an image, which is viewed on the to-be-viewed area based on the virtual viewpoint in the multi-angle free-view interaction method according to any one of claims 1 to 19;
the display component is used for displaying the image which is viewed on the region to be viewed based on the virtual viewpoint.
CN201910173415.1A 2019-03-07 2019-03-07 Multi-angle free visual angle interaction method and device, medium, terminal and equipment Pending CN111669518A (en)

Priority Applications (23)

Application Number Priority Date Filing Date Title
CN201910173415.1A CN111669518A (en) 2019-03-07 2019-03-07 Multi-angle free visual angle interaction method and device, medium, terminal and equipment
PCT/US2020/021195 WO2020181088A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for generating multi-angle free-respective image data
PCT/US2020/021220 WO2020181104A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and server for generating multi-angle free-perspective video data
US16/810,352 US20200288097A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
PCT/US2020/021187 WO2020181084A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
PCT/US2020/021252 WO2020181128A1 (en) 2019-03-07 2020-03-05 Image reconstruction method, system, device and computer-readable storage medium
PCT/US2020/021167 WO2020181074A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
PCT/US2020/021247 WO2020181125A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective video data
PCT/US2020/021164 WO2020181073A1 (en) 2019-03-07 2020-03-05 Method, apparatus, terminal, capturing system and device for setting capturing devices
PCT/US2020/021197 WO2020181090A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective image data
US16/810,362 US20200288108A1 (en) 2019-03-07 2020-03-05 Method, apparatus, terminal, capturing system and device for setting capturing devices
PCT/US2020/021231 WO2020181112A1 (en) 2019-03-07 2020-03-05 Video generating method, apparatus, medium, and terminal
US16/810,237 US11037365B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
US16/810,565 US11055901B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and server for generating multi-angle free-perspective video data
PCT/US2020/021241 WO2020181119A1 (en) 2019-03-07 2020-03-05 Video reconstruction method, system, device, and computer readable storage medium
PCT/US2020/021141 WO2020181065A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
US16/810,614 US20200288099A1 (en) 2019-03-07 2020-03-05 Video generating method, apparatus, medium, and terminal
US16/810,695 US11257283B2 (en) 2019-03-07 2020-03-05 Image reconstruction method, system, device and computer-readable storage medium
US16/810,480 US20200288098A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, terminal, and device for multi-angle free-perspective interaction
US16/810,681 US20200288112A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective video data
US16/810,586 US20200286279A1 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for processing multi-angle free-perspective image data
US16/810,464 US11521347B2 (en) 2019-03-07 2020-03-05 Method, apparatus, medium, and device for generating multi-angle free-respective image data
US16/810,634 US11341715B2 (en) 2019-03-07 2020-03-05 Video reconstruction method, system, device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910173415.1A CN111669518A (en) 2019-03-07 2019-03-07 Multi-angle free visual angle interaction method and device, medium, terminal and equipment

Publications (1)

Publication Number Publication Date
CN111669518A true CN111669518A (en) 2020-09-15

Family

ID=72382296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910173415.1A Pending CN111669518A (en) 2019-03-07 2019-03-07 Multi-angle free visual angle interaction method and device, medium, terminal and equipment

Country Status (1)

Country Link
CN (1) CN111669518A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150157940A1 (en) * 2013-12-11 2015-06-11 Activision Publishing, Inc. System and method for playing video games on touchscreen-based devices
WO2017204172A1 (en) * 2016-05-25 2017-11-30 Canon Kabushiki Kaisha Method and apparatus for generating a virtual image from a viewpoint selected by the user, from a camera array with default parameters associated to the selected type of sport event
CN107105168A (en) * 2017-06-02 2017-08-29 哈尔滨市舍科技有限公司 Can virtual photograph shared viewing system
CN107357436A (en) * 2017-08-25 2017-11-17 腾讯科技(深圳)有限公司 Display methods, virtual reality device and the storage medium of virtual reality device
CN109429052A (en) * 2017-08-30 2019-03-05 佳能株式会社 Information processing equipment, the control method of information processing equipment and storage medium
JP6419278B1 (en) * 2017-09-19 2018-11-07 キヤノン株式会社 Control device, control method, and program

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086635A (en) * 2021-03-15 2022-09-20 腾讯科技(深圳)有限公司 Method, device and equipment for processing multi-view video and storage medium
CN115086635B (en) * 2021-03-15 2023-04-14 腾讯科技(深圳)有限公司 Multi-view video processing method, device and equipment and storage medium
CN113256491A (en) * 2021-05-11 2021-08-13 北京奇艺世纪科技有限公司 Free visual angle data processing method, device, equipment and storage medium
CN113794942A (en) * 2021-09-09 2021-12-14 北京字节跳动网络技术有限公司 Method, apparatus, system, device and medium for switching view angle of free view angle video
CN113794942B (en) * 2021-09-09 2022-12-02 北京字节跳动网络技术有限公司 Method, apparatus, system, device and medium for switching view angle of free view angle video
CN113873264A (en) * 2021-10-25 2021-12-31 北京字节跳动网络技术有限公司 Method and device for displaying image, electronic equipment and storage medium
CN114339034A (en) * 2021-12-06 2022-04-12 上海交通大学 Free visual angle based video data transmission method and receiving processing method
CN116563505A (en) * 2023-05-09 2023-08-08 阿波罗智联(北京)科技有限公司 Avatar generation method, apparatus, electronic device, and storage medium
CN116563505B (en) * 2023-05-09 2024-04-05 阿波罗智联(北京)科技有限公司 Avatar generation method, apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
US11521347B2 (en) Method, apparatus, medium, and device for generating multi-angle free-respective image data
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669564B (en) Image reconstruction method, system, device and computer readable storage medium
CN111667438B (en) Video reconstruction method, system, device and computer readable storage medium
CN113891060A (en) Free viewpoint video reconstruction method, playing processing method, device and storage medium
WO2022001865A1 (en) Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
CN111669569A (en) Video generation method and device, medium and terminal
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669603B (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669568A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111669571B (en) Multi-angle free view image data generation method and device, medium and equipment
CN112738009A (en) Data synchronization method, device, synchronization system, medium and server
CN112738646B (en) Data processing method, device, system, readable storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200915