CN112738495B - Virtual viewpoint image generation method, system, electronic device and storage medium


Info

Publication number
CN112738495B
Authority
CN
China
Prior art keywords
data
frame
image
acquisition
virtual viewpoint
Prior art date
Legal status
Active
Application number
CN201911032857.0A
Other languages
Chinese (zh)
Other versions
CN112738495A
Inventor
盛骁杰
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911032857.0A
Priority to PCT/CN2020/124272 (WO2021083174A1)
Publication of CN112738495A
Application granted
Publication of CN112738495B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/167 Synchronising or controlling image signals
    • H04N 13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H04N 13/30 Image reproducers
    • H04N 13/366 Image reproducers using viewer tracking
    • H04N 13/398 Synchronisation or control of image reproducers

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A virtual viewpoint image generation method, system, electronic device and storage medium are provided. The method includes: acquiring a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data; selecting, from the image combination according to the preset virtual viewpoint path data and the parameter data, a corresponding group of texture maps and depth maps for each virtual viewpoint in the virtual viewpoint path; and inputting the corresponding group of texture maps and depth maps of each virtual viewpoint into a graphics processor, which, for each virtual viewpoint in the virtual viewpoint path, combines and renders the pixels of the selected corresponding group of texture maps and depth maps with multiple threads, taking the pixel as the processing unit, to obtain the image corresponding to that virtual viewpoint. The scheme increases data processing speed and meets the requirements of low-delay playing and real-time interaction of multi-angle free-view video.

Description

Virtual viewpoint image generation method, system, electronic device and storage medium
Technical Field
Embodiments of the present invention relate to the technical field of image processing, and in particular to a virtual viewpoint image generation method and system, an electronic device, and a storage medium.
Background
With the continuous development of Internet technology, more and more video platforms improve the viewing experience of users by providing higher-definition or smoother video.
However, for videos with a strong sense of on-site presence, such as sports events, a user can usually watch only from a single viewpoint and cannot freely switch viewpoints to see the scene or the action from different positions, and therefore cannot experience the feeling of watching the game while moving around the venue.
Six-degree-of-freedom (6DoF) technology provides a viewing experience with a high degree of freedom: during viewing, a user can adjust the viewing angle through interactive means and watch from any free viewpoint of interest, which greatly improves the viewing experience.
However, existing 6DoF video viewing schemes place very large demands on storage capacity and computation, so a large number of servers must be deployed on site. The implementation cost is too high, the constraints are too many, and data cannot be processed quickly, so the requirements of low-delay playing and real-time interaction of multi-angle free-view video cannot be met, which hinders popularization.
The inventor has found through research that current virtual viewpoint image generation methods are slow, which severely restricts low-delay playing and real-time interaction of multi-angle free-view video.
Disclosure of Invention
In view of this, embodiments of the present invention provide a virtual viewpoint image generation method and system, an electronic device, and a storage medium, so as to increase the virtual viewpoint image generation speed and meet the requirements of low-delay playing and real-time interaction of multi-angle free-view video.
The embodiment of the invention provides a virtual viewpoint image generation method, which comprises the following steps:
acquiring a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination comprises angle-synchronized texture maps and depth maps in corresponding groups;
selecting, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, a corresponding group of texture maps and depth maps for each virtual viewpoint in the virtual viewpoint path;
and inputting the corresponding group of texture maps and depth maps of each virtual viewpoint into a graphics processor, and, for each virtual viewpoint in the virtual viewpoint path, combining and rendering with multiple threads the pixels in the selected corresponding group of texture maps and depth maps, taking the pixel as the processing unit, to obtain the image corresponding to that virtual viewpoint.
Optionally, the combining and rendering, with multiple threads and with the pixel as the processing unit, of the pixels in the selected corresponding group of texture maps and depth maps for each virtual viewpoint in the virtual viewpoint path includes the following steps (see the host-side sketch after this list):
forward mapping the corresponding groups of depth maps in parallel onto the virtual viewpoint;
post-processing the forward-mapped depth maps in parallel;
inverse mapping the corresponding groups of texture maps in parallel;
and fusing, in parallel, the pixels in each virtual texture map generated by the inverse mapping.
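To make the four parallel stages concrete, the following host-side CUDA sketch launches them in order for one virtual viewpoint. The kernel names, signatures, buffer layout and the rectified-camera disparity simplification are assumptions introduced for illustration only; the patent does not disclose its actual kernel code. The kernels themselves are sketched in the sections below.

```cuda
#include <cuda_runtime.h>

// Forward declarations of the kernels sketched in the following sections.
__global__ void forwardMapDepth(const unsigned int*, unsigned int*, int, int, float);
__global__ void dilateDepth(const unsigned int*, unsigned int*, int, int, int);
__global__ void medianFilterDepth(const unsigned int*, unsigned int*, int, int);
__global__ void inverseMapTexture(const uchar4*, const unsigned int*, uchar4*, int, int, float);
__global__ void fuseVirtualTextures(const uchar4*, const float*, int, uchar4*, int, int);
__global__ void fillHoles(const uchar4*, uchar4*, int, int, int);

// Renders one virtual viewpoint from numViews corresponding reference views.
// refDepth/refTexture hold one device map per view; all other pointers are device pointers.
void renderVirtualViewpoint(int numViews, int width, int height,
                            const unsigned int* const* refDepth,
                            const uchar4* const* refTexture,
                            const float* disparityScale,   // per-view mapping parameter (host array)
                            const float* d_viewWeights,    // per-view fusion weights (device array)
                            unsigned int* d_scratchDepth,  // 3 * width * height scratch buffer
                            uchar4* d_virtTextures,        // numViews * width * height candidates
                            uchar4* d_fused, uchar4* d_output)
{
    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    size_t n = (size_t)width * height;
    unsigned int* first  = d_scratchDepth;          // first depth map at the virtual viewpoint
    unsigned int* second = d_scratchDepth + n;      // dilated (gap-closed) depth map
    unsigned int* median = d_scratchDepth + 2 * n;  // post-processed depth map

    for (int v = 0; v < numViews; ++v) {
        cudaMemset(first, 0, n * sizeof(unsigned int));
        // stage 1: forward map the v-th depth map onto the virtual viewpoint
        forwardMapDepth<<<grid, block>>>(refDepth[v], first, width, height, disparityScale[v]);
        dilateDepth<<<grid, block>>>(first, second, width, height, 1);
        // stage 2: post-process the forward-mapped depth map
        medianFilterDepth<<<grid, block>>>(second, median, width, height);
        // stage 3: inverse map the v-th texture map into a candidate virtual texture map
        inverseMapTexture<<<grid, block>>>(refTexture[v], median, d_virtTextures + v * n,
                                           width, height, disparityScale[v]);
    }
    // stage 4: fuse the candidates pixel by pixel, then fill the remaining holes
    fuseVirtualTextures<<<grid, block>>>(d_virtTextures, d_viewWeights, numViews,
                                         d_fused, width, height);
    fillHoles<<<grid, block>>>(d_fused, d_output, width, height, 2);
    cudaDeviceSynchronize();
}
```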
Optionally, the forward mapping of the corresponding groups of depth maps in parallel onto the virtual viewpoint includes: running a first kernel function on the graphics processor to forward map the pixels in the corresponding groups of depth maps in parallel to the corresponding virtual viewpoint positions, wherein: for multiple depth values mapped to the same pixel of the virtual viewpoint, an atomic operation is used to keep the largest pixel value, yielding a first depth map at the corresponding virtual viewpoint position; and a second depth map at the virtual viewpoint position is created based on the first depth map, each pixel of the second depth map being processed in parallel by taking the maximum value of the pixels within a preset area around the corresponding pixel position in the first depth map.
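A minimal sketch of such a first kernel is shown below. It assumes depth pixels are stored as unsigned integers encoding inverse depth, so that a larger value means a nearer point and atomicMax keeps the foreground, and it reduces the reprojection to a horizontal disparity shift for rectified cameras. Both assumptions are illustrative simplifications, not the patent's actual mapping.

```cuda
#include <cuda_runtime.h>

// Forward mapping of one reference depth map onto the virtual viewpoint.
__global__ void forwardMapDepth(const unsigned int* srcDepth,
                                unsigned int* dstDepth,   // first depth map, pre-filled with 0
                                int width, int height,
                                float disparityScale)     // hypothetical mapping parameter
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int d = srcDepth[y * width + x];
    if (d == 0) return;                                   // no depth measured at this pixel

    int tx = x + (int)lrintf(disparityScale * (float)d);  // shifted column in the virtual view
    if (tx < 0 || tx >= width) return;

    // Several source pixels may land on the same target pixel; keep the largest
    // (nearest) depth value with an atomic max, resolving the occlusion order.
    atomicMax(&dstDepth[y * width + tx], d);
}

// Second step: for every pixel of the second depth map, take the maximum over a
// small window of the first depth map, which closes the cracks left by forward mapping.
__global__ void dilateDepth(const unsigned int* firstDepth,
                            unsigned int* secondDepth,
                            int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int best = 0;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = min(max(x + dx, 0), width - 1);
            int ny = min(max(y + dy, 0), height - 1);
            best = max(best, firstDepth[ny * width + nx]);
        }
    secondDepth[y * width + x] = best;
}
```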
Optionally, the post-processing of the forward-mapped depth maps in parallel includes: running a second kernel function on the graphics processor to apply, for each pixel of the second depth map obtained after forward mapping, median filtering over a preset area around the pixel position.
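A corresponding second-kernel sketch, using a fixed 3x3 window (the size of the "preset area" is an assumption):

```cuda
// Median filtering of the forward-mapped (second) depth map: one thread per pixel.
__global__ void medianFilterDepth(const unsigned int* inDepth,
                                  unsigned int* outDepth,
                                  int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int window[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = min(max(x + dx, 0), width - 1);
            int ny = min(max(y + dy, 0), height - 1);
            window[n++] = inDepth[ny * width + nx];
        }
    // insertion sort of the 9 neighbourhood values, then take the middle one
    for (int i = 1; i < 9; ++i) {
        unsigned int v = window[i];
        int j = i - 1;
        while (j >= 0 && window[j] > v) { window[j + 1] = window[j]; --j; }
        window[j + 1] = v;
    }
    outDepth[y * width + x] = window[4];
}
```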
Optionally, the inverse mapping of the corresponding groups of texture maps in parallel includes: running a third kernel function on the graphics processor to perform interpolation on the pixels of the selected corresponding groups of texture maps in parallel, generating the corresponding virtual texture maps.
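A third-kernel sketch along these lines, again using the rectified-camera disparity simplification and plain one-dimensional bilinear interpolation; the alpha channel marks pixels the reference view could not cover. All of these choices are assumptions for illustration.

```cuda
// Inverse mapping of one reference texture map: each thread owns one pixel of the
// virtual view, converts its filtered depth back into a sub-pixel position in the
// reference texture and interpolates the texel there.
__global__ void inverseMapTexture(const uchar4* refTexture,
                                  const unsigned int* virtDepth, // post-processed depth at the virtual view
                                  uchar4* virtTexture,
                                  int width, int height,
                                  float disparityScale)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int d = virtDepth[y * width + x];
    if (d == 0) {                                     // hole: no depth available here
        virtTexture[y * width + x] = make_uchar4(0, 0, 0, 0);
        return;
    }
    float sx = (float)x - disparityScale * (float)d;  // source column in the reference image
    if (sx < 0.f || sx > (float)(width - 1)) {
        virtTexture[y * width + x] = make_uchar4(0, 0, 0, 0);
        return;
    }
    int x0 = (int)floorf(sx);
    int x1 = min(x0 + 1, width - 1);
    float w1 = sx - (float)x0, w0 = 1.f - w1;         // interpolation weights

    uchar4 a = refTexture[y * width + x0];
    uchar4 b = refTexture[y * width + x1];
    virtTexture[y * width + x] = make_uchar4(
        (unsigned char)(w0 * a.x + w1 * b.x),
        (unsigned char)(w0 * a.y + w1 * b.y),
        (unsigned char)(w0 * a.z + w1 * b.z),
        255);                                         // alpha = 255 marks a valid pixel
}
```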
Optionally, the fusing, in parallel, of the pixels in the virtual texture maps generated by the inverse mapping includes: running a fourth kernel function on the graphics processor to perform weighted fusion, in parallel, of the pixels at the same position in each virtual texture map generated by the inverse mapping.
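A fourth-kernel sketch is given below. The per-view weights are assumed to be computed on the host, for example from the distance between each reference camera and the virtual viewpoint; the patent does not prescribe a particular weighting rule.

```cuda
// Weighted fusion of the candidate virtual texture maps at the same pixel position.
__global__ void fuseVirtualTextures(const uchar4* virtTextures, // numViews stacked images
                                    const float* viewWeights,   // one weight per reference view
                                    int numViews,
                                    uchar4* fused,
                                    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float r = 0.f, g = 0.f, b = 0.f, wsum = 0.f;
    for (int v = 0; v < numViews; ++v) {
        uchar4 p = virtTextures[(size_t)v * width * height + (size_t)y * width + x];
        if (p.w == 0) continue;                       // skip pixels this view could not cover
        float w = viewWeights[v];
        r += w * p.x; g += w * p.y; b += w * p.z; wsum += w;
    }
    if (wsum > 0.f)
        fused[y * width + x] = make_uchar4((unsigned char)(r / wsum),
                                           (unsigned char)(g / wsum),
                                           (unsigned char)(b / wsum), 255);
    else
        fused[y * width + x] = make_uchar4(0, 0, 0, 0); // left as a hole for later filling
}
```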
Optionally, after fusing the pixels in the virtual texture maps generated by the inverse mapping in parallel, the method further includes: filling holes in parallel for each pixel of the weighted-fused texture map to obtain the image corresponding to the virtual viewpoint.
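A possible parallel hole-filling kernel; the window size and the simple averaging of valid neighbours are assumptions for illustration:

```cuda
// Hole filling after fusion: every thread owns one pixel; if its alpha marks it as a
// hole, it is replaced by the average of the valid pixels in a small surrounding window.
__global__ void fillHoles(const uchar4* fusedIn, uchar4* fusedOut,
                          int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 p = fusedIn[y * width + x];
    if (p.w != 0) {                        // not a hole: copy through unchanged
        fusedOut[y * width + x] = p;
        return;
    }
    float r = 0.f, g = 0.f, b = 0.f; int n = 0;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            uchar4 q = fusedIn[ny * width + nx];
            if (q.w == 0) continue;        // ignore neighbouring holes
            r += q.x; g += q.y; b += q.z; ++n;
        }
    fusedOut[y * width + x] = (n > 0)
        ? make_uchar4((unsigned char)(r / n), (unsigned char)(g / n),
                      (unsigned char)(b / n), 255)
        : p;
}
```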
Optionally, inputting the corresponding group of texture maps and depth maps of each virtual viewpoint into the graphics processor includes: inputting the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path into multiple graphics processors, which process them in parallel to generate multiple virtual viewpoint images.
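A host-side sketch of such multi-GPU dispatch, distributing the viewpoints of the path over the available devices in round-robin fashion; renderOneViewpoint stands for the per-viewpoint pipeline above, is supplied by the caller, and is not part of the CUDA API:

```cuda
#include <cuda_runtime.h>

// Hypothetical distribution of the virtual viewpoints of a path over several GPUs.
// renderOneViewpoint is assumed to upload that viewpoint's corresponding group of
// texture and depth maps to the current device and run its kernel pipeline there.
void renderPathOnMultipleGpus(int numViewpoints,
                              void (*renderOneViewpoint)(int viewpointIndex))
{
    int numGpus = 0;
    cudaGetDeviceCount(&numGpus);
    if (numGpus == 0 || renderOneViewpoint == nullptr) return;

    // Round-robin assignment: each device renders its share of the viewpoints.
    for (int i = 0; i < numViewpoints; ++i) {
        cudaSetDevice(i % numGpus);
        renderOneViewpoint(i);
    }
    for (int d = 0; d < numGpus; ++d) {    // wait for all devices to finish
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
}
```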
The embodiment of the invention provides a virtual viewpoint image generation system, which comprises:
a central processing unit, adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination comprises angle-synchronized texture maps and depth maps in corresponding groups; and to select, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, a corresponding group of texture maps and depth maps for each virtual viewpoint in the virtual viewpoint path;
and a graphics processor, adapted, for each virtual viewpoint in the virtual viewpoint path, to combine and render with multiple threads the pixels in the selected corresponding group of texture maps and depth maps, taking the pixel as the processing unit, to obtain the image corresponding to that virtual viewpoint.
Optionally, the graphics processor is adapted to forward map the corresponding groups of depth maps in parallel onto the virtual viewpoint; post-process the forward-mapped depth maps in parallel; inverse map the corresponding groups of texture maps in parallel; and fuse, in parallel, the pixels in each virtual texture map generated by the inverse mapping.
An embodiment of the present invention further provides an electronic device, including a memory, a central processing unit, and a graphics processor, wherein the memory stores computer instructions executable on the central processing unit and the graphics processor, and when the central processing unit and the graphics processor execute the computer instructions cooperatively, the steps of the virtual viewpoint image generation method according to any embodiment of the present invention are performed.
The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the steps of the virtual viewpoint image generation method according to any embodiment of the present invention are executed.
With the virtual viewpoint image generation scheme of the embodiments of the present invention, the corresponding group of texture maps and depth maps of each virtual viewpoint is input into the graphics processor, and for each virtual viewpoint in the virtual viewpoint path, the pixels in the selected corresponding group of texture maps and depth maps can be combined and rendered by multiple threads, with the pixel as the processing unit, to obtain the image corresponding to that virtual viewpoint.
Furthermore, by running the first kernel function on the graphics processor, the pixels in the corresponding groups of depth maps are forward mapped in parallel to the corresponding virtual viewpoint positions, and for multiple depth values mapped to the same pixel of the virtual viewpoint, an atomic operation keeps the largest pixel value to obtain the first depth map at the corresponding virtual viewpoint position, so the foreground-background occlusion relationship in depth map mapping can be resolved quickly; and for each pixel in the created second depth map at the virtual viewpoint position, the maximum value of the pixels in a preset area around the corresponding position in the first depth map is taken, which alleviates the gaps that appear in the mapping.
Furthermore, by running a second kernel function on the graphics processor to apply median filtering to each pixel of the second depth map obtained after forward mapping, over a preset area around the pixel position, the post-processing speed can be greatly increased and the timeliness of the post-processing improved.
Furthermore, a third kernel function is run on the graphics processor to perform interpolation on the pixels of the selected corresponding groups of texture maps in parallel and generate the corresponding virtual texture maps, which greatly increases the speed and improves the timeliness of the inverse mapping.
Furthermore, a fourth kernel function is run on the graphics processor to perform weighted fusion, in parallel, of the pixels at the same position in each virtual texture map generated by the inverse mapping, which greatly increases the fusion speed of the virtual texture maps and improves the timeliness of image fusion.
Further, holes are filled in parallel for each pixel of the weighted-fused texture map to obtain the image corresponding to the virtual viewpoint. Hole filling improves the quality of the generated virtual viewpoint image, and performing it in parallel per pixel greatly increases the hole filling speed and improves its timeliness.
Furthermore, the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path are input into a plurality of graphics processors to generate multiple virtual viewpoint images in parallel, which further accelerates virtual viewpoint image generation and improves its timeliness.
Drawings
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing system in an application scenario according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a server according to an embodiment of the present invention;
FIG. 6 is a flow chart of a data interaction method in an embodiment of the invention;
FIG. 7 is a block diagram of another data processing system in an embodiment of the invention;
FIG. 8 is a block diagram of a data processing system in another application scenario of an embodiment of the invention;
FIG. 9 is a schematic structural diagram of an interactive terminal in an embodiment of the present invention;
FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
FIG. 14 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
FIG. 15 is a flow chart of another data processing method in an embodiment of the present invention;
FIG. 16 is a flow chart of a method of intercepting synchronized video frames from compressed video data according to an embodiment of the present invention;
FIG. 17 is a flow chart of another data processing method in an embodiment of the present invention;
FIG. 18 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 19 is a block diagram of another data processing system in an embodiment of the invention;
FIG. 20 is a flow chart of a method of data synchronization in an embodiment of the present invention;
FIG. 21 is a timing diagram of a pull stream synchronization in an embodiment of the present invention;
FIG. 22 is a flow chart of another method of intercepting synchronized video frames from compressed video data in an embodiment of the present invention;
FIG. 23 is a schematic configuration diagram of another data processing apparatus in the embodiment of the present invention;
FIG. 24 is a schematic structural diagram of a data synchronization system according to an embodiment of the present invention;
FIG. 25 is a schematic diagram of a data synchronization system in an application scenario according to an embodiment of the present invention;
FIG. 26 is a flow chart of a method of depth map generation in an embodiment of the present invention;
FIG. 27 is a block diagram of a server in an embodiment of the invention;
FIG. 28 is a diagram illustrating a depth map processing performed by a server cluster according to an embodiment of the present invention;
FIG. 29 is a flowchart of a virtual viewpoint image generation method in the embodiment of the present invention;
FIG. 30 is a flowchart of a method for performing combined rendering by a GPU in accordance with an embodiment of the present invention;
FIG. 31 is a diagram illustrating a hole filling method according to an embodiment of the present invention;
FIG. 32 is a schematic structural diagram of a virtual viewpoint image generation system according to an embodiment of the present invention;
FIG. 33 is a schematic structural diagram of an electronic device in an embodiment of the invention;
FIG. 34 is a schematic diagram of another data synchronization system in accordance with an embodiment of the present invention;
FIG. 35 is a schematic diagram of another data synchronization system in accordance with an embodiment of the present invention;
fig. 36 is a schematic structural diagram of an acquisition apparatus in an embodiment of the present invention.
Fig. 37 is a schematic diagram of an acquisition array in an application scenario according to an embodiment of the present invention.
FIG. 38 is a block diagram of another data processing system in accordance with an embodiment of the present invention.
FIG. 39 is a schematic structural diagram of another interactive terminal in the embodiment of the present invention.
FIG. 40 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
FIG. 41 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
FIG. 42 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
FIG. 43 is a schematic connection diagram of an interactive terminal in an embodiment of the present invention.
FIG. 44 is a schematic view of an interactive operation of an interactive terminal in an embodiment of the present invention.
FIG. 45 is a schematic diagram of an interactive interface of another interactive terminal in the embodiment of the present invention.
FIG. 46 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
FIG. 47 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
FIG. 48 is a diagram illustrating an interactive interface of another interactive terminal in an embodiment of the present invention.
Detailed Description
In traditional playing scenarios such as live broadcast, rebroadcast and recorded broadcast, as mentioned above, a user can usually watch a game only from a single viewpoint and cannot freely switch viewpoints to see the scene or the action from different positions, and therefore cannot experience the feeling of watching the game while moving around the venue.
Six-degree-of-freedom (6DoF) technology can provide a viewing experience with a high degree of freedom: a user can adjust the viewing angle through interactive means during viewing and watch from any free viewpoint of interest, which greatly improves the viewing experience.
To realize 6DoF scenes, there currently exist Free-D playback technology, light field rendering technology, depth-map-based 6DoF video generation technology, and the like. Free-D playback expresses a 6DoF image by acquiring point cloud data of a scene through multi-angle shooting; light field rendering expresses a 6DoF image by obtaining depth and three-dimensional position information of pixels from focal length and spatial position changes of a dense light field. The depth-map-based 6DoF video generation method performs combined rendering of the corresponding groups of texture maps and depth maps in the image combination of the video frame at the moment of user interaction, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, to reconstruct the 6DoF image or video.
For example, when a Free-D playback scheme is used on site, a large number of cameras collect raw data, which is gathered into an on-site machine room through Serial Digital Interface (SDI) capture cards and processed by computing servers there to obtain point cloud data expressing the three-dimensional positions and pixel information of all points in space, from which the 6DoF scene is reconstructed. This makes the amount of data collected, transmitted and computed on site extremely large; especially for playing scenarios with high requirements on the transmission network and computing servers, such as live broadcast and rebroadcast, the implementation cost of 6DoF scene reconstruction is too high and the constraints are too many. Moreover, there are currently no mature technical standards or industrial software and hardware supporting point cloud data, so a long processing time is needed from on-site raw data acquisition to the final 6DoF reconstructed scene, and the requirements of low-delay playing and real-time interaction of multi-angle free-view video cannot be met.
For another example, when a light field rendering scheme is used on site, depth and three-dimensional position information of pixels must be obtained from focal length and spatial position changes of a dense light field. Because the light field image obtained from a dense light field has an extremely high resolution, it often needs to be decomposed into hundreds of conventional two-dimensional pictures. This scheme likewise makes the amount of data collected, transmitted and computed on site extremely large, places high requirements on the on-site transmission network and computing servers, has too high an implementation cost and too many constraints, and cannot process data quickly. In addition, reconstructing a 6DoF scene from light field images is still at the stage of experimental exploration and cannot yet effectively meet the requirements of low-delay playing and real-time interaction of multi-angle free-view video.
In summary, both Free-D playback technology and light field rendering technology place very large demands on storage capacity and computation, so a large number of servers must be deployed on site, which leads to too high an implementation cost, too many constraints and slow data processing; the demands of viewing and interaction therefore cannot be met, which hinders popularization.
Although the depth-map-based 6DoF video reconstruction method can reduce the amount of computation in the video reconstruction process, constrained by factors such as network transmission bandwidth and device decoding capability, it still has difficulty meeting the requirements of low-delay playing and real-time interaction of multi-angle free-view video.
To solve the above problems, some embodiments of the present invention provide a multi-angle free-view image generation scheme based on a distributed system architecture. An acquisition array formed by multiple acquisition devices is arranged in the on-site acquisition area to synchronously capture frame images from multiple angles. A data processing device intercepts the frame images captured by the acquisition devices according to a frame interception instruction. A server takes the frame images of multiple synchronized video frames uploaded by the data processing device as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in it, and reconstructs frame images along a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of preset frame images in the image combination, obtaining the corresponding multi-angle free-view video data, which is then inserted into the to-be-played data stream of a play control device for transmission to a play terminal.
Referring to the schematic structural diagram of a data processing system in an application scenario in an embodiment of the present invention, a data processing system 10 includes: a data processing device 11, a server 12, a play control device 13 and a play terminal 14. The data processing device 11 intercepts video frames from the frame images captured by the acquisition array in the on-site acquisition area; by intercepting only the video frames from which multi-angle free-view images are to be generated, a large amount of data transmission and processing is avoided. The server 12 then generates the multi-angle free-view images, making full use of its strong computing power to generate multi-angle free-view video data quickly, so that the data can be inserted into the to-be-played data stream of the play control device in time. Multi-angle free-view playing is thus realized at low cost, meeting users' demands for low-delay playing and real-time interaction of multi-angle free-view video.
In order to make the embodiments of the present invention more clearly understood and implemented by those skilled in the art, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection.
Referring to the flowchart of the data processing method shown in FIG. 2, in the embodiment of the present invention, the method may specifically include the following steps:
S21: receiving frame images of a plurality of synchronized video frames uploaded by the data processing device as an image combination.
The synchronized video frames are obtained by the data processing device, based on a video frame interception instruction, by intercepting video frames at a specified frame time from multiple video data streams that are synchronously acquired and uploaded in real time from different positions of the on-site acquisition area; the synchronized video frames have different shooting angles.
In a specific implementation, the video frame interception instruction may include information on the specified frame time, and the data processing device intercepts the video frames at the corresponding frame time from the multiple video data streams according to that information. The specified frame time may be expressed in frames, taking the N-th to M-th frames as the specified frame time, where N and M are integers not less than 1 and N is not greater than M; or it may be expressed in time, taking the X-th to Y-th second as the specified frame time, where X and Y are positive numbers and X is not greater than Y. Thus, the plurality of synchronized video frames may include all frame-level synchronized video frames corresponding to the specified frame time, with the pixel data of each video frame forming a corresponding frame image.
For example, according to the received video frame interception instruction, the data processing device may determine that the specified frame time is the 2nd frame of the multiple video data streams; it then intercepts the 2nd frame of each video data stream, and these intercepted 2nd frames, which are frame-level synchronized with one another, serve as the obtained plurality of synchronized video frames.
For another example, assuming the capture frame rate is 25 fps, i.e. 25 frames per second, and the specified frame time in the received video frame interception instruction covers one second of the multiple video data streams, the data processing device intercepts the 25 video frames of that second from each video data stream. The 1st frames of that second in the respective streams are frame-level synchronized with one another, as are the 2nd frames, and so on up to the 25th frames; together they serve as the obtained plurality of synchronized video frames.
For yet another example, according to the received video frame interception instruction, the data processing device may determine that the specified frame time covers the 2nd and 3rd frames of the multiple video data streams; it then intercepts the 2nd and 3rd frames of each video data stream, the 2nd frames being frame-level synchronized with one another and the 3rd frames likewise, and uses them as the plurality of synchronized video frames.
S22: determining the parameter data corresponding to the image combination.
In a specific implementation, the parameter data corresponding to the image combination may be obtained through a parameter matrix, and the parameter matrix may include an internal parameter matrix, an external parameter matrix, a rotation matrix, a translation matrix, and the like. Thereby, the mutual relation between the three-dimensional geometrical position of a given point on the surface of the spatial object and its corresponding point in the image combination can be determined.
In the embodiment of the present invention, a Structure from Motion (SFM) algorithm may be adopted: based on the parameter matrix, feature extraction, feature matching and global optimization are performed on the obtained image combination, and the resulting parameter estimates are used as the parameter data corresponding to the image combination. The feature extraction algorithm may be any one of the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, and the Features from Accelerated Segment Test (FAST) algorithm. The feature matching algorithm may include Euclidean distance matching, the Random Sample Consensus (RANSAC) algorithm, and the like. The global optimization algorithm may include Bundle Adjustment (BA) and the like.
S23: determining the depth data of each frame image in the image combination.
In a specific implementation, the depth data of each frame image may be determined based on a plurality of frame images in the image combination. Wherein the depth data may include depth values corresponding to pixels of each frame of image in the image combination. The distances of the acquisition points to the various points in the scene may be used as the above-mentioned depth values, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, with the origin of the shooting coordinate system as the optical center, the depth values may be distances of respective points in the field to the optical center along the shooting optical axis. It will be appreciated by those skilled in the art that the above distances may be relative values and that the same reference may be used for multiple frame images.
In an embodiment of the present invention, an algorithm of binocular stereo vision may be adopted to calculate the depth data of each frame of image. In addition, the depth data can be indirectly estimated by analyzing the features of the frame image, such as photometric features, light and shade features, and the like.
In another embodiment of the present invention, a Multi-View Stereo (MVS) algorithm may be used for reconstruction from the frame images. During reconstruction, all pixels may be used, or the pixels may be down-sampled so that only some of them are used. Specifically, the pixels of each frame image may be matched, the three-dimensional coordinates of each pixel reconstructed, and photo-consistent points across the images obtained, after which the depth data of each frame image is calculated. Alternatively, the pixels of selected frame images may be matched, the three-dimensional coordinates of the pixels of each selected frame image reconstructed and photo-consistent points obtained, after which the depth data of the corresponding frame images is calculated. The pixel data of a frame image corresponds to the calculated depth data. The way frame images are selected may be set according to the specific situation; for example, some frame images may be selected according to the distance between the frame image for which depth data is needed and the other frame images.
S24: based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, reconstructing frame images along the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data.
The multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered by frame time.
The pixel data of the frame image may be YUV data or RGB data, or may be other data capable of expressing the frame image; the depth data may include depth values corresponding to the pixel data of the frame image one by one, or may be a partial value selected from a set of depth values corresponding to the pixel data of the frame image one by one, where the specific selection is determined according to a specific scenario; the virtual viewpoint is selected from a multi-angle free visual angle range, and the multi-angle free visual angle range is a range supporting the switching and watching of viewpoints of an area to be watched.
In a specific implementation, the preset frame image may be all frame images in the image combination, or may be a selected partial frame image. The selection mode may be set according to a specific situation, for example, a partial frame image at a corresponding position in the image combination may be selected according to a position relationship between the acquisition points; for another example, the partial frame image of the corresponding frame time in the image combination may be selected according to the frame time or the frame period desired to be acquired.
Because the preset frame images may correspond to different frame times, each virtual viewpoint in the virtual viewpoint path may correspond to a frame time. A corresponding frame image is obtained according to the frame time of each virtual viewpoint, and then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame image at the frame time of each virtual viewpoint, frame image reconstruction is performed for each virtual viewpoint to obtain the corresponding multi-angle free-view video data. In this case the multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered by frame time. In other words, in a specific implementation, besides a multi-angle free-view image at a single moment, a continuous or discontinuous multi-angle free-view video can be realized.
In an embodiment of the present invention, the image combination includes a synchronized video frames, of which a1 correspond to a first frame time and a2 correspond to a second frame time, with a1 + a2 = a. A virtual viewpoint path composed of B virtual viewpoints is preset, of which B1 correspond to the first frame time and B2 correspond to the second frame time, with B1 + B2 not more than 2B. Then, based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the a1 synchronized video frames at the first frame time, a first frame image reconstruction is performed for the path composed of the B1 virtual viewpoints; and based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the a2 synchronized video frames at the second frame time, a second frame image reconstruction is performed for the path composed of the B2 virtual viewpoints, finally obtaining the corresponding multi-angle free-view video data, which may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered by frame time.
It can be understood that the specified frame time and the virtual viewpoint may be divided more finely, so as to obtain more synchronous video frames and virtual viewpoints corresponding to different frame times, and the above embodiment is merely an example, and is not a limitation to the specific embodiment.
In the embodiment of the present invention, a Depth Image Based Rendering (DIBR) algorithm may be adopted, and the pixel data and the Depth data of the preset frame Image are combined and rendered according to the Image combination corresponding parameter data and the preset virtual viewpoint path, so as to realize frame Image reconstruction Based on the preset virtual viewpoint path and obtain corresponding multi-angle free view video data.
S25: inserting the multi-angle free-view video data into the to-be-played data stream of a play control device and playing that data stream through a play terminal.
The play control device may take multiple video data streams as input; these may come from the acquisition devices in the acquisition array or from other acquisition devices. The play control device can select one input video data stream as the to-be-played data stream as required. The multi-angle free-view video data obtained in step S24 may be inserted into the to-be-played data stream, or the input interface may be switched from another video data stream to the interface carrying the multi-angle free-view video data. The play control device outputs the selected to-be-played data stream to the play terminal, which plays it, so that a user can watch multi-angle free-view video images through the play terminal. The play terminal may be a television, a mobile phone, a tablet, a computer or another video playing device or electronic device with a display screen.
In a specific implementation, the multi-angle free-view video data of the to-be-played data stream inserted into the play control device may be retained in the play terminal, so as to facilitate the user to perform time-shifting viewing, where the time-shifting may be operations such as pause, rewind, fast-forward to the current time, and the like performed when the user views the data stream.
With this data processing method, the data processing device in the distributed system architecture handles the interception of the specified video frames, and the server reconstructs the multi-angle free-view video from the intercepted preset frame images. This avoids deploying a large number of servers on site and avoids directly uploading the video data streams captured by the acquisition devices of the acquisition array, thereby saving a large amount of transmission resources and server processing resources. Under limited network transmission bandwidth, the multi-angle free-view video of the specified video frames can be reconstructed in real time, realizing low-delay playing of multi-angle free-view video, reducing the limitation imposed by network transmission bandwidth, lowering implementation cost and constraints, and making the scheme easy to implement, so that the requirements of low-delay playing and real-time interaction of multi-angle free-view video are met.
In a specific implementation, the depth data of the frame image preset in the image combination may be mapped to the corresponding virtual viewpoints respectively according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination; and reconstructing the frame image according to the pixel data and the depth data of the preset frame image which are respectively mapped to the corresponding virtual viewpoint and the preset virtual viewpoint path to obtain corresponding multi-angle free visual angle video data.
The virtual parameter data of a virtual viewpoint may include virtual viewing position data and virtual viewing angle data; the parameter data corresponding to the image combination may include acquisition position data, shooting angle data, and the like. The reconstructed image can be obtained by first applying forward mapping and then inverse mapping.
In a specific implementation, the collected position data and the shooting angle data may be referred to as external parameter data, and the parameter data may further include internal parameter data, which may include attribute data of the collecting device, so that the mapping relationship may be more accurately determined. For example, the internal parameter data may include distortion data, and the mapping relationship may be further accurately determined spatially due to consideration of distortion factors.
In a specific embodiment, in order to facilitate subsequent data acquisition, a stitched image corresponding to the image combination may be generated based on the pixel data and the depth data of the image combination, and the stitched image may include a first field and a second field, where the first field includes the pixel data of the image combination and the second field includes the depth data of the image combination, and then the stitched image corresponding to the image combination and the corresponding parameter data are stored.
In another embodiment, in order to save a storage space, a stitched image corresponding to a preset frame image in the image combination may be generated based on pixel data and depth data of the preset frame image in the image combination, where the stitched image corresponding to the preset frame image may include a first field and a second field, where the first field includes the pixel data of the preset frame image, and the second field includes the depth data of the preset frame image, and then only the stitched image corresponding to the preset frame image and corresponding parameter data are stored.
The first field corresponds to the second field. The stitched image can be divided into an image area and a depth map area: the pixel fields of the image area store the pixel data of the frame images, and the pixel fields of the depth map area store their depth data. The pixel field storing the pixel data of a frame image in the image area serves as the first field, and the pixel field storing the depth data of that frame image in the depth map area serves as the second field. The stitched image of the image combination and the parameter data corresponding to the image combination can be stored in a data file; when the stitched image or the corresponding parameter data needs to be obtained, it can be read from the corresponding storage space according to the storage address contained in the header of the data file.
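As an illustration of this layout, the following host-side sketch assembles a stitched image whose upper region (first field) holds the pixel data of the frame images and whose lower region (second field) holds the corresponding depth maps. The side-by-side grid arrangement and the 8-bit single-channel depth are assumptions; the patent only requires the two fields to correspond.

```cuda
#include <cstring>
#include <vector>

// Stitched image: an RGB image area (first field) followed by a depth map area (second field).
struct StitchedImage {
    int frameW, frameH;                 // size of one frame image
    int cols, rows;                     // grid of frames inside each area
    std::vector<unsigned char> pixels;  // image area bytes followed by depth area bytes
};

StitchedImage stitch(const std::vector<const unsigned char*>& rgbFrames,   // frameW*frameH*3 each
                     const std::vector<const unsigned char*>& depthFrames, // frameW*frameH each
                     int frameW, int frameH, int cols)
{
    int n = (int)rgbFrames.size();
    int rows = (n + cols - 1) / cols;
    int canvasW = cols * frameW;
    int areaH = rows * frameH;          // height of the image area and of the depth map area
    StitchedImage s{frameW, frameH, cols, rows,
                    std::vector<unsigned char>((size_t)canvasW * (areaH * 3 + areaH), 0)};

    size_t depthOffset = (size_t)canvasW * areaH * 3;      // start of the second field
    for (int i = 0; i < n; ++i) {
        int ox = (i % cols) * frameW, oy = (i / cols) * frameH;
        for (int y = 0; y < frameH; ++y) {
            // one row of pixel data into the image area (first field)
            std::memcpy(&s.pixels[((size_t)(oy + y) * canvasW + ox) * 3],
                        rgbFrames[i] + (size_t)y * frameW * 3, (size_t)frameW * 3);
            // one row of depth data into the depth map area (second field)
            std::memcpy(&s.pixels[depthOffset + (size_t)(oy + y) * canvasW + ox],
                        depthFrames[i] + (size_t)y * frameW, (size_t)frameW);
        }
    }
    return s;
}
```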
In addition, the storage format of the image combination may be a video format, the number of the image combinations may be multiple, and each image combination may be an image combination corresponding to different frame times after the video is decapsulated and decoded.
In a specific implementation, the interactive frame time information of the interaction moment can be determined based on a received image reconstruction instruction from an interactive terminal, and the stored stitched image corresponding to that interactive frame time together with the parameter data of the corresponding image combination is sent to the interactive terminal. The interactive terminal then, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to a preset rule, combines and renders them, and reconstructs the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted, which is then obtained and played.
The preset rule may be set according to the specific scenario. For example, based on the virtual viewpoint position information determined by the interactive operation, the position information of the W adjacent virtual viewpoints closest (sorted by distance) to the virtual viewpoint at the interaction moment is selected, and the pixel data and depth data, satisfying the interactive frame time information, corresponding to the W+1 viewpoints including the virtual viewpoint at the interaction moment are obtained from the stitched image.
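A small sketch of such a selection rule, taking Euclidean distance between acquisition positions as the ordering criterion (the metric and the Position struct are assumptions):

```cuda
#include <algorithm>
#include <vector>

struct Position { float x, y, z; };

// Returns the indices of the w reference viewpoints closest to the virtual viewpoint
// chosen by the interactive operation; their texture and depth data are then fetched
// from the stitched image.
std::vector<int> selectNearestViews(const std::vector<Position>& refPositions,
                                    const Position& virtualPos, int w)
{
    std::vector<int> order(refPositions.size());
    for (int i = 0; i < (int)order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        auto d2 = [&](const Position& p) {
            float dx = p.x - virtualPos.x, dy = p.y - virtualPos.y, dz = p.z - virtualPos.z;
            return dx * dx + dy * dy + dz * dz;   // squared Euclidean distance
        };
        return d2(refPositions[a]) < d2(refPositions[b]);
    });
    if ((int)order.size() > w) order.resize(w);   // keep the W nearest reference viewpoints
    return order;
}
```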
The interactive frame time information is determined based on a trigger operation from an interactive terminal, where the trigger operation may be a trigger operation input by a user or a trigger operation automatically generated by the interactive terminal, for example, the interactive terminal may automatically initiate the trigger operation when detecting that the identifier of the multi-angle free viewpoint data frame exists. When the user manually triggers, the interaction terminal may select the time information of triggering the interaction after displaying the interaction prompt information, or the interaction terminal may receive the historical time information of triggering the interaction by the user operation, where the historical time information may be the time information before the current playing time.
In a specific implementation, based on the acquired stitched image of the preset frame images in the image combination of the interactive frame time and the corresponding parameter data, the interactive frame time information, and the virtual viewpoint position information, the interactive terminal may combine and render the pixel data and depth data of the stitched image with the same method as in step S24, obtain the multi-angle free-view video data corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at that position.
By adopting the scheme, the multi-angle free visual angle video data corresponding to the interactive virtual viewpoint position can be generated at any time based on the image reconstruction instruction from the interactive terminal, and the user interactive experience can be further improved.
Referring to the schematic structural diagram of the data processing system shown in FIG. 1, in an embodiment of the present invention the data processing system 10 may include: a data processing device 11, a server 12, a play control device 13, and a play terminal 14, wherein:
the data processing device 11 is adapted to intercept, based on a video frame interception instruction, a plurality of synchronized video frames at a specified frame time from multiple video data streams synchronously acquired in real time at different positions of the on-site acquisition area, and to upload the obtained synchronized video frames at the specified frame time to the server, where the multiple video data streams may be in a compressed or uncompressed format;
the server 12 is adapted to take the received frame images of the plurality of synchronized video frames at the specified frame time uploaded by the data processing device 11 as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in it, and reconstruct frame images along a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in it, obtaining the corresponding multi-angle free-view video data, which includes: multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered by frame time;
The play control device 13 is adapted to insert the multi-angle free-view video data into a data stream to be played;
the playing terminal 14 is adapted to receive the data stream to be played from the playing control device 13 and perform real-time playing.
In a specific implementation, the play control device 13 may output the data stream to be played based on a control instruction. As an alternative example, the play control device 13 may select one of multiple data streams as the data stream to be played, or may switch constantly among multiple data streams to continuously output the data stream to be played. A director control device may serve as the play control device in the embodiment of the present invention; it may be a manual or semi-manual director control device that performs play control based on externally input control instructions, or a virtual director control device that performs director control automatically based on artificial intelligence, big data learning, or a preset algorithm.
With this data processing system, the data processing device in the distributed system architecture handles the interception of the specified video frames, and the server reconstructs the multi-angle free-view video from the intercepted preset frame images. This avoids on-site processing by a large number of servers and avoids directly uploading the video data streams captured by the acquisition devices of the acquisition array, saving a large amount of transmission and server processing resources. Under limited network transmission bandwidth, the multi-angle free-view video of the specified video frames can be reconstructed in real time, realizing low-delay playing, reducing the limitation imposed by network transmission bandwidth, lowering implementation cost and constraints, and making the scheme easy to implement, so that the requirements of low-delay playing and real-time interaction of multi-angle free-view video are met.
In a specific implementation, the server 12 is further adapted to generate a stitched image at a preset frame time in the image combination based on the pixel data and the depth data of a preset frame image in the image combination, where the stitched image includes a first field and a second field, the first field includes the pixel data of the preset frame image in the image combination, and the second field includes the depth data of the preset frame image in the image combination, and to store the stitched image of the image combination and the parameter data corresponding to the image combination.
In a specific implementation, the data processing system 10 may further include an interactive terminal 15, adapted to determine interactive frame time information based on a trigger operation, send an image reconstruction instruction including the interactive frame time information to a server, receive a stitched image and corresponding parameter data of a preset frame image in an image combination corresponding to the interactive frame time returned from the server, determine virtual viewpoint position information based on the interactive operation, select corresponding pixel data and depth data in the stitched image according to a preset rule, perform combined rendering based on the selected pixel data and depth data and the parameter data, reconstruct to obtain multi-angle free view video data corresponding to the interactive frame time virtual viewpoint position, and play the multi-angle free view video data.
The number of the playing terminals 14 may be one or more, the number of the interactive terminals 15 may be one or more, and the playing terminals 14 and the interactive terminals 15 may be the same terminal device. In addition, at least one of the server, the play control device, or the interactive terminal may be used as a transmitting terminal for the video frame capturing instruction, or other devices capable of transmitting the video frame capturing instruction may also be used.
It should be noted that, in a specific implementation, the positions of the data processing device and the server may be flexibly deployed according to user requirements. For example, the data processing device may be located in a field non-acquisition area or in the cloud. For another example, the server may be placed in the field non-acquisition area, in the cloud, or at the terminal access side; at the terminal access side, edge node devices such as a base station, a set top box, a router, a home data center server, and a hotspot device may all be used as the server to obtain the multi-angle free view data. Alternatively, the data processing device and the server may be deployed together in a centralized manner as a server cluster that works cooperatively, so that the multi-angle free view data can be rapidly generated, and low-delay playing and real-time interaction of the multi-angle free-view video can be realized.
By adopting the scheme, the multi-angle free visual angle video data corresponding to the virtual viewpoint position to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, and the user interaction experience can be further improved.
In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, the data processing system is described in detail below with a specific application scenario.
As shown in fig. 3, the schematic structural diagram of a data processing system in an application scenario shows a layout scenario of the data processing system of a basketball game, where the data processing system includes an acquisition array 31 composed of a plurality of acquisition devices, a data processing device 32, a cloud server cluster 33, a play control device 34, a play terminal 35, and an interaction terminal 36.
Referring to fig. 3, the basketball hoop on the left side is used as a core viewpoint, the core viewpoint is used as a circle center, and a sector area located on the same plane as the core viewpoint is used as a preset multi-angle free viewing angle range. The acquisition devices in the acquisition array 31 may be arranged in a fan shape at different positions of the field acquisition area according to the preset multi-angle free viewing angle range, and can synchronously acquire video data streams from their corresponding angles in real time.
In particular implementations, the acquisition devices may also be located in the ceiling area of a basketball venue, on a basketball stand, or the like. The acquisition devices may be arranged and distributed along a straight line, a fan shape, an arc, a circle, or an irregular shape. The specific arrangement may be set according to one or more factors such as the specific field environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements. An acquisition device may be any device having a camera function, such as an ordinary camera, a mobile phone, or a professional camera.
In order not to affect the operation of the acquisition device, the data processing device 32 may be located in a field non-acquisition area, which may be regarded as a field server. The data processing device 32 may send a stream pulling instruction to each acquisition device in the acquisition array 31 through a wireless local area network, and each acquisition device in the acquisition array 31 transmits an obtained video data stream to the data processing device 32 in real time based on the stream pulling instruction sent by the data processing device 32. Wherein, each acquisition device in the acquisition array 31 can transmit the obtained video data stream to the data processing device 32 in real time through the switch 37.
When the data processing device 32 receives a video frame capture instruction, it captures the frame images of a plurality of synchronous video frames at the specified frame time from the received multiple video data streams, and uploads the obtained plurality of synchronous video frames at the specified frame time to the cloud server cluster 33.
Correspondingly, the cloud server cluster 33 uses the received frame images of multiple synchronous video frames as an image combination, determines parameter data corresponding to the image combination and depth data of each frame image in the image combination, and performs frame image reconstruction on a preset virtual viewpoint path based on the parameter data corresponding to the image combination, pixel data and depth data of a preset frame image in the image combination to obtain corresponding multi-angle free view video data, where the multi-angle free view video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame time.
The server may be placed in the cloud. In order to process data in parallel more quickly, the cloud server cluster 33 may be composed of a plurality of different servers or server groups according to the data to be processed.
For example, the cloud server cluster 33 may include: a first cloud server 331, a second cloud server 332, a third cloud server 333, and a fourth cloud server 334. The first cloud server 331 may be configured to determine parameter data corresponding to the image combination; the second cloud server 332 may be configured to determine depth data of each frame image in the image combination; the third cloud server 333 may perform frame image reconstruction on a preset virtual viewpoint path by using a Depth Image Based Rendering (DIBR) algorithm based on the parameter data corresponding to the image combination, the pixel data of the image combination, and the depth data; the fourth cloud server 334 may be configured to generate the multi-angle free-view video, where the multi-angle free-view video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame time.
It can be understood that the first cloud server 331, the second cloud server 332, the third cloud server 333 and the fourth cloud server 334 may also each be a server group composed of a server array or a server sub-cluster, which is not limited in the embodiment of the present invention.
In an implementation, the cloud server cluster 33 may store the pixel data and the depth data of the image combination in the following manner:
generating a stitched image corresponding to a frame time based on the pixel data and the depth data of the image combination, where the stitched image comprises a first field and a second field, the first field comprises the pixel data of a preset frame image in the image combination, and the second field comprises the depth data of the preset frame image in the image combination. The obtained stitched image and the corresponding parameter data may be stored in a data file, and when the stitched image or the parameter data needs to be obtained, it can be read from the corresponding storage space according to the corresponding storage address in the header file of the data file.
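As an illustration only, the following Python sketch shows one possible layout for such a stitched image, with the pixel data of the preset frame images stacked side by side as the first field and their depth data as the second field. The use of NumPy, the array shapes, and the top/bottom arrangement of the two fields are assumptions made for this sketch and are not mandated by the scheme.

```python
import numpy as np

def build_stitched_image(texture_maps, depth_maps):
    """Assemble a stitched image from the pixel data and depth data of the
    preset frame images in an image combination.

    texture_maps: list of HxWx3 uint8 arrays (pixel data, assumed RGB here)
    depth_maps:   list of HxW uint8 arrays (depth data quantized to 8 bits)
    """
    # First field: pixel data of the preset frame images placed side by side.
    first_field = np.hstack(texture_maps)
    # Second field: depth data, expanded to 3 channels so both fields share one format.
    depth_as_3ch = [np.repeat(d[:, :, np.newaxis], 3, axis=2) for d in depth_maps]
    second_field = np.hstack(depth_as_3ch)
    # Stitched image: first field on top, second field below (an assumed layout).
    return np.vstack([first_field, second_field])
```

A stitched image produced in this way can then be written to a data file together with the parameter data, with the storage addresses of both recorded in the header file as described above.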
Then, the play control device 34 may insert the received multi-angle free-view video data into a data stream to be played, and the play terminal 35 receives the data stream to be played from the play control device 34 and plays it in real time. The play control device 34 may be a manual play control device or a virtual play control device. In a specific implementation, a dedicated server capable of automatically switching video streams may be set up as a virtual play control device to control the data sources. A director control apparatus, such as a director console, may be used as the play control device in the embodiments of the present invention.
It can be understood that the data processing device 32 may be disposed in a non-acquisition field area or a cloud end according to a specific scenario, and the server (cluster) and the play control device may be disposed in the non-acquisition field area, the cloud end or a terminal access side according to the specific scenario, which is not intended to limit the specific implementation and protection scope of the present invention.
Fig. 4 is a schematic diagram of the interactive interface of the interactive terminal. The interactive interface of the interactive terminal 40 has a progress bar 41. With reference to fig. 3 and fig. 4, the interactive terminal 40 may associate the specified frame times received from the data processing device 32 with the progress bar, and may generate a plurality of interactive identifiers on the progress bar 41, such as the interactive identifiers 42 and 43. The black segment of the progress bar 41 is the played portion 41a, and the blank segment of the progress bar 41 is the unplayed portion 41b.
When the system of the interactive terminal reads the corresponding interactive identifier 43 on the progress bar 41, the interface of the interactive terminal 40 may display interaction prompt information. For example, when the user chooses to trigger the current interactive identifier 43, the interactive terminal 40 receives the feedback, generates an image reconstruction instruction for the interactive frame time corresponding to the interactive identifier 43, and sends the image reconstruction instruction containing the interactive frame time information to the cloud server cluster 33. When the user does not trigger it, the interactive terminal 40 may continue to read the subsequent video data, and the played portion 41a of the progress bar continues to advance. While watching, the user may also trigger a historical interaction identifier, for example the interaction identifier 42 displayed in the played portion 41a of the progress bar; the interactive terminal 40 then receives the feedback and generates an image reconstruction instruction for the interactive frame time corresponding to the interaction identifier 42.
When the image reconstruction instruction from the interactive terminal 40 is received by the cloud server cluster 33, the stitched image of the preset frame image in the corresponding image combination and the parameter data corresponding to the corresponding image combination may be extracted and transmitted to the interactive terminal 40.
The interactive terminal 40 determines interactive frame time information based on the trigger operation, sends an image reconstruction instruction containing the interactive frame time information to the server, and receives the stitched image of the preset frame image in the image combination corresponding to the interactive frame time and the corresponding parameter data returned from the cloud server cluster 33. It then determines virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to a preset rule, performs combined rendering on the selected pixel data and depth data, reconstructs the multi-angle free-view video data corresponding to the virtual viewpoint position at the interactive frame time, and plays it.
It can be understood that each acquisition device in the acquisition array and the data processing device may be connected through a switch and/or a local area network, the number of the playing terminals and the number of the interaction terminals may be one or more, the playing terminals and the interaction terminals may be the same terminal device, the data processing device may be placed in a field non-acquisition area or a cloud according to a specific scenario, the server may be placed in the field non-acquisition area, the cloud or a terminal access side according to the specific scenario, and this embodiment is not used to limit the specific implementation and protection scope of the present invention.
The embodiment of the present invention further provides a server corresponding to the data processing method, and in order to enable those skilled in the art to better understand and implement the embodiment of the present invention, the following detailed description is provided by using specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the server shown in fig. 5, in the embodiment of the present invention, as shown in fig. 5, the server 50 may include:
a data receiving unit 51 adapted to receive frame images of a plurality of synchronized video frames uploaded by the data processing apparatus as an image combination;
a parameter data calculation unit 52 adapted to determine parameter data corresponding to the image combination;
a depth data calculation unit 53 adapted to determine depth data for each frame image in the image combination;
a video data obtaining unit 54, adapted to perform frame image reconstruction on a preset virtual viewpoint path based on the parameter data corresponding to the image combination, the pixel data of a preset frame image in the image combination, and the depth data, to obtain corresponding multi-angle free-view video data, where the multi-angle free-view video data includes: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame time.
A first data transmission unit 55, adapted to insert the multi-angle free-view video data into a data stream to be played of a play control device and play the data stream through a play terminal.
The plurality of synchronous video frames can be obtained by intercepting video frames of specified frame time in a plurality of video data streams synchronously acquired and uploaded from different positions of a field acquisition area in real time by the data processing equipment based on a video frame intercepting instruction, and the plurality of synchronous video frames have different shooting angles.
The server can be arranged in a field non-acquisition area, a cloud end or a terminal access side according to specific situations.
In a specific implementation, the multi-angle free-view video data inserted into the to-be-played data stream of the play control device may be retained in the play terminal, so that the user can view it in a time-shifted manner, where time shifting may include pausing, rewinding, fast-forwarding to the current time, and the like while the user views the data stream.
In a specific implementation, as shown in fig. 5, the video data obtaining unit 54 may include:
a data mapping subunit 541, adapted to map, according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination, the depth data of the preset frame image in the image combination to the corresponding virtual viewpoint, respectively;
The data reconstruction subunit 542 is adapted to perform frame image reconstruction according to the pixel data and the depth data of the preset frame image respectively mapped to the corresponding virtual viewpoint, and the preset virtual viewpoint path, so as to obtain corresponding multi-angle free-view video data.
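As a rough illustration of what the data mapping subunit 541 and the data reconstruction subunit 542 do, the following Python sketch forward-warps one reference frame image into a virtual viewpoint using its depth data and camera parameters. The pinhole-camera model, the matrix conventions, and the function name are assumptions for this sketch; occlusion handling (z-buffering), hole filling, and the fusion of several reference views that a full DIBR pipeline performs are omitted.

```python
import numpy as np

def warp_to_virtual_viewpoint(texture, depth, K_src, pose_src, K_virt, pose_virt):
    """Forward-warp a reference frame image (texture + depth) into a virtual viewpoint.

    texture: HxWx3 pixel data; depth: HxW depth along the optical axis.
    K_src, K_virt: 3x3 intrinsic matrices; pose_src, pose_virt: 4x4 camera-to-world matrices.
    """
    h, w = depth.shape
    out = np.zeros_like(texture)
    # Back-project every pixel of the reference view into 3D camera coordinates.
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T            # 3xN homogeneous pixels
    cam_pts = (np.linalg.inv(K_src) @ pix) * depth.reshape(1, -1)                # 3xN camera-space points
    world_pts = pose_src @ np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])  # 4xN world points
    # Project the world points into the virtual viewpoint.
    P_virt = K_virt @ np.linalg.inv(pose_virt)[:3, :]
    proj = P_virt @ world_pts
    uu = (proj[0] / proj[2]).round().astype(int)
    vv = (proj[1] / proj[2]).round().astype(int)
    ok = (proj[2] > 0) & (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
    out[vv[ok], uu[ok]] = texture.reshape(-1, 3)[ok]   # splat pixel data (no z-buffer in this sketch)
    return out
```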
In a specific implementation, as shown in fig. 5, the server 50 may further include:
a stitched image generating unit 56, adapted to generate a stitched image corresponding to the image combination based on the pixel data and the depth data of the preset frame image in the image combination, where the stitched image may include a first field and a second field, where the first field includes the pixel data of the preset frame image in the image combination, and the second field includes the depth data of the preset frame image in the image combination;
and a data storage unit 57 adapted to store a stitched image of the image combination and parameter data corresponding to the image combination.
In a specific implementation, as shown in fig. 5, the server 50 may further include:
the data extraction unit 58 is adapted to determine interactive frame time information based on an image reconstruction instruction received from the interactive terminal during interaction, and to extract the stitched image of the preset frame image in the image combination corresponding to the interactive frame time and the parameter data corresponding to that image combination;
And a second data transmission unit 59, adapted to transmit the stitched image and the corresponding parameter data extracted by the data extraction unit 58 to the interactive terminal, so that the interactive terminal selects the corresponding pixel data and depth data and the corresponding parameter data in the stitched image according to a preset rule based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data, reconstructs multi-angle free-view video data corresponding to the virtual viewpoint position at the interactive frame time, and plays the multi-angle free-view video data.
By adopting the scheme, the multi-angle free visual angle video data corresponding to the virtual viewpoint position to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, and the user interaction experience can be further improved.
The embodiment of the invention also provides a data interaction method and a data processing system, which can acquire a data stream to be played in real time from the playing control equipment and perform real-time playing and displaying, wherein each interactive identifier in the data stream to be played is associated with the appointed frame time of the video data, and then, interactive data of the appointed frame time corresponding to the interactive identifier can be acquired in response to the triggering operation of one interactive identifier.
By adopting the data interaction scheme in the embodiment of the present invention, interactive data can be acquired according to a trigger operation on an interactive identifier during playing, and a multi-angle free view display can then be performed, which improves the user interaction experience. The data interaction method and data processing system are described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to a flow chart of the data interaction method shown in fig. 6, a data interaction method adopted by the embodiment of the present invention is described below through specific steps.
S61, acquiring a data stream to be played in real time from a play control device, and playing and displaying the data stream in real time, wherein the data stream to be played comprises video data and interactive identifications, and each interactive identification is associated with a specified frame time of the data stream to be played.
The specified frame time may be in units of frames, taking the N-th to M-th frames as the specified frame time, where N and M are integers not less than 1 and N is not greater than M; alternatively, the specified frame time may be in units of time, taking the X-th to Y-th seconds as the specified frame time, where X and Y are positive numbers and X is not greater than Y.
In a specific implementation, the data stream to be played may be associated with a plurality of specified frame times, and the play control device may generate an interactive identifier corresponding to each specified frame time based on information of each specified frame time, so that when the data stream to be played is played and displayed in real time, the corresponding interactive identifier may be displayed at the specified frame time. And associating each interactive identifier with the video data in different modes according to actual conditions.
In an embodiment of the present invention, the data stream to be played may include a plurality of frame times corresponding to the video data, and each interactive identifier also has a corresponding designated frame time, so that information of the designated frame time corresponding to each interactive identifier and information of each frame time in the data stream to be played may be matched, and the frame time of the same information and the interactive identifier may be associated, so that when the data stream to be played is displayed in real time and the corresponding frame time is reached, the corresponding interactive identifier may be displayed.
For example, the data stream to be played includes N frame times, and the play control device generates corresponding M interactive identifiers based on the information of the M designated frame times. If the information of the ith frame time is the same as the information of the jth appointed frame time, the ith frame time and the jth interactive identifier can be associated, and the jth interactive identifier can be displayed when the ith frame time is displayed in real time, wherein i is a natural number not greater than N, and j is a natural number not greater than M.
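A minimal sketch of this matching step is shown below; it assumes that the information of a frame time can be reduced to a single timestamp, which is a simplification of the scheme described above.

```python
def associate_identifiers(frame_times, specified_frame_times):
    """Associate the i-th frame time of the data stream to be played with the
    j-th interactive identifier whenever their information matches, so that
    the j-th identifier can be shown when the i-th frame time is played.
    Returns a dict mapping frame-time index i to identifier index j."""
    lookup = {t: j for j, t in enumerate(specified_frame_times)}
    return {i: lookup[t] for i, t in enumerate(frame_times) if t in lookup}

# Hypothetical usage: four frame times (in seconds), two of which carry identifiers.
mapping = associate_identifiers([0.00, 0.04, 0.08, 0.12], [0.04, 0.12])
# mapping == {1: 0, 3: 1}
```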
S62, responding to the trigger operation of an interaction identifier, and acquiring interaction data corresponding to the specified frame time of the interaction identifier, wherein the interaction data comprises multi-angle free visual angle data.
In specific implementation, each interactive data corresponding to each designated frame time can be stored in a preset storage device, and since the interactive identifier and the designated frame time have a corresponding relationship, the interactive identifier displayed by the interactive terminal can be triggered by executing a triggering operation on the interactive terminal, and the designated frame time corresponding to the triggered interactive identifier can be obtained according to the triggering operation on the interactive identifier. Therefore, the interactive data of the appointed frame time corresponding to the triggered interactive identification can be acquired.
For example, the preset storage device may store M pieces of interactive data, where the M pieces of interactive data respectively correspond to M specified frame times, and the M specified frame times correspond to M interactive identifiers. Assuming that the triggered interaction identifier is Pi, the specified frame time Ti corresponding to the interaction identifier Pi can be obtained from the triggered interaction identifier Pi, and thus the interactive data of the specified frame time Ti corresponding to the interaction identifier Pi is acquired, where i is a natural number.
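The lookup from a triggered interaction identifier Pi to its specified frame time Ti and then to the stored interactive data can be sketched as follows; the in-memory dictionaries merely stand in for the preset storage device and are purely illustrative.

```python
# Hypothetical stand-ins for the preset storage device: M pieces of interactive
# data keyed by specified frame time, and the identifier-to-frame-time mapping.
interactive_data_store = {"T1": {"multi_angle_free_view_data": "..."},
                          "T2": {"multi_angle_free_view_data": "..."}}
identifier_to_frame_time = {"P1": "T1", "P2": "T2"}

def fetch_interactive_data(triggered_identifier):
    """Resolve the triggered identifier Pi to its specified frame time Ti and
    return the interactive data stored for Ti."""
    frame_time = identifier_to_frame_time[triggered_identifier]
    return interactive_data_store[frame_time]
```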
The trigger operation may be a trigger operation input by a user, or a trigger operation automatically generated by the interactive terminal.
And the preset storage device can be arranged in a field non-acquisition area, a cloud end or a terminal access side. Specifically, the preset storage device may be a data processing device, a server, or an interactive terminal in the embodiment of the present invention, or an edge node device located at an interactive terminal side, such as a base station, a set top box, a router, a home data center server, a hotspot device, and the like.
And S63, displaying the images of the multi-angle free visual angle at the appointed frame time based on the interactive data.
In a specific implementation, an image reconstruction algorithm may be used to perform image reconstruction on the multi-angle free view data of the interactive data, and then perform image display of the multi-angle free view at the specified frame time.
If the appointed frame time is a frame time, the static image of the multi-angle free visual angle can be displayed; and if the appointed frame time corresponds to a plurality of frame times, displaying the dynamic image of the multi-angle free visual angle.
By adopting the scheme, in the process of video playing, interactive data can be acquired according to the triggering operation of the interactive identification, and then multi-angle free visual angle display is carried out, so that the interactive experience of users is improved.
In a specific implementation, the multi-angle free view data may be generated based on a plurality of received frame images corresponding to the specified frame time, where the plurality of frame images are obtained by intercepting, by a data processing device, a plurality of video data streams synchronously acquired by a plurality of acquisition devices in an acquisition array at the specified frame time, and the multi-angle free view data may include pixel data, depth data, and parameter data of the plurality of frame images, where an association relationship exists between the pixel data and the depth data of each frame image.
The pixel data of the frame image may be YUV data or RGB data, or may be other data capable of expressing the frame image. The depth data may include depth values corresponding one-to-one to the pixel data of the frame image, or may be partial values selected from a set of depth values corresponding one-to-one to the pixel data of the frame image. The specific selection of depth data depends on the specific situation.
In a specific implementation, the parameter data corresponding to the plurality of frame images may be obtained through a parameter matrix, where the parameter matrix may include an internal parameter matrix, an external parameter matrix, a rotation matrix, a translation matrix, and the like. Thereby, the correlation between the three-dimensional geometrical position of a given point on the surface of the spatial object and its corresponding point in the plurality of frame images can be determined.
In the embodiment of the present invention, a Structure from Motion (SFM) algorithm may be adopted: feature extraction, feature matching, and global optimization are performed on the obtained frame images based on the parameter matrix, and the resulting parameter estimates are used as the parameter data corresponding to the frame images. The specific algorithms used in the feature extraction, feature matching, and global optimization steps can be found in the foregoing description.
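For illustration, the following OpenCV-based Python sketch performs the feature extraction and feature matching part of such a parameter estimation for a single pair of frame images and recovers their relative pose. It is a simplified stand-in: a full SFM pipeline would use all views and a global optimization such as bundle adjustment, and the intrinsic matrix K is assumed to be known.

```python
import cv2
import numpy as np

def estimate_relative_pose(img_a, img_b, K):
    """Estimate the rotation R and translation t of camera B relative to camera A
    from matched SIFT features, given an assumed shared intrinsic matrix K."""
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe ratio test
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t
```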
In a particular implementation, depth data for each frame image may be determined based on the plurality of frame images. The depth data may include depth values corresponding to pixels of each frame image, among others. The distances of the acquisition points to the various points in the scene may be used as the above-mentioned depth values, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, with the origin of the shooting coordinate system as the optical center, the depth values may be distances of respective points in the field to the optical center along the shooting optical axis. It will be appreciated by those skilled in the art that the above distances may be relative values and that the same reference may be used for multiple frame images.
In an embodiment of the present invention, a binocular stereo vision algorithm may be adopted to calculate depth data from each frame image. Alternatively, the depth data can be estimated indirectly by analyzing features of the frame image, such as photometric and shading features.
In another embodiment of the present invention, a multi-view stereo (MVS) algorithm may be used for reconstruction: the pixel points of each frame image are matched, the three-dimensional coordinates of each pixel point are reconstructed to obtain points with inter-image consistency, and the depth data of each frame image are then calculated. Alternatively, the pixel points of selected frame images may be matched, the three-dimensional coordinates of the pixel points of each selected frame image reconstructed to obtain points with inter-image consistency, and the depth data of the corresponding frame images then calculated. The pixel data of a frame image corresponds to the calculated depth data, and the way the frame images are selected can be set according to the specific situation; for example, the distance between the frame image whose depth data is to be calculated and the other frame images may be evaluated as needed, and a subset of the frame images selected accordingly.
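As a simplified illustration of the binocular-stereo alternative mentioned above, the following OpenCV-based sketch computes a disparity map for one rectified pair of frame images and converts it to depth. Rectification and the multi-view consistency check of an MVS pipeline are omitted, and the focal length and baseline values are assumptions supplied by the caller.

```python
import cv2

def estimate_depth(left_gray, right_gray, focal_px, baseline):
    """Estimate per-pixel depth from a rectified stereo pair using semi-global
    block matching; depth = focal * baseline / disparity for a rectified pair."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype("float32") / 16.0
    disparity[disparity <= 0] = 0.1          # guard against invalid or zero disparities
    return focal_px * baseline / disparity
```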
In a specific implementation, the data processing device may intercept, based on the received video frame interception instruction, a frame-level synchronized video frame at the specified frame time in the multiple video data streams.
In a specific implementation, the video frame capture instruction may include frame time information for capturing a video frame, and the data processing device captures the video frames at the corresponding frame time from the multiple video data streams according to the frame time information in the video frame capture instruction. The data processing device also sends the frame time information in the video frame capture instruction to the play control device, and the play control device can determine the corresponding specified frame time from the received frame time information and generate a corresponding interactive identifier.
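A minimal sketch of the interception step is given below; the per-stream frame_at accessor is hypothetical, since the real data processing device works on compressed or uncompressed video data streams pulled from the acquisition array.

```python
def intercept_synchronized_frames(video_streams, frame_time):
    """Cut one frame-level-synchronized video frame per data stream at the frame
    time carried in the video frame capture instruction.

    video_streams: mapping from acquisition-device id to a stream object that is
    assumed to expose a frame_at(frame_time) accessor (hypothetical).
    """
    return {device_id: stream.frame_at(frame_time)
            for device_id, stream in video_streams.items()}
```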
In specific implementation, a plurality of acquisition devices in the acquisition array are arranged at different positions of an on-site acquisition area according to a preset multi-angle free visual angle range, and the data processing device can be arranged in an on-site non-acquisition area or a cloud end.
In a specific implementation, the multi-angle free view may refer to a spatial position and viewing angle of a virtual viewpoint that allow the scene to be viewed from freely switchable viewpoints. For example, the multi-angle free view may be a 6 degree of freedom (6DoF) view, in which the spatial position of a virtual viewpoint may be represented as coordinates (x, y, z) and the viewing angle as three rotation directions (θ, φ, ψ), giving six degrees of freedom in total.
The multi-angle free view range can be determined according to the needs of the application scenario.
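For illustration, a 6DoF virtual viewpoint can be represented by a simple structure such as the following; the field names are assumptions, and the scheme only requires six independent degrees of freedom.

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint6DoF:
    """Spatial position (x, y, z) plus a viewing angle given by three rotation directions."""
    x: float
    y: float
    z: float
    yaw: float    # rotation about the vertical axis
    pitch: float  # rotation about the lateral axis
    roll: float   # rotation about the viewing axis

# A preset virtual viewpoint path is then simply an ordered list of such viewpoints.
```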
In a specific implementation, the play control device may generate, based on information of frame time of the captured video frame from the data processing device, an interaction identifier associated with the video frame at the corresponding time in the data stream to be played. For example, after receiving a video frame capture instruction, the data processing device sends frame time information in the video frame capture instruction to the play control device. Then, the play control device may generate a corresponding interactive identifier based on the frame time information.
In specific implementation, corresponding interactive data can be generated according to the objects displayed on site, the associated information of the displayed objects and the like. For example, the interaction data may further include at least one of: the system comprises field analysis data, information data of a collected object, information data of equipment related to the collected object, information data of an article deployed on the field, and information data of a logo displayed on the field. Then, based on the interactive data, multi-angle free visual angle display is carried out, richer interactive information can be displayed for the user through the multi-angle free visual angle, and therefore user interactive experience can be further enhanced.
For example, when playing a basketball game, the interaction data may include one or more of analysis data of the ball game, information data of a certain player, information data of shoes worn by the player, information data of a basketball, information data of a logo of a live sponsor, and the like, in addition to the multi-angle free view data.
In a specific implementation, in order to conveniently return to the data stream to be played after the image presentation ends, with reference to fig. 6, after step S63, the method may further include:
S64, when an interaction end signal is detected, switching back to acquiring the data stream to be played in real time from the play control device and playing and displaying it in real time.
For example, when an interaction end operation instruction is received, playback switches back to the data stream to be played, which is obtained from the play control device in real time, and real-time playing and display are performed.
For another example, when it is detected that the last image of the multi-angle free view at the specified frame time has been displayed, playback switches back to the data stream to be played, which is obtained from the play control device in real time, and real-time playing and display are performed.
In an embodiment, step S63 of displaying the multi-angle free-view image based on the interactive data may specifically include the following steps:
a virtual viewpoint is determined according to the interactive operation, where the virtual viewpoint is selected from a multi-angle free view range, the multi-angle free view range being a range that supports switching virtual viewpoints for viewing the region to be viewed; then, an image of the region to be viewed as observed from the virtual viewpoint is displayed, where the image is generated based on the interactive data and the virtual viewpoint.
In particular, a virtual viewpoint path may be preset, and the virtual viewpoint path may include a plurality of virtual viewpoints. Because the virtual viewpoints are selected from a multi-angle free view range, a corresponding first virtual viewpoint can be determined according to the image view angle displayed during interactive operation, and then, images corresponding to the virtual viewpoints can be displayed in sequence from the first virtual viewpoint according to the preset sequence of the virtual viewpoints.
In the embodiment of the present invention, a DIBR algorithm may be adopted to perform combined rendering on pixel data and depth data corresponding to a triggered interaction identifier at a specified frame time according to parameter data in the multi-angle free view data and a preset virtual viewpoint path, so as to realize image reconstruction based on the preset virtual viewpoint path, obtain corresponding multi-angle free view video data, and further sequentially display corresponding images from the first virtual viewpoint according to a sequence of the preset virtual viewpoint.
If the specified frame time corresponds to the same frame time, the obtained multi-angle free view video data can comprise multi-angle free view spatial data of images sequenced according to the frame time, and can display static images of the multi-angle free view; if the specified frame time corresponds to different frame times, the obtained multi-angle free view video data may include multi-angle free view spatial data and multi-angle free view time data of frame images ordered according to frame time, and may display a dynamic image of a multi-angle free view, that is, a frame image of a video frame of a multi-angle free view is displayed.
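The sequential display along the preset virtual viewpoint path described above can be sketched as follows; render_fn stands in for the DIBR-based combined rendering of the selected pixel data and depth data and is not specified here.

```python
def render_along_path(virtual_viewpoint_path, first_viewpoint_index,
                      pixel_and_depth_data, parameter_data, render_fn):
    """Starting from the first virtual viewpoint determined by the interactive
    operation, walk the preset virtual viewpoint path in order and render one
    image per virtual viewpoint."""
    images = []
    for viewpoint in virtual_viewpoint_path[first_viewpoint_index:]:
        images.append(render_fn(pixel_and_depth_data, parameter_data, viewpoint))
    return images
```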
The embodiment of the present invention further provides a system corresponding to the data interaction method, so that those skilled in the art can better understand and implement the embodiment of the present invention, and the detailed description is given through specific embodiments with reference to the accompanying drawings.
Referring to the schematic diagram of the data processing system shown in fig. 7, the data processing system 70 may include: acquisition array 71, data processing device 72, server 73, play control device 74, and interactive terminal 75, wherein:
the acquisition array 71 may include a plurality of acquisition devices, which are disposed at different positions in an on-site acquisition area according to a preset multi-angle free view range, and are adapted to synchronously acquire multiple video data streams in real time and upload the video data streams to the data processing device 72 in real time;
The data processing device 72 is adapted to intercept the multiple video data streams at a specified frame time according to a received video frame interception instruction, so as to obtain multiple frame images corresponding to the specified frame time and frame time information corresponding to the specified frame time, upload the multiple frame images at the specified frame time and the frame time information corresponding to the specified frame time to the server 73, and send the frame time information at the specified frame time to the play control device 74;
the server 73 is adapted to receive the plurality of frame images and the frame time information uploaded by the data processing device 72, and generate interactive data for interaction based on the plurality of frame images, where the interactive data includes multi-angle free view data, and the interactive data is associated with the frame time information;
the playing control device 74 is adapted to determine a specified frame time corresponding to the frame time information uploaded by the data processing device 72 in the data stream to be played, generate an interaction identifier associated with the specified frame time, and transmit the data stream to be played containing the interaction identifier to the interaction terminal 75;
The interactive terminal 75 is adapted to play and display a video including the interactive identifier in real time based on the received data stream to be played, and acquire interactive data stored in the server 73 and corresponding to the specified frame time based on the triggering operation on the interactive identifier, so as to display a multi-angle free view image.
It should be noted that, in a specific implementation, the positions of the data processing device and the server may be flexibly deployed according to user requirements. For example, the data processing device may be located in a field non-acquisition area or in the cloud. For another example, the server may be placed in the field non-acquisition area, in the cloud, or at the terminal access side; at the terminal access side, edge node devices such as a base station, a set top box, a router, a home data center server, and a hotspot device may all be used as the server to obtain the multi-angle free view data. Alternatively, the data processing device and the server may be deployed together in a centralized manner as a server cluster that works cooperatively, so that the multi-angle free view data can be rapidly generated, and low-delay playing and real-time interaction of the multi-angle free-view video can be realized.
By adopting the scheme, in the playing process, the interactive data can be acquired according to the triggering operation of the interactive identification, and then the multi-angle free visual angle display is carried out, so that the user interaction experience is improved.
In a specific implementation, the multi-angle free view may refer to a spatial position and a view of a virtual viewpoint that enables a scene to be freely switched. And, the multi-angle free view range can be determined according to the needs of the application scenario. The multi-angle freeview may be a 6 degree of freedom (6 DoF) view.
In a specific implementation, the acquisition device itself may have encoding and encapsulating functions, so that the original video data synchronously acquired from the corresponding angle in real time can be encoded and encapsulated. The acquisition device may also have a compression function.
In a specific implementation, the server 73 is adapted to generate the multi-angle freeview data based on a plurality of received frame images corresponding to the specified frame time, the multi-angle freeview data including pixel data, depth data, and parameter data of the plurality of frame images, wherein an association exists between the pixel data and the depth data of each frame image.
In specific implementation, a plurality of acquisition devices in the acquisition array 71 can be placed at different positions of a field acquisition area according to a preset multi-angle free view range, the data processing device 72 can be placed in a field non-acquisition area or a cloud, and the server 73 can be placed in a field non-acquisition area, a cloud or a terminal access side.
In a specific implementation, the play control device 74 is adapted to generate an interactive identifier associated with the corresponding video frame in the data stream to be played based on the frame time information of the video frame captured by the data processing device 72.
In a specific implementation, the interactive terminal 75 is further adapted to switch to the data stream to be played, which is obtained from the play control device 74 in real time, and perform real-time play display when detecting the interaction end signal.
To enable those skilled in the art to better understand and implement the embodiment of the present invention, the data processing system is described in detail below through a specific application scenario. Fig. 8 is a schematic structural diagram of a data processing system in another application scenario of the embodiment of the present invention, showing a basketball game playing scenario in which the live site is the basketball game area on the left side. The data processing system 80 may include: an acquisition array 81 composed of a plurality of acquisition devices, a data processing device 82, a cloud server cluster 83, a play control device 84, and an interactive terminal 85.
The basketball hoop is used as a core viewpoint, the core viewpoint is used as a circle center, and a sector area which is positioned on the same plane with the core viewpoint can be used as a preset multi-angle free visual angle range. Correspondingly, each acquisition device in the acquisition array 81 can be arranged in different positions of the field acquisition area in a fan shape according to a preset multi-angle free visual angle range, and can synchronously acquire video data streams from corresponding angles in real time.
In particular implementations, the acquisition devices may also be located in the ceiling area of a basketball venue, on a basketball stand, or the like. The acquisition devices may be arranged and distributed along a straight line, a fan shape, an arc, a circle, or an irregular shape. The specific arrangement may be set according to one or more factors such as the specific field environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements. An acquisition device may be any device having a camera function, such as an ordinary camera, a mobile phone, or a professional camera.
The data processing device 82 may be located in a field non-acquisition area in order not to affect the operation of the acquisition device. The data processing device 82 may send a stream pulling instruction to each acquisition device in the acquisition array 81 through a wireless local area network. Each acquisition device in the acquisition array 81 transmits the obtained video data stream to the data processing device 82 in real time based on the stream pulling instruction sent by the data processing device 82. Each acquisition device in the acquisition array 81 can transmit the obtained video data stream to the data processing device 82 through the switch 87 in real time. Each acquisition device can compress the acquired original video data in real time and transmit the compressed original video data to the data processing device in real time, so that the local area network transmission resources are further saved.
When the data processing device 82 receives a video frame capture instruction, it captures the video frames at the specified frame time from the received multiple video data streams to obtain the frame images corresponding to a plurality of video frames and the frame time information corresponding to the specified frame time, uploads the frame images at the specified frame time and the corresponding frame time information to the cloud server cluster 83, and sends the frame time information of the specified frame time to the play control device 84. The video frame capture instruction may be issued manually by a user or generated automatically by the data processing device.
The server may be placed in the cloud. In order to process data in parallel more quickly, the cloud server cluster 83 may be composed of a plurality of different servers or server groups according to the data to be processed.
For example, the cloud server cluster 83 may include: first cloud server 831, second cloud server 832, third cloud server 833, and fourth cloud server 834. The first cloud server 831 may be configured to determine corresponding parameter data of the plurality of frame images; the second cloud server 832 may be configured to determine depth data for each of the plurality of frame images; the third cloud server 833 may perform frame image reconstruction on a preset virtual viewpoint path by using a DIBR algorithm based on the parameter data corresponding to the plurality of frame images, and the depth data and the pixel data of a preset frame image in the plurality of frame images; the fourth cloud server 834 may be configured to generate a multi-angle free-view video.
It can be understood that the first cloud server 831, the second cloud server 832, the third cloud server 833 and the fourth cloud server 834 may also be a server group composed of a server array or a server sub-cluster, which is not limited in the embodiment of the present invention.
In a specific implementation, the multi-angle freeview video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame time. The interactive data may include multi-angle free view data, which may include pixel data and depth data of a plurality of frame images, and parameter data, with an association between the pixel data and the depth data of each frame image.
The server cluster 83 in the cloud may store the interactive data according to the specified frame time information.
The play control device 84 may generate an interaction identifier associated with the specified frame time according to the frame time information uploaded by the data processing device, and transmit the data stream to be played including the interaction identifier to the interaction terminal 85.
The interactive terminal 85 may play the display video in real time and display the interactive identifier at the corresponding video frame time based on the received data stream to be played. When an interactive identifier is triggered, the interactive terminal 85 may acquire the interactive data stored in the cloud server cluster 83 and corresponding to the designated frame time, so as to display the multi-angle free view image. When detecting the interaction end signal, the interactive terminal 85 may switch to obtain the data stream to be played from the play control device 84 in real time and perform real-time play and display.
Referring to the schematic diagram of another data processing system shown in FIG. 38, a data processing system 380 may include: an acquisition array 381, a data processing device 382, a play control device 383, and an interactive terminal 384; wherein:
the acquisition array 381 comprises a plurality of acquisition devices, the acquisition devices are arranged at different positions of a field acquisition area according to a preset multi-angle free visual angle range, and are suitable for synchronously acquiring multiple video data streams in real time and uploading the video data streams to the data processing device in real time;
the data processing device 382, for the uploaded multiple video data streams, is adapted to intercept, according to the received video frame interception instruction, the multiple video data streams at a specified frame time, obtain multiple frame images corresponding to the specified frame time and frame time information corresponding to the specified frame time, and send the frame time information at the specified frame time to the play control device 383;
the playing control device 383 is adapted to determine a specified frame time corresponding to the frame time information uploaded by the data processing device 382 in the data stream to be played, generate an interaction identifier associated with the specified frame time, and transmit the data stream to be played containing the interaction identifier to the interaction terminal 384;
The interactive terminal 384 is adapted to play and display a video including the interactive identifier in real time based on a received data stream to be played, acquire a plurality of frame images corresponding to a specified frame time of the interactive identifier from the data processing device 382 based on a trigger operation on the interactive identifier, generate interactive data for interaction based on the plurality of frame images, and then perform multi-angle free view image display, where the interactive data includes multi-angle free view data.
In specific implementation, the data processing device may be flexibly deployed according to user requirements, for example, the data processing device may be placed in a field non-acquisition area or a cloud.
By adopting the data processing system, in the playing process, interactive data can be acquired according to the triggering operation of the interactive identification, and then multi-angle free visual angle display is carried out, so that the user interaction experience is improved.
The embodiment of the present invention further provides a terminal corresponding to the data interaction method, so that those skilled in the art can better understand and implement the embodiment of the present invention, and the detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the structural diagram of the interactive terminal shown in fig. 9, the interactive terminal 90 may include:
A data stream obtaining unit 91, adapted to obtain a data stream to be played in real time from a play control device, where the data stream to be played includes video data and an interactive identifier, and the interactive identifier is associated with a specified frame time of the data stream to be played;
the playing and displaying unit 92 is suitable for playing and displaying the video and the interactive identification of the data stream to be played in real time;
an interactive data acquiring unit 93, adapted to respond to a trigger operation on the interactive identifier, and acquire interactive data corresponding to the specified frame time, where the interactive data includes multi-angle free view data;
an interactive presentation unit 94 adapted to perform an image presentation of the multi-angle free view at the specified frame time based on the interactive data;
the switching unit 95 is adapted to trigger to switch to the data stream to be played, which is obtained by the data stream obtaining unit 91 from the playing control device in real time, and to perform real-time playing and displaying by the playing and displaying unit 92 when detecting the interaction ending signal.
The interactive data may be generated by the server and transmitted to the interactive terminal, or may be generated by the interactive terminal.
The interactive terminal can acquire the data stream to be played from the play control device in real time while playing the video, and can display the corresponding interactive identifier at the corresponding frame time. Fig. 4 is a schematic view of the interactive interface of an interactive terminal in an embodiment of the present invention.
The interactive terminal 40 obtains the data stream to be played from the play control device in real time; when real-time playback reaches the first frame time T1, the first interactive identifier 42 may be displayed on the progress bar 41, and when playback reaches the second frame time T2, the second interactive identifier 43 may be displayed on the progress bar. The black portion of the progress bar is the played portion, and the white portion is the unplayed portion.
The trigger operation may be a trigger operation input by a user, or a trigger operation generated automatically by the interactive terminal; for example, the interactive terminal may automatically initiate the trigger operation when it detects the identifier of a multi-angle free viewpoint data frame. When the user triggers manually, the interactive terminal may, after displaying the interaction prompt information, receive the time information at which the interaction is triggered, or the interactive terminal may receive historical time information at which the user triggers an interaction, where the historical time information may be time information earlier than the current playing time.
With reference to fig. 4, fig. 7 and fig. 9, when the system of the interactive terminal reads the corresponding interactive identifier 43 on the progress bar 41, the interaction prompt information may be displayed; when the user does not trigger it, the interactive terminal 40 may continue to read the subsequent video data, and the played portion of the progress bar 41 continues to advance. When the user triggers it, the interactive terminal 40 receives the feedback, generates an image reconstruction instruction for the specified frame time of the corresponding interactive identifier, and sends the image reconstruction instruction to the server 73.
For example, when the user selects to trigger the current interactive identifier 43, the interactive terminal 40 receives the feedback, generates an image reconstruction instruction of the interactive identifier 43 corresponding to the specified frame time T2, and sends the image reconstruction instruction to the server 73. The server can send interactive data corresponding to the appointed frame time T2 according to the image reconstruction instruction.
The user may also trigger a historical interaction identifier while viewing, for example the interaction identifier 42 displayed in the played part 41a of the progress bar. In this case the interactive terminal 40 receives the feedback, generates an image reconstruction instruction of the interaction identifier 42 corresponding to the specified frame time T1, and sends the image reconstruction instruction to the server 73. The server may send the interactive data corresponding to the specified frame time T1 according to the image reconstruction instruction. The interactive terminal 40 may perform image processing on the multi-angle free view data of the interactive data by using an image reconstruction algorithm, and then perform image display of the multi-angle free view at the specified frame time. If the specified frame time is a single frame time, a static image of the multi-angle free view is displayed; if the specified frame time corresponds to a plurality of frame times, a dynamic image of the multi-angle free view is displayed.
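As an illustration only, the following Python sketch models the request flow described above, in which a trigger produces an image reconstruction instruction for the specified frame time and the server (or, in the variant below, the data processing device) answers with the corresponding interactive data. The dataclass, the helper names, and the transport callable are assumptions for explanation and are not part of the described embodiment.

```python
# Hypothetical sketch of the trigger -> image-reconstruction-instruction -> interactive-data flow.
# All names and the transport callable are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ReconstructionInstruction:
    interaction_id: int     # e.g. the triggered interactive identifier 43
    frame_time: float       # the specified frame time associated with it, e.g. T2

def fetch_interactive_data(send: Callable[[dict], Optional[dict]],
                           instruction: ReconstructionInstruction) -> Optional[dict]:
    """Send the image reconstruction instruction to the server or data processing
    device and return the interactive data / frame images it answers with."""
    return send({"interaction_id": instruction.interaction_id,
                 "frame_time": instruction.frame_time})
```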
With reference to fig. 4, fig. 38 and fig. 9, when the system of the interactive terminal reads the corresponding interactive mark 43 on the progress bar 41, the interactive prompt message may be displayed, and when the user does not select the trigger, the interactive terminal 40 may continue to read the subsequent video data, and the played part of the progress bar 41 continues to advance. When the user selects the trigger, the interactive terminal 40 receives the feedback and generates an image reconstruction instruction at the specified frame time of the corresponding interactive identifier, and sends the image reconstruction instruction to the data processing device 382.
For example, when the user selects to trigger the current interactive identifier 43, the interactive terminal 40 receives the feedback, generates an image reconstruction instruction of the interactive identifier 43 corresponding to the designated frame time T2, and sends the image reconstruction instruction to the data processing device. The data processing device 382 may transmit a plurality of frame images corresponding to the specified frame time T2 according to the image reconstruction instruction.
The user may also trigger a historical interaction identifier while watching, for example the interaction identifier 42 displayed in the played part 41a of the progress bar. In this case the interactive terminal 40 receives the feedback, generates an image reconstruction instruction of the interaction identifier 42 corresponding to the specified frame time T1, and sends the image reconstruction instruction to the data processing device. The data processing device may transmit a plurality of frame images corresponding to the specified frame time T1 according to the image reconstruction instruction.
The interactive terminal 40 may generate interactive data for performing interaction based on the plurality of frame images, perform image processing on the multi-angle free view data of the interactive data by using an image reconstruction algorithm, and then perform image display of the multi-angle free view at the specified frame time. If the appointed frame time is a frame time, displaying a static image of a multi-angle free visual angle; and if the appointed frame time corresponds to a plurality of frame times, displaying the dynamic image of the multi-angle free visual angle.
In a specific implementation, the interactive terminal according to the embodiment of the present invention may be an electronic device with a touch screen function, a head-mounted Virtual Reality (VR) terminal, an edge node device connected to a display, or an IoT (Internet of Things) device with a display function.
As shown in fig. 40, which is a schematic view of an interaction interface of another interaction terminal in the embodiment of the present invention, the interaction terminal is an electronic device 400 having a touch screen function, and when a corresponding interaction identifier 402 on a progress bar 401 is read, an interaction prompt information box 403 may be displayed on the interface of the electronic device 400. The user may select according to the content of the interaction prompt information box 403, when the user performs a trigger operation of selecting "yes", the electronic device 400 may generate an image reconstruction instruction at an interaction frame time corresponding to the interaction identifier 402 after receiving the feedback, and when the user performs a non-trigger operation of selecting "no", the electronic device 400 may continue to read subsequent video data.
As shown in fig. 41, which is a schematic view of an interaction interface of another interaction terminal according to an embodiment of the present invention, the interaction terminal is a head mounted VR terminal 410, and when a corresponding interaction identifier 412 on a progress bar 411 is read, the interface of the head mounted VR terminal 410 may display an interaction prompt information box 413. The user may select according to the content of the interaction prompt information box 413, when the user performs a trigger operation (for example, nodding the head) of selecting "yes", the head-mounted VR terminal 410 may generate an image reconstruction instruction at an interaction frame time corresponding to the interaction identifier 412 after receiving the feedback, and when the user performs a non-trigger operation (for example, shaking the head) of selecting "no", the head-mounted VR terminal 410 may continue to read subsequent video data.
As shown in fig. 42, which is a schematic view of an interaction interface of another interaction terminal in the embodiment of the present invention, the interaction terminal is an edge node device 421 connected to the display 420, and when the edge node device 421 reads a corresponding interaction identifier 423 on the progress bar 422, the display 420 may display an interaction prompt information frame 424. The user may select according to the content of the interaction prompt information box 424, when the user performs a trigger operation of selecting "yes", the edge node device 421 may generate an image reconstruction instruction at an interaction frame time corresponding to the interaction identifier 423 after receiving the feedback, and when the user performs a non-trigger operation of selecting "no", the edge node device 421 may continue to read subsequent video data.
In a specific implementation, the interactive terminal may establish a communication connection with at least one of the data processing device and the server, and may adopt a wired connection or a wireless connection.
Fig. 43 is a schematic connection diagram of an interactive terminal in the embodiment of the present invention. The edge node device 430 establishes wireless connections with the interactive devices 431, 432, and 433 through the internet of things.
In a specific implementation, after the interactive identifier is triggered, the interactive terminal may perform image display of a multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and determine virtual viewpoint position information based on an interactive operation. Fig. 44 is an interactive operation diagram of the interactive terminal in an embodiment of the present invention: a user may perform a horizontal operation or a vertical operation on the interactive operation interface, and the operation trajectory may be a straight line or a curved line.
In a specific implementation, fig. 45 is a schematic view of an interactive interface of another interactive terminal in the embodiment of the present invention. After the user clicks the interactive identifier, the interactive terminal acquires the interactive data of the interactive identifier at the specified frame time.
If the user takes no new operation, the triggering operation serves as the interactive operation, and the corresponding first virtual viewpoint can be determined according to the image view angle displayed at the time of the interactive operation. If the user takes a new operation, that new operation serves as the interactive operation, and the corresponding first virtual viewpoint can likewise be determined according to the image view angle displayed at the time of the interactive operation.
Then, the images corresponding to the virtual viewpoints may be sequentially displayed in the order of the preset virtual viewpoints from the first virtual viewpoint. If the specified frame time corresponds to the same frame time, the obtained multi-angle free view video data can comprise multi-angle free view spatial data of images sequenced according to the frame time, and static images of the multi-angle free view can be displayed; if the specified frame time corresponds to different frame times, the obtained multi-angle free view video data may include multi-angle free view spatial data and multi-angle free view time data of frame images ordered according to frame time, and may display a dynamic image of a multi-angle free view, that is, a frame image of a video frame of a multi-angle free view is displayed.
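A minimal sketch of this sequential display, under assumed data structures, is given below: images are shown viewpoint by viewpoint along the preset virtual viewpoint path starting from the first virtual viewpoint, where a single frame time yields a static multi-angle free-view image and several frame times yield a dynamic one. The helper functions `reconstruct` and `show` are hypothetical.

```python
# Hedged sketch of displaying images along the preset virtual viewpoint path.
def display_free_view(viewpoint_path, first_index, frame_times, reconstruct, show):
    path = viewpoint_path[first_index:]           # start from the first virtual viewpoint
    for step, viewpoint in enumerate(path):
        if len(frame_times) == 1:
            t = frame_times[0]                    # static image: one frame time reused
        else:
            t = frame_times[min(step, len(frame_times) - 1)]  # dynamic image over time
        show(reconstruct(t, viewpoint))           # reconstruct and present this viewpoint
```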
In an embodiment of the present invention, referring to fig. 45 and fig. 46, the multi-angle free view video data obtained by the interactive terminal may include multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered by frame time. The user slides horizontally to the right to generate an interactive operation, and a corresponding first virtual viewpoint is determined. Since different virtual viewpoints may correspond to different multi-angle free view spatial data and temporal data, as shown in fig. 46, the frame image displayed in the interactive interface changes in both time and space with the interactive operation: the displayed content changes from the runner of fig. 45 running towards the finish line to the runner of fig. 46 about to cross the finish line, and the displayed view changes from a left view to a front view with the runner as the target object.
Similarly, referring to fig. 45 and fig. 47, the content of the frame image presentation changes from the player of fig. 45 running towards the finish line to the player of fig. 47 having crossed the finish line, and the perspective of the frame image presentation changes from a left view to a right view with the player as the target object.
Similarly, as can be seen in fig. 45 and 48, the user slides vertically upward to cause an interaction that changes the content of the frame image presentation from the player in fig. 45 running toward the finish line to the player in fig. 48 having passed the finish line and, with the player as the target object, the perspective of the frame image presentation changes from a left view to a top view.
It can be understood that different interactive operations can be obtained according to the operation of the user, and a corresponding first virtual viewpoint can be determined according to the view angle of the image played and displayed during the interactive operation; according to the obtained multi-angle free view video data, a static image or a dynamic image of the multi-angle free view can be displayed, and the embodiment of the invention is not limited.
In a specific implementation, the interaction data may further include at least one of: the system comprises field analysis data, information data of a collected object, information data of equipment related to the collected object, information data of an article deployed on the field, and information data of a logo displayed on the field.
In an embodiment of the present invention, fig. 10 is an interactive interface schematic diagram of another interactive terminal. After the interactive identifier is triggered, the interactive terminal 100 may perform image presentation of the multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and may superimpose live analysis data on an image (not shown), as shown by the live analysis data 101 in fig. 10.
In an embodiment of the present invention, fig. 11 is an interactive interface schematic diagram of another interactive terminal. After the user triggers the interactive identifier, the interactive terminal 110 may perform image display of the multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and may superimpose the information data of the collection object on an image (not shown), as shown by the information data 111 of the collection object in fig. 11.
In an embodiment of the present invention, fig. 12 is an interactive interface schematic diagram of another interactive terminal. After the user triggers the interactive identifier, the interactive terminal 120 may perform image display of the multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and may superimpose the information data of the collection object on an image (not shown), as shown by the information data 121-123 of the collection object in fig. 12.
In an embodiment of the present invention, fig. 13 is an interactive interface schematic diagram of another interactive terminal. After the user triggers the interactive identifier, the interactive terminal 130 may perform image display of the multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and may superimpose information data of an article deployed on the field on an image (not shown), as shown by the information data 131 of the file package in fig. 13.
In an embodiment of the present invention, fig. 14 is an interactive interface schematic diagram of another interactive terminal. After the interactive identifier is triggered, the interactive terminal 140 may perform image presentation of the multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and may superimpose information data of a logo displayed on the field on an image (not shown), as shown by the logo information data 141 in fig. 14.
Therefore, the user can acquire more associated interactive information through the interactive data, and can know the watched content more deeply, comprehensively and professionally, so that the interactive experience of the user can be further enhanced.
Referring to fig. 39, which is a schematic structural diagram of another interactive terminal, the interactive terminal 390 may include: processor 391, network component 392, memory 393 and display unit 394; wherein:
The processor 391 is adapted to obtain a data stream to be played in real time through the network component 392, and obtain, in response to a trigger operation on an interaction identifier, interactive data at a specified frame time corresponding to the interaction identifier, where the data stream to be played includes video data and the interaction identifier, the interaction identifier is associated with the specified frame time of the data stream to be played, and the interactive data includes multi-angle free view data;
the memory 393 is suitable for storing the data stream to be played acquired in real time;
the display unit 394 is adapted to display the video and the interactive identifier of the to-be-played data stream in real time based on the to-be-played data stream acquired in real time, and display the multi-angle free view image at the specified frame time based on the interactive data.
The interactive terminal 390 may obtain the interactive data at the specified frame time from a server storing the interactive data, or may obtain a plurality of frame images corresponding to the specified frame time from a data processing device storing the frame images, and then generate corresponding interactive data.
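As an illustration of the two acquisition paths just described, the following hedged Python sketch chooses between fetching ready-made interactive data from a server and generating it locally from frame images fetched from the data processing device. The interfaces `server`, `data_processor` and `build_interactive_data` are assumptions, not part of the embodiment.

```python
# Hedged sketch: obtain interactive data either from a server that stores it, or by
# fetching frame images from the data processing device and generating it locally.
def acquire_interactive_data(frame_time, server=None, data_processor=None,
                             build_interactive_data=None):
    if server is not None:
        # Path 1: the server already stores interactive data for this frame time.
        return server.get_interactive_data(frame_time)
    # Path 2: fetch the frame images corresponding to the frame time and generate
    # the interactive data (multi-angle free view data) on the terminal side.
    frame_images = data_processor.get_frame_images(frame_time)
    return build_interactive_data(frame_images)
```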
In order to make the embodiment of the present invention better understood and implemented by those skilled in the art, the following describes the processing scheme of the multi-angle free view video image on the field side in further detail.
Referring to the flowchart of the data processing method shown in fig. 15, in the embodiment of the present invention, the method may specifically include the following steps:
and S151, when the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is determined to be not more than a preset bandwidth threshold, respectively sending a stream pulling instruction to the acquisition devices in the acquisition array, wherein the acquisition devices in the acquisition array are arranged at different positions of a field acquisition area according to a preset multi-angle free visual angle range.
The multi-angle free view may refer to a viewing mode in which the spatial position and view angle of the virtual viewpoint of a scene can be freely switched. The multi-angle free view range can be determined according to the needs of the application scenario.
In a specific implementation, the preset bandwidth threshold may be determined according to the transmission capability of the transmission network in which each acquisition device in the acquisition array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
And S152, receiving a compressed video data stream transmitted by each acquisition device in the acquisition array in real time based on the stream pulling instruction, wherein the compressed video data stream is obtained by real-time synchronous acquisition and data compression from a corresponding angle by each acquisition device in the acquisition array.
In a specific implementation, the capture device itself may have encoding and packaging functions, so as to encode and package original video data captured synchronously in real time from a corresponding angle, where the package format adopted by the capture device may be any one of AVI, QuickTime File Format, MPEG, WMV, RealVideo, Flash Video, Matroska, or other package formats, and the encoding format adopted by the capture device may be H.261, H.263, H.264, H.265, MPEG, AVS, or other encoding formats. Moreover, the acquisition equipment can have a compression function: with the same amount of data before compression, a higher compression ratio yields a smaller amount of compressed data and relieves the bandwidth pressure of real-time synchronous transmission, so the acquisition equipment can adopt techniques such as predictive coding, transform coding and entropy coding to improve the compression ratio of the video.
By adopting the data processing method, whether the transmission bandwidths are matched is determined before the stream is pulled, so that data transmission congestion in the stream pulling process can be avoided, data acquired by each acquisition device and obtained by data compression can be synchronously transmitted in real time, the processing speed of multi-angle free visual angle video data is increased, low-delay playing of multi-angle free visual angle videos is realized under the condition that bandwidth resources and data processing resources are limited, and the implementation cost is reduced.
In a specific implementation, whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is greater than the preset bandwidth threshold can be determined by acquiring the values of the parameters of the acquisition devices and calculating. For example, the acquisition array may include 40 acquisition devices, and the bitrate of the compressed video data stream of each acquisition device may be 15 Mbps, so the bitrate of the entire acquisition array is 15 × 40 = 600 Mbps; if the preset bandwidth threshold is 1000 Mbps, it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold. Then, a stream pulling instruction can be sent to each acquisition device according to the IP addresses of the 40 acquisition devices in the acquisition array.
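A minimal sketch of this pre-pull check, using the figures from the example above (40 devices at 15 Mbps each against a 1000 Mbps threshold), is given below. The pull-instruction transport `send_pull` is an assumption for illustration.

```python
# Hedged sketch of the pre-pull bandwidth check before sending stream pulling instructions.
def maybe_send_pull_instructions(device_bitrates_mbps, bandwidth_threshold_mbps, send_pull):
    total = sum(device_bitrates_mbps)             # e.g. 40 * 15 = 600 Mbps
    if total > bandwidth_threshold_mbps:
        return False                              # parameters must be re-set before pulling
    for device_index, _ in enumerate(device_bitrates_mbps):
        send_pull(device_index)                   # one stream pulling instruction per device
    return True

# Under the example figures this returns True:
# maybe_send_pull_instructions([15] * 40, 1000, send_pull=lambda i: None)
```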
In specific implementation, in order to ensure that the values of the parameters of the acquisition devices in the acquisition array are uniform, so that the acquisition devices can synchronously acquire and compress data in real time, before the stream pulling instruction is sent to the acquisition devices in the acquisition array, the values of the parameters of the acquisition devices in the acquisition array may be set. Wherein the parameters of the acquisition device may include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams acquired by each acquisition device in the acquisition array from corresponding angles in real time and synchronously acquiring and compressing data is not more than a preset bandwidth threshold according to the set numerical value of the parameter of each acquisition device.
Because the acquisition parameters and the compression parameters supplement each other, under the condition that the numerical values of the compression parameters are not changed, the data size of the original video data can be reduced by setting the numerical values of the acquisition parameters, so that the time for data compression processing is shortened; under the condition that the numerical value of the acquisition parameter is unchanged, the data volume after compression can be correspondingly reduced by setting the numerical value of the compression parameter, so that the data transmission time is shortened. For another example, setting a higher compression rate may save transmission bandwidth, and setting a lower sampling rate may also save transmission bandwidth. Therefore, the acquisition parameters and/or the compression parameters can be set according to actual conditions.
Therefore, before the stream is pulled, the numerical values of the parameters of the acquisition equipment in the acquisition array can be set, the numerical values of the parameters of the acquisition equipment in the acquisition array are ensured to be uniform, the acquisition equipment can synchronously acquire and compress data from corresponding angles in real time, and the sum of the obtained code rates of compressed video data streams is not greater than a preset bandwidth threshold value, so that network congestion can be avoided, and low-delay playing of multi-angle free visual angle videos can be realized under the condition of limited bandwidth resources.
In a specific embodiment, the acquisition parameters may include a focal length parameter, an exposure parameter, a resolution parameter, an encoding rate parameter, an encoding format parameter, and the like, the compression parameters may include a compression ratio parameter, a compression format parameter, and the like, and the numerical values most suitable for the transmission network where each acquisition device is located are obtained by setting numerical values of different parameters.
In order to simplify the setting process and save the setting time, before setting the values of the parameters of the acquisition devices in the acquisition array, it may be determined whether the sum of the code rates of the compressed video data streams acquired by the acquisition devices in the acquisition array according to the set values of the parameters and obtained by data compression is greater than a preset bandwidth threshold, and when the sum of the code rates of the obtained compressed video data streams is greater than the preset bandwidth threshold, before sending a stream pulling instruction to the acquisition devices in the acquisition array, the values of the parameters of the acquisition devices in the acquisition array may be set. It is understood that, in the specific implementation, the values of the acquisition parameters and the values of the compression parameters may also be set according to the imaging quality requirements, such as the resolution of the multi-angle free view image, which is to be displayed.
In a specific implementation, the process from transmission to writing-in of the compressed video data streams obtained by each acquisition device occurs continuously, and therefore, before sending a stream pulling instruction to each acquisition device in the acquisition array, it may be further determined whether a sum of code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing-in speed threshold, and when the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than the preset writing-in speed threshold, a value of a parameter of each acquisition device in the acquisition array may be set, so that the sum of the code rates of the compressed video data streams obtained by each acquisition device in the acquisition array synchronously acquiring and compressing data from a corresponding angle in real time according to the set value of the parameter of each acquisition device is not greater than the preset writing-in speed threshold.
In a specific implementation, the preset writing speed threshold may be determined according to the data storage writing speed of the storage medium. For example, if the upper limit of the data storage write speed of a Solid State Disk (SSD) of the data processing apparatus is 100Mbps, the preset write speed threshold may be 100Mbps.
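A short sketch, assuming Mbps units throughout, of checking the summed bitrate against both the preset bandwidth threshold and the preset writing speed threshold before pulling streams follows; the function name is illustrative only.

```python
# Hedged sketch: the summed bitrate of the pre-transmitted compressed video data streams
# must exceed neither the bandwidth threshold nor the storage write-speed threshold.
def stream_sum_within_limits(device_bitrates_mbps, bandwidth_threshold_mbps,
                             write_speed_threshold_mbps):
    total = sum(device_bitrates_mbps)
    return total <= bandwidth_threshold_mbps and total <= write_speed_threshold_mbps
```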
By adopting the scheme, before the stream is pulled, the sum of the code rates of the compressed video data streams acquired by the acquisition equipment from the corresponding angles in real time and data compression can be ensured not to be larger than the preset writing speed threshold value, so that data writing congestion can be avoided, the smooth link of the compressed video data streams in the acquisition, transmission and writing processes can be ensured, the compressed video streams uploaded by the acquisition equipment can be processed in real time, and the playing of the multi-angle free visual angle video can be realized.
In a specific implementation, the compressed video data streams obtained by the respective acquisition devices may be stored. When a video frame intercepting instruction is received, frame-level synchronized video frames in each compressed video data stream can be intercepted according to the received video frame intercepting instruction, and the intercepted video frames can be synchronously uploaded to the specified target end.
The specified target end may be a preset target end, or a target end specified by the video frame capture instruction. And the intercepted video frame can be packaged, uploaded to the appointed target end through a network transmission protocol, and analyzed to obtain a corresponding video frame with frame level synchronization in the compressed video data stream.
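As an illustration of this encapsulate-upload-parse step, the following sketch packages the intercepted frame-level synchronized frames into one payload and unpacks them at the specified target end. The JSON/base64 container is an illustrative assumption, not the claimed packaging format.

```python
# Hedged sketch of packaging intercepted frames for upload and parsing them back.
import base64
import json

def package_frames(frames):
    """frames: list of (stream_id, encoded_frame_bytes) -> one upload payload."""
    return json.dumps([{"stream": sid, "data": base64.b64encode(raw).decode("ascii")}
                       for sid, raw in frames]).encode("utf-8")

def unpack_frames(payload):
    """Inverse of package_frames, run at the specified target end."""
    return [(item["stream"], base64.b64decode(item["data"]))
            for item in json.loads(payload.decode("utf-8"))]
```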
Therefore, the subsequent processing of the video frame intercepted by the compressed video data stream is delivered to the specified target end, so that the network transmission resource can be saved, the pressure and the difficulty of deploying a large number of server resources on site can be reduced, the data processing load can be greatly reduced, and the transmission delay of the multi-angle free visual angle video frame can be shortened.
In a specific implementation, in order to ensure that frame-level synchronized video frames are intercepted in each compressed video data stream, as shown in fig. 16, the following steps may be included (a brief sketch follows these steps):
s161, determining one of the compressed video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
s162, determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other compressed video data streams which are synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the other compressed video data streams;
And S163, intercepting the video frames to be intercepted in each compressed video data stream.
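A hedged sketch of steps S161-S163 is given below: one received stream is taken as the reference, the frame to intercept is located in it, the synchronized frame is found in every other stream, and all of them are intercepted. The helpers `find_target_frame` and `find_synced_frame` stand in for the matching rules detailed in the following embodiments and are assumptions.

```python
# Hedged sketch of frame-level synchronized interception (steps S161-S163).
def intercept_synchronized_frames(streams, capture_instruction,
                                  find_target_frame, find_synced_frame):
    reference = streams[0]                                         # S161: reference data stream
    ref_frame = find_target_frame(reference, capture_instruction)  # S162: frame to intercept
    frames = [ref_frame]
    for stream in streams[1:]:
        frames.append(find_synced_frame(stream, ref_frame))        # synchronized counterparts
    return frames                                                  # S163: intercepted frames
```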
In order to make the embodiment of the present invention better understood and realized by those skilled in the art, the following describes in detail how to determine the video frames to be intercepted in each compressed video data stream through a specific application scenario.
In an embodiment of the present invention, the acquisition array may include 40 acquisition devices, so 40 compressed video data streams may be received in real time. Suppose that, among the compressed video data streams of the acquisition devices in the acquisition array received in real time, the compressed video data stream A1 corresponding to the acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of the object in the video frame to be intercepted indicated by the received video frame interception instruction, the video frame a1 in the reference data stream whose object feature information is consistent with X is determined as the video frame to be intercepted. Next, according to the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream, the video frames a2-a40 whose object feature information is consistent with X1 are selected from the remaining compressed video data streams A2-A40 as the video frames to be intercepted of those streams.
The feature information of the object may include at least one of shape feature information, color feature information, and position feature information. The feature information X of the object in the video frame to be intercepted and the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream may use the same representation for the feature information of the same object, for example, both X and X1 are two-dimensional feature information; they may also use different representations for the same object, for example, X may be two-dimensional feature information while X1 is three-dimensional feature information. Moreover, a similarity threshold may be preset; when the similarity threshold is satisfied, the feature information X of the object may be considered consistent with X1, or the feature information X1 of the object may be considered consistent with the feature information X2-X40 of the object in the other compressed video data streams A2-A40.
The specific representation mode and the similar threshold of the feature information of the object can be determined according to the preset multi-angle free visual angle range and the scene of the scene, and the embodiment of the invention is not limited at all.
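An illustrative sketch of such feature-based matching (not the claimed matching rule) is shown below: each frame carries an object feature vector, and the frame whose feature is most similar to the reference feature is chosen, provided the similarity meets a preset threshold.

```python
# Hedged sketch of feature matching against a preset similarity threshold.
import math

def feature_similarity(a, b):
    """Cosine similarity between two feature vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_frame_by_feature(stream_frames, reference_feature, threshold=0.95):
    """stream_frames: list of (feature_vector, frame_payload)."""
    best, best_score = None, -1.0
    for feature, frame in stream_frames:
        score = feature_similarity(feature, reference_feature)
        if score >= threshold and score > best_score:
            best, best_score = frame, score
    return best
```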
In another embodiment of the present invention, the acquisition array may include 40 acquisition devices, so 40 compressed video data streams may be received in real time. Suppose that, among the compressed video data streams of the acquisition devices in the acquisition array received in real time, the compressed video data stream B1 corresponding to the acquisition device B1' is determined as the reference data stream. Then, based on the timestamp information Y indicating the intercepted video frame in the received video frame interception instruction, the video frame b1 corresponding to the timestamp information Y in the reference data stream is determined as the video frame to be intercepted. Next, according to the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream, the video frames b2-b40 consistent with the timestamp information Y1 are selected from the remaining compressed video data streams B2-B40 as the video frames to be intercepted of those streams.
The timestamp information Y indicating the intercepted video frame in the video frame interception instruction may differ slightly from the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream; for example, no timestamp of a video frame in the reference data stream exactly matches Y, and the closest one differs by 0.1 ms. An error range may therefore be preset, for example ±1 ms; since an error of 0.1 ms falls within this range, the video frame b1 whose timestamp information Y1 differs from Y by 0.1 ms may be selected as the video frame to be intercepted in the reference data stream. The specific error range and the selection rule for the timestamp information Y1 in the reference data stream may be determined according to the field acquisition devices and the transmission network, which is not limited in this embodiment.
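A short sketch of this timestamp rule follows: a frame is treated as synchronized with the reference frame when its timestamp differs from the reference timestamp by no more than the preset error range, and the closest such frame is chosen. The function name and data layout are assumptions.

```python
# Hedged sketch of timestamp matching within a preset error range (e.g. +/- 1 ms).
def pick_frame_by_timestamp(stream_frames, reference_timestamp_ms, error_range_ms=1.0):
    """stream_frames: list of (timestamp_ms, frame_payload)."""
    candidates = [f for f in stream_frames
                  if abs(f[0] - reference_timestamp_ms) <= error_range_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f[0] - reference_timestamp_ms))[1]
```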
It can be understood that, in the above embodiments, the method for determining the video frame to be intercepted in each compressed video data stream may be used alone or simultaneously, and the embodiments of the present invention are not limited thereto.
By using the data processing method, the data processing equipment can smoothly and smoothly pull the data acquired by each acquisition equipment and compressed by the data.
The technical solution of performing data processing on the acquisition array in the embodiment of the present specification will be clearly and completely described below with reference to the drawings in the embodiment of the present specification.
Referring to the flowchart of the data processing method shown in fig. 17, in an embodiment of the present invention, the method may specifically include the following steps:
s171, synchronously acquiring original video data from corresponding angles respectively by the acquisition equipment arranged at different positions of a field acquisition area in the acquisition array according to a preset multi-angle free visual angle range, and performing real-time data compression on the acquired original video data respectively to obtain corresponding compressed video data streams.
And S172, when the data processing equipment connected with the acquisition array link determines that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition equipment in the acquisition array is not greater than a preset bandwidth threshold, respectively sending a stream pulling instruction to the acquisition equipment in the acquisition array.
In a specific implementation, the preset bandwidth threshold may be determined according to the transmission capability of the transmission network in which each acquisition device in the acquisition array is located; for example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
And S173, each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time based on the stream pulling instruction.
In a specific implementation, the data processing device may be set according to actual situations. For example, when there is suitable space on site, the data processing device may be placed in a non-acquisition area on site as an on-site server; when no suitable space exists in the field, the data processing device can be arranged in the cloud end and used as a cloud end server.
By adopting the scheme, when the data processing equipment connected with the acquisition array link determines that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition equipment in the acquisition array is not greater than the preset bandwidth threshold, the data processing equipment respectively sends a stream pulling instruction to the acquisition equipment in the acquisition array, so that the data acquired by the acquisition equipment and obtained by data compression can be synchronously transmitted in real time, real-time stream pulling can be carried out through a transmission network where the data processing equipment is located, and data transmission congestion in a stream pulling process can be avoided; and then, each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time based on the stream pulling instruction, and the data transmitted by each acquisition device is compressed, so that the bandwidth pressure of real-time synchronous transmission can be relieved, and the processing speed of the multi-angle free visual angle video data is accelerated.
In this way, there is no need to deploy a large number of servers on site for data processing, nor to capture the raw data through SDI capture cards and then process it with computing servers in an on-site machine room; expensive SDI video transmission cables and SDI interfaces can be avoided, and data can instead be transmitted over an ordinary transmission network, so that low-delay playing of the multi-angle free view video can be realized under limited bandwidth resources and data processing resources, and the implementation cost is reduced.
In specific implementation, in order to simplify a setting process and save setting time, before setting values of parameters of each acquisition device in the acquisition array, the data processing device may determine whether a sum of code rates of compressed video data streams acquired by each acquisition device in the acquisition array according to the set values of the parameters and obtained by data compression is greater than a preset bandwidth threshold, and when the sum of the code rates of the obtained compressed video data streams is greater than the preset bandwidth threshold, the data processing device may set the values of the parameters of each acquisition device in the acquisition array, and then send a stream pulling instruction to each acquisition device in the acquisition array.
In a specific implementation, the process from transmission to writing of the compressed video data streams obtained by each acquisition device occurs continuously, and it is also necessary to ensure that the compressed video data streams obtained by each acquisition device are written into by the data processing device smoothly, so that before sending a stream pulling instruction to each acquisition device in the acquisition array, the data processing device may further determine whether the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold, and when the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than the preset writing speed threshold, the data processing device may set the value of the parameter of each acquisition device in the acquisition array, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from a corresponding angle is not greater than the preset writing speed threshold by each acquisition device in the acquisition array according to the set value of the parameter of each acquisition device in the acquisition array.
In a specific implementation, the preset writing speed threshold may be determined according to a data storage writing speed of the data processing device.
In a specific implementation, data transmission between each acquisition device in the acquisition array and the data processing device may be performed in at least one of the following manners:
1. Data transmission is carried out through the switch;
the switch is used for collecting all the collection devices in the array and connecting the collection devices with the data processing device, the switch can collect and uniformly transmit compressed video data streams of more collection devices to the data processing device, and the number of ports supported by the data processing device can be reduced. For example, the switch supports 40 inputs, so that the data processing device can simultaneously receive the video stream of a capture array composed of 40 capture devices at most through the switch, thereby reducing the number of the data processing devices.
2. And carrying out data transmission through the local area network.
The local area network can transmit the compressed video data stream of the acquisition equipment to the data processing equipment in real time, so that the number of ports supported by the data processing equipment is reduced, and the number of the data processing equipment can be reduced.
In specific implementation, the data processing device may store (may be a cache) the compressed video data streams obtained by the respective acquisition devices, and when a received video frame capture instruction is received, the data processing device may capture video frames of frame-level synchronization in the respective compressed video data streams according to the received video frame capture instruction, and synchronously upload the captured video frames to the designated target end.
The data processing device may establish a connection with a target terminal through a port or an IP address in advance, or may upload the captured video frame to a port or an IP address specified by the video frame capture instruction in synchronization. And the data processing device can firstly encapsulate the intercepted video frame, upload the video frame to the appointed target end through a network transmission protocol, and then analyze the video frame to obtain the frame-level synchronous video frame in the corresponding compressed video data stream.
By adopting the scheme, the compressed video data streams acquired by the acquisition devices in the acquisition array in real time and synchronously acquired and compressed by data compression can be uniformly transmitted to the data processing device, after the data processing device receives the video frame interception instruction, the video frames in frame level synchronization in the intercepted compressed video data streams can be synchronously uploaded to the designated target end through the initial processing of dotting and frame interception, and the subsequent processing of the video frames intercepted by the compressed video data streams is handed over to the designated target end, so that the network transmission resources can be saved, the pressure and difficulty of field deployment can be reduced, the data processing load can be greatly reduced, and the transmission delay of the multi-angle free visual angle video frames can be shortened.
In a specific implementation, in order to capture a frame-level synchronized video frame in each compressed video data stream, the data processing device may first determine, as a reference data stream, one of the compressed video data streams of each capture device in the capture array received in real time, then, the data processing device may determine, based on a received video frame capture instruction, a video frame to be captured in the reference data stream, and select, as a video frame to be captured in the remaining compressed video data streams, a video frame in the remaining compressed video data streams that is synchronized with the video frame to be captured in the reference data stream, and finally, the data processing device captures the video frame to be captured in each compressed video data stream. For a specific frame truncation method, reference may be made to the example of the foregoing embodiment, and details are not described here.
The embodiment of the present invention further provides a data processing device corresponding to the data processing method in the foregoing embodiment, so that those skilled in the art can better understand and implement the embodiment of the present invention, which is described in detail below with reference to the accompanying drawings by using specific embodiments.
Referring to the schematic structural diagram of the data processing apparatus shown in fig. 18, in the embodiment of the present invention, the data processing apparatus 180 may include:
The first transmission matching unit 181 is adapted to determine whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than a preset bandwidth threshold, wherein the acquisition devices in the acquisition array are placed at different positions in a field acquisition area according to a preset multi-angle free view range.
And the instruction sending unit 182 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array when it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is not greater than the preset bandwidth threshold.
And the data stream receiving unit 183 is adapted to receive a compressed video data stream transmitted by each acquisition device in the acquisition array in real time based on the stream pulling instruction, where the compressed video data stream is obtained by real-time synchronous acquisition and data compression from a corresponding angle by each acquisition device in the acquisition array.
By adopting the data processing equipment, before the stream pulling instruction is sent to each acquisition equipment in the acquisition array, whether the transmission bandwidths are matched is determined, and data transmission congestion in the stream pulling process can be avoided, so that data acquired by each acquisition equipment and obtained by data compression can be synchronously transmitted in real time, the processing speed of multi-angle free view video data is increased, low-delay playing of the multi-angle free view video is realized under the condition of limited bandwidth resources and data processing resources, and the implementation cost is reduced.
In an embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
the first parameter setting unit 184 is adapted to set a numerical value of a parameter of each acquisition device in the acquisition array before sending a stream pulling instruction to each acquisition device in the acquisition array, respectively;
wherein the parameters of the acquisition device may include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams acquired by each acquisition device in the acquisition array from corresponding angles in real time and synchronously acquiring and compressing data is not more than a preset bandwidth threshold according to the set numerical value of the parameter of each acquisition device.
In an embodiment of the present invention, in order to simplify the setting process and save the setting time, as shown in fig. 18, the data processing device 180 may further include:
the second transmission matching unit 185 is adapted to determine, before setting the values of the parameters of the acquisition devices in the acquisition array, whether the sum of the code rates of the compressed video data streams acquired by the acquisition devices in the acquisition array according to the set values of the parameters and obtained by data compression is not greater than a preset bandwidth threshold.
In an embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
The write matching unit 186 is adapted to determine whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is greater than a preset write speed threshold;
a second parameter setting unit 187, adapted to set a value of a parameter of each acquisition device in the acquisition array when a sum of code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from corresponding angles by each acquisition device in the acquisition array is not greater than the preset writing speed threshold according to the set value of the parameter of each acquisition device.
Therefore, before the stream pulling is started, the sum of the code rates of the compressed video data streams acquired by the acquisition devices from the corresponding angles in real time and in synchronous acquisition and data compression can be ensured to be not more than the preset writing speed threshold value, so that data writing congestion can be avoided, the smooth link of the compressed video data streams in the acquisition, transmission and writing processes can be ensured, the compressed video streams uploaded by the acquisition devices can be processed in real time, and the playing of the multi-angle free visual angle video can be realized.
In an embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
a frame-cut processing unit 188 adapted to cut out frame-level synchronized video frames in each compressed video data stream according to the received video frame-cut instruction;
an uploading unit 189, adapted to upload the captured video frames to the designated target end synchronously.
The specified target end may be a preset target end, or a target end specified by the video frame capture instruction.
Therefore, the subsequent processing of the video frame intercepted by the compressed video data stream is delivered to the specified target end, so that the network transmission resource can be saved, the pressure and the difficulty of field deployment can be reduced, the data processing load can be greatly reduced, and the transmission delay of the multi-angle free visual angle video frame can be shortened.
In an embodiment of the present invention, as shown in fig. 18, the frame truncation processing unit 188 may include:
a reference data stream selecting subunit 1881, adapted to determine, as a reference data stream, one of the compressed video data streams of each acquisition device in the acquisition array received in real time;
a video frame selecting subunit 1882, adapted to determine, based on the received video frame capturing instruction, a video frame to be captured in the reference data stream, and select, as video frames to be captured in the other compressed video data streams, video frames in the other compressed video data streams that are synchronized with the video frame to be captured in the reference data stream;
A video frame truncation sub-unit 1883 adapted to truncate the video frames to be truncated in each compressed video data stream.
In an embodiment of the present invention, as shown in fig. 18, the video frame selecting subunit 1882 may include at least one of:
the first video frame selecting module 18821 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, a video frame in each of the remaining compressed video data streams, which is consistent with the feature information of the object, as a video frame to be intercepted in each of the remaining compressed video data streams;
the second video frame selecting module 18822 is adapted to select, according to the timestamp information of the video frame to be captured in the reference data stream, a video frame in each of the remaining compressed video data streams, which is consistent with the timestamp information, as the video frame to be captured in each of the remaining compressed video data streams.
The embodiment of the present invention further provides a data processing system corresponding to the data processing method, and the data processing apparatus is adopted to implement real-time receiving of multiple compressed video data streams, so that those skilled in the art can better understand and implement the embodiment of the present invention, and detailed descriptions are provided below through specific embodiments with reference to the attached drawings.
Referring to the schematic structural diagram of the data processing system shown in fig. 19, in an embodiment of the present invention, the data processing system 190 may include: an acquisition array 191 and a data processing device 192, the acquisition array 191 comprising a plurality of acquisition devices disposed at different positions of a field acquisition region according to a preset multi-angle free view range, wherein:
each acquisition device in the acquisition array 191 is adapted to synchronously acquire original video data from a corresponding angle in real time, respectively perform real-time data compression on the acquired original video data to obtain a compressed video data stream synchronously acquired from the corresponding angle in real time, and transmit the obtained compressed video data stream to the data processing device 192 in real time based on a stream pulling instruction sent by the data processing device 192;
the data processing device 192 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array 191 and receive the compressed video data stream transmitted by each acquisition device in the acquisition array 191 in real time when it is determined that the sum of the code rates of the compressed video data streams transmitted by each acquisition device in the acquisition array is not greater than a preset bandwidth threshold.
By adopting the scheme, a large number of servers can be prevented from being arranged on site to perform data processing, the collected original data do not need to be collected through an SDI acquisition card, the original data are processed through a calculation server in a site machine room, expensive SDI video transmission cables and SDI interfaces can be avoided from being adopted, data transmission and stream drawing are performed through a common transmission network, low-delay playing of multi-angle free visual angle videos is achieved under the condition that bandwidth resources and data processing resources are limited, and implementation cost is reduced.
In an embodiment of the present invention, the data processing device 192 is further adapted to set a value of a parameter of each acquisition device in the acquisition array before sending a stream pulling command to each acquisition device in the acquisition array 191, respectively;
wherein the parameters of the acquisition device include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams acquired by each acquisition device in the acquisition array from corresponding angles in real time and synchronously acquiring and compressing data is not more than a preset bandwidth threshold according to the set numerical value of the parameter of each acquisition device.
Therefore, before the stream is pulled, the data processing equipment can set the numerical values of the parameters of all the acquisition equipment in the acquisition array, the numerical values of the parameters of all the acquisition equipment in the acquisition array are ensured to be uniform, all the acquisition equipment can synchronously acquire and compress data from corresponding angles in real time, and the sum of the code rates of the obtained compressed video data streams is not greater than a preset bandwidth threshold value, so that network congestion can be avoided, and low-delay playing of multi-angle free-view videos can be realized under the condition of limited bandwidth resources.
In an embodiment of the present invention, before the data processing device 192 sends the stream pulling instruction to each acquisition device in the acquisition array 191, it determines whether the sum of the code rates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array 191 is greater than a preset writing speed threshold. If it is, the data processing device sets the parameter values of each acquisition device in the acquisition array 191 so that, with the set values, the sum of the code rates of the compressed video data streams obtained by each acquisition device through real-time synchronous acquisition and data compression from its corresponding angle is not greater than the preset writing speed threshold.
Therefore, before the streams are pulled, it can be ensured that the sum of the code rates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their corresponding angles is not greater than the preset writing speed threshold. Data writing congestion at the data processing device can thus be avoided, the link for acquiring, transmitting, and writing the compressed video data streams remains smooth, and the compressed video streams uploaded by the acquisition devices can be processed in real time so that the multi-angle free-view video can be played.
In a specific implementation, each acquisition device in the acquisition array and the data processing device are adapted to be connected through a switch and/or a local area network.
In one embodiment of the invention, the data processing system 190 may also include a designated target end 193.
The data processing device 192 is adapted to intercept, according to the received video frame interception instruction, video frames of frame-level synchronization in each compressed video stream, and synchronously upload the intercepted video frames to the designated target end 193;
the designated target end 193 is adapted to receive the video frame intercepted by the data processing device 192 based on the video frame intercepting instruction.
The data processing device may establish a connection with the target end in advance through a port or an IP address, or may synchronously upload the intercepted video frames to a port or an IP address specified by the video frame interception instruction.
With this scheme, the compressed video data streams obtained by the acquisition devices in the acquisition array through real-time synchronous acquisition and data compression can be transmitted in a unified manner to the data processing device. After receiving a video frame interception instruction, the data processing device performs only the initial processing of marking and intercepting frame-level-synchronized video frames from the compressed video data streams and synchronously uploads them to the designated target end, which then carries out the subsequent processing of the intercepted video frames. This saves network transmission resources, reduces the pressure and difficulty of on-site deployment, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free-view video frames.
In an embodiment of the present invention, the data processing device 192 is adapted to determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array 191 as a reference data stream; determine, based on the received video frame interception instruction, the video frame to be intercepted in the reference data stream, and select, in each of the other compressed video data streams, the video frame synchronized with the video frame to be intercepted in the reference data stream as the video frame to be intercepted of that stream; and finally intercept the video frames to be intercepted in each compressed video data stream.
In order to make the embodiment of the present invention better understood and realized by those skilled in the art, the following detailed description of the frame synchronization scheme between the data processing device and the acquisition device is provided by way of specific embodiments.
Referring to the flowchart of the data synchronization method shown in fig. 20, in the embodiment of the present invention, the method may specifically include the following steps:
S201, sending a stream pulling instruction to each acquisition device in the acquisition array, wherein the acquisition devices in the acquisition array are arranged at different positions of a field acquisition area according to a preset multi-angle free view range and synchronously acquire video data streams from their corresponding angles in real time.
In specific implementations, pull-stream synchronization can be achieved in multiple ways. For example, the stream pulling instruction may be sent to all acquisition devices in the acquisition array at the same time; or a stream pulling instruction may be sent only to the master acquisition device in the acquisition array to trigger its stream pulling, and the master acquisition device then synchronizes the instruction to all slave acquisition devices to trigger theirs, as sketched below.
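The two dispatch strategies might look like the following; the `pull()` and `sync_pull_to()` methods are hypothetical device operations introduced only for illustration, not an actual device API.

```python
def send_pull_instruction(devices, mode="broadcast"):
    """Dispatch a stream pulling instruction to the acquisition array.

    `devices` is a list of device proxies exposing hypothetical `pull()`
    and (for the master) `sync_pull_to()` operations.
    """
    if mode == "broadcast":
        # Strategy 1: send the instruction to every device at the same time.
        for device in devices:
            device.pull()
    elif mode == "master-slave":
        # Strategy 2: trigger only the master device, which then synchronizes
        # the instruction to all slave devices to trigger their stream pulling.
        master, slaves = devices[0], devices[1:]
        master.pull()
        master.sync_pull_to(slaves)
    else:
        raise ValueError(f"unknown dispatch mode: {mode}")
```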
S202, receiving video data streams respectively transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction in real time, and determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronous or not.
In a specific implementation, the acquisition device itself may have encoding and encapsulating functions, so that the original video data synchronously acquired from its corresponding angle in real time can be encoded and encapsulated. Each acquisition device may also compress the data: for the same amount of uncompressed data, a higher compression ratio yields a smaller amount of compressed data and relieves the bandwidth pressure of real-time synchronous transmission, so the acquisition devices may adopt techniques such as predictive coding, transform coding, and entropy coding to increase the video compression ratio.
S203, when the video data streams respectively transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, sending the stream pulling instruction to each acquisition device in the acquisition array again until the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
With this data synchronization method, determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized ensures synchronous transmission of the multiple data paths, avoids missing-frame and extra-frame transmission problems, increases the data processing speed, and meets the requirement of low-delay playing of multi-angle free-view videos.
In a specific implementation, when each acquisition device in the acquisition array is manually started, there is a start time error, and it is possible that the acquisition of video data streams does not start at the same time. Therefore, at least one of the following manners can be adopted to ensure that each acquisition device in the acquisition array respectively acquires the video data stream synchronously from the corresponding angle in real time:
1. when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices, so that each acquisition device in the acquisition array starts to synchronously acquire video data streams from corresponding angles in real time based on the acquisition starting instruction.
For example, the acquisition array may include 40 acquisition devices. When acquisition device A1 acquires the acquisition start instruction, it synchronously sends the instruction to the other acquisition devices A2 to A40; after all acquisition devices have received the acquisition start instruction, each of them starts to synchronously acquire a video data stream from its corresponding angle in real time based on that instruction. Because data transmission between acquisition devices is far faster than manual starting, the start time error caused by manual starting can be reduced.
2. Each acquisition device in the acquisition array synchronously acquires the video data stream from its corresponding angle in real time based on a preset clock synchronization signal.
For example, a clock signal synchronizer may be provided and connected to each acquisition device. When the clock signal synchronizer receives a trigger signal (for example, a synchronous acquisition start instruction), it transmits a clock synchronization signal to each acquisition device, and each acquisition device starts to synchronously acquire a video data stream from its corresponding angle in real time based on that signal. Because the clock signal synchronizer transmits the clock synchronization signal to the acquisition devices based on a preset trigger signal, the synchronous acquisition is not easily disturbed by external conditions or manual operation, so the synchronization precision and efficiency of the acquisition devices can be improved.
In a specific implementation, due to the network transmission environment, the acquisition devices in the acquisition array may not receive the stream pulling instruction at exactly the same time; a time difference of several milliseconds or less between acquisition devices may cause the video data streams transmitted in real time to be unsynchronized. As shown in fig. 21, the acquisition array includes acquisition devices 1 and 2 whose acquisition parameters are set to be the same, with both acquisition frame rates equal to X fps, and whose video frames are acquired with frame-level synchronization.
The acquisition interval T of each frame in acquisition devices 1 and 2 is therefore T = 1/X seconds.
Assume that the data processing device sends a stream pulling instruction r at time T0, acquisition device 1 receives it at time T1, and acquisition device 2 receives it at time T2. If acquisition devices 1 and 2 receive the stream pulling instruction r within the same acquisition interval T, they can be considered to have received it at the same time, and they can each transmit frame-level synchronized video data streams; if they do not receive the stream pulling instruction within the same acquisition interval, they can be considered not to have received it at the same time, and frame-level synchronized transmission of the video data streams cannot be achieved. Frame-level synchronization of the transmitted video data streams may also be referred to as pull-stream synchronization; once achieved, it continues automatically until stream pulling stops.
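Under the assumption that T1 and T2 are measured against a common clock, the "same acquisition interval" criterion can be written as a small helper; this is only a sketch of the test described above.

```python
def received_in_same_interval(t1_s, t2_s, frame_rate_x_fps):
    """Return True if the two receipt times fall within the same acquisition
    interval T = 1 / X, i.e. the devices can be treated as having received
    the stream pulling instruction at the same time."""
    interval = 1.0 / frame_rate_x_fps  # T = 1 / X seconds
    return int(t1_s // interval) == int(t2_s // interval)

# At 25 fps, T = 40 ms: receipts at 10 ms and 20 ms share one interval,
# while 35 ms and 45 ms straddle an interval boundary.
print(received_in_same_interval(0.010, 0.020, 25))  # True
print(received_in_same_interval(0.035, 0.045, 25))  # False
```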
The reasons why frame-level synchronized video data streams cannot be transmitted may include:
1) A stream pulling instruction needs to be sent to each acquisition device separately;
2) The local area network has a delay in transmitting the stream pulling instruction.
Therefore, whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized can be determined in at least one of the following manners:
1. When the N-th frame of the video data stream transmitted by each acquisition device in the acquisition array is obtained, the feature information of the object in the N-th frame of each video data stream is matched. When this feature information meets a preset similarity threshold across the streams, the feature information of the object in the N-th frames of the video data streams transmitted by the acquisition devices is determined to be consistent, and the video data streams respectively transmitted by the acquisition devices are frame-level synchronized.
Where N is an integer not less than 1, and the feature information of the object in the nth frame of each video data stream may include at least one of shape feature information, color feature information, and position feature information.
2. When the N-th frame of the video data stream transmitted by each acquisition device in the acquisition array is obtained, the timestamp information of the N-th frame of each video data stream may be matched, where N is an integer not less than 1. When the timestamp information of the N-th frame of each video data stream is consistent, the video data streams respectively transmitted by the acquisition devices are determined to be frame-level synchronized.
When the video data streams respectively transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, the stream pulling instruction is sent to each acquisition device in the acquisition array again, and at least one of the above manners is used to determine whether frame-level synchronization has been achieved, until the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
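A sketch of the timestamp-based check (manner 2 above), with the feature-based check (manner 1) reduced to a toy similarity score, is given below; the frame field names and the matcher are assumptions for illustration only.

```python
def frames_are_synced(nth_frames, use_timestamps=True, similarity_threshold=0.9):
    """Decide whether the N-th frames of all streams are frame-level synchronized.

    Each element of `nth_frames` is assumed to be a dict with "timestamp"
    and "features" keys; `feature_similarity` is a stand-in for a real matcher.
    """
    if use_timestamps:
        # Manner 2: the timestamps of the N-th frames must all be consistent.
        return len({f["timestamp"] for f in nth_frames}) == 1
    # Manner 1: object feature information must meet the similarity threshold.
    reference = nth_frames[0]["features"]
    return all(feature_similarity(reference, f["features"]) >= similarity_threshold
               for f in nth_frames[1:])

def feature_similarity(a, b):
    """Toy similarity score in [0, 1]; a real system would compare shape,
    color, or position features of the object."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b), 1)

# If the check fails, the stream pulling instruction is re-sent and the
# check repeated until frame-level synchronization is achieved.
```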
In a specific implementation, a video frame in the video data stream of each acquisition device may be further intercepted and transmitted to a specified target end, and in order to ensure frame-level synchronization of the intercepted video frame, as shown in fig. 22, the following steps may be included:
S221, determining one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream.
S222, based on the received video frame intercepting instruction, determining the video frame to be intercepted in the reference data stream, and selecting the video frame in each of the other video data streams which is synchronous with the video frame to be intercepted in the reference data stream as the video frame to be intercepted of each of the other video data streams.
S223, intercepting the video frames to be intercepted in each video data stream.
S224, synchronously uploading the intercepted video frames to the designated target end.
The specified target end may be a preset target end, or a target end specified by the video frame capture instruction.
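Steps S221 to S224 can be summarized in the following sketch; the stream and frame structures, and the `upload` callable standing in for the transfer to the designated target end, are assumptions made only for illustration.

```python
def intercept_and_upload(streams, target_timestamp, upload):
    """Sketch of steps S221-S224.

    `streams` maps a stream id to a list of frames, each frame being a dict
    with a "timestamp" key; `upload` stands in for the synchronous upload to
    the designated target end.
    """
    # S221: take one stream (here simply the first one) as the reference stream.
    ref_id = next(iter(streams))
    # S222: locate the frame to intercept in the reference stream, then pick
    # the synchronized frame in every remaining stream.
    ref_frame = min(streams[ref_id],
                    key=lambda f: abs(f["timestamp"] - target_timestamp))
    selected = {ref_id: ref_frame}
    for stream_id, frames in streams.items():
        if stream_id != ref_id:
            selected[stream_id] = min(
                frames, key=lambda f: abs(f["timestamp"] - ref_frame["timestamp"]))
    # S223 + S224: intercept the chosen frames and upload them together.
    upload(list(selected.values()))
    return selected
```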
With this scheme, frame interception is synchronized and its efficiency improved, which in turn improves the display effect of the generated multi-angle free-view video and enhances the user experience. Moreover, the coupling between the process of selecting and intercepting video frames and the process of generating the multi-angle free-view video is reduced, the independence of the processes is strengthened, and later maintenance is easier; synchronously uploading the intercepted video frames to the designated target end saves network transmission resources, reduces the data processing load, and speeds up the generation of the multi-angle free-view video.
In order to make the embodiment of the present invention better understood and implemented by those skilled in the art, the following describes in detail how to determine the video frames to be intercepted in each video data stream by using a specific application example.
One mode is to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frame in each of the remaining video data streams whose object feature information is consistent with it as the video frame to be intercepted of that stream.
For example, the acquisition array includes 40 acquisition devices, so 40 video data streams may be received in real time. Assume that, among the video data streams received in real time from the acquisition devices in the acquisition array, the video data stream A1 corresponding to acquisition device A1' is determined as the reference data stream. Based on the feature information X of the object indicated in the received video frame interception instruction, the video frame a1 in the reference data stream whose object feature information is consistent with X is determined as the video frame to be intercepted. Then, according to the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream, the video frames a2 to a40 in the remaining video data streams A2 to A40 whose object feature information is consistent with X1 are selected as the video frames to be intercepted of those streams.
The feature information of the object may include shape feature information, color feature information, position feature information, and the like. The feature information X of the object indicated in the video frame interception instruction and the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream may use the same representation for the same object, for example both being two-dimensional feature information; or they may use different representations for the same object, for example X being two-dimensional feature information and X1 being three-dimensional feature information. Moreover, a similarity threshold may be preset; when it is satisfied, the feature information X may be considered consistent with X1, or X1 may be considered consistent with the feature information X2 to X40 of the object in the remaining video data streams A2 to A40.
The specific representation of the object feature information and the similarity threshold may be determined according to the preset multi-angle free-view range and the on-site scene, which is not limited in this embodiment.
In another mode, according to the timestamp information of the video frame in the reference data stream, a video frame in the remaining video data streams, which is consistent with the timestamp information, is selected as a video frame to be intercepted of the remaining video data streams.
For example, the acquisition array may include 40 acquisition devices, so 40 video data streams may be received in real time. Assume that, among the video data streams received in real time from the acquisition devices in the acquisition array, the video data stream B1 corresponding to acquisition device B1 is determined as the reference data stream. Based on the timestamp information Y indicating the frame to be intercepted in the received video frame interception instruction, the video frame b1 corresponding to the timestamp information Y in the reference data stream is determined as the video frame to be intercepted. Then, according to the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream, the video frames b2 to b40 whose timestamp information is consistent with Y1 in the remaining video data streams B2 to B40 are selected as the video frames to be intercepted of those streams.
The timestamp information Y indicating the frame to be intercepted in the video frame interception instruction may differ slightly from the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream; for example, no frame in the reference data stream may carry exactly the timestamp Y, the closest frame differing by 0.1 ms. An error range may therefore be preset, for example ±1 ms: since an error of 0.1 ms falls within this range, the video frame b1 whose timestamp Y1 differs from Y by 0.1 ms may be selected as the video frame to be intercepted in the reference data stream. The specific error range and the rule for selecting the timestamp information Y1 in the reference data stream may be determined according to the on-site acquisition devices and the transmission network, which is not limited in this embodiment.
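The tolerance test described above might look like the following; the ±1 ms range is the example value used in this paragraph, not a required setting.

```python
def pick_frame_by_timestamp(reference_ts_ms, frames, tolerance_ms=1.0):
    """Return the frame whose timestamp is closest to the reference timestamp
    and within the preset error range, or None if no frame qualifies.

    `frames` is a list of (timestamp_ms, frame) pairs.
    """
    candidates = [(abs(ts - reference_ts_ms), frame)
                  for ts, frame in frames
                  if abs(ts - reference_ts_ms) <= tolerance_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]

# A frame 0.1 ms away from the instruction timestamp is accepted under a
# ±1 ms error range, matching the example above.
print(pick_frame_by_timestamp(1000.0, [(999.9, "b1"), (1040.0, "b2")]))  # b1
```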
It can be understood that, in the above embodiments, the method for determining the video frame to be intercepted in each video data stream may be used alone or simultaneously, and the embodiments of the present invention are not limited thereto.
By adopting the scheme, the efficiency and the result accuracy of synchronous selection and synchronous interception of the video frames can be improved, so that the integrity and the synchronism of transmitted data can be improved.
The embodiment of the present invention further provides a data processing device corresponding to the data processing method, and in order to enable those skilled in the art to better understand and implement the embodiment of the present invention, the following detailed description is provided by specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data processing device shown in fig. 23, in an embodiment of the present invention, the data processing device 230 may include:
the instruction sending unit 231 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array, wherein each acquisition device in the acquisition array is arranged at different positions in a field acquisition area according to a preset multi-angle free view range, and each acquisition device in the acquisition array is arranged to synchronously acquire video data streams from corresponding angles in real time;
the data stream receiving unit 232 is adapted to receive, in real time, video data streams respectively transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction;
the first synchronization judging unit 233 is adapted to determine whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized, and, when they are not, to re-trigger the instruction sending unit 231 until the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
The deployment location of the data processing device may be chosen according to the actual situation. For example, when the site has free space, the data processing device can be placed in a non-acquisition area of the site and used as an on-site server; when the site has no free space, the data processing device can be deployed in the cloud as a cloud server.
With this data processing device, determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized ensures synchronous transmission of the multiple data paths, avoids missing-frame and extra-frame transmission problems, increases the data processing speed, and meets the requirement of low-delay playing of multi-angle free-view videos.
In an embodiment of the present invention, as shown in fig. 23, the data processing device 230 may further include:
a reference video stream determining unit 234, adapted to determine one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream;
a video frame selecting unit 235 adapted to determine a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and select a video frame in each of the remaining video data streams that is synchronized with the video frame to be intercepted in the reference data stream as a video frame to be intercepted in each of the remaining video data streams;
a video frame intercepting unit 236 adapted to intercept video frames to be intercepted in each video data stream;
and an uploading unit 237, adapted to upload the captured video frames to the specified target end synchronously.
The data processing device 230 may establish a connection with a target terminal through a port or an IP address in advance, or may upload the captured video frame to a port or an IP address specified by the video frame capture instruction synchronously.
With this scheme, frame interception is synchronized and its efficiency improved, which further improves the display effect of the generated multi-angle free-view video and enhances the user experience. Moreover, the coupling between the process of selecting and intercepting video frames and the process of generating the multi-angle free-view video is reduced, the independence of the processes is strengthened, and later maintenance is facilitated; synchronously uploading the intercepted video frames to the designated target end saves network transmission resources, reduces the data processing load, and speeds up the generation of the multi-angle free-view video.
In an embodiment of the present invention, as shown in fig. 23, the video frame selecting unit 235 includes at least one of the following:
the first video frame selection module 2351 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, a video frame in the remaining video data streams, which is consistent with the feature information of the object, as a video frame to be intercepted in the remaining video data streams;
the second video frame selecting module 2352 is adapted to select, according to the timestamp information of the video frame in the reference data stream, a video frame in the remaining video data streams, which is consistent with the timestamp information, as a video frame to be captured of the remaining video data streams.
By adopting the scheme, the efficiency and the result accuracy of synchronous selection and synchronous interception of the video frames can be improved, so that the integrity and the synchronism of transmitted data can be improved.
An embodiment of the present invention further provides a data synchronization system corresponding to the data processing method, in which the data processing device receives multiple video data streams in real time. So that those skilled in the art can better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data synchronization system shown in fig. 24, in an embodiment of the present invention, the data synchronization system 240 may include: an acquisition array 241 arranged in a field acquisition area, and a data processing device 242 connected with the acquisition array via a link, the acquisition array 241 comprising a plurality of acquisition devices disposed at different positions of the field acquisition area according to a preset multi-angle free view range, wherein:
each acquisition device in the acquisition array 241 is adapted to synchronously acquire video data streams from corresponding angles in real time, and transmit the acquired video data streams to the data processing device 242 in real time based on a stream pulling instruction sent by the data processing device 242;
the data processing device 242 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array 241, receive in real time the video data streams transmitted by the acquisition devices in the acquisition array 241 based on the stream pulling instruction, and, when the video data streams transmitted by the acquisition devices in the acquisition array 241 are not frame-level synchronized, send the stream pulling instruction to each acquisition device in the acquisition array 241 again until the video data streams transmitted by the acquisition devices in the acquisition array 241 are frame-level synchronized.
With the data synchronization system in this embodiment of the invention, determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized ensures synchronous transmission of the multiple data paths, avoids missing-frame and extra-frame transmission problems, increases the data processing speed, and thereby meets the requirement of low-delay playing of multi-angle free-view videos.
In a specific implementation, the data processing device 242 is further adapted to determine one of the video data streams of each acquisition device in the acquisition array 241 received in real time as a reference data stream; determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams which are synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of other video data streams; and intercepting video frames to be intercepted in each video data stream and synchronously uploading the intercepted video frames to the specified target end.
The data processing device 242 may establish a connection with the target end in advance through a port or an IP address, or may synchronously upload the intercepted video frames to a port or an IP address specified by the video frame interception instruction.
In an embodiment of the present invention, the data synchronization system 240 may further include a cloud server, which is adapted to serve as a designated target.
In another embodiment of the present invention, as shown in fig. 34, the data synchronization system 240 may further include a play control device 341 adapted to serve as a designated destination.
In another embodiment of the present invention, as shown in fig. 35, the data synchronization system 240 may further include an interactive terminal 351 adapted to be a designated target.
In an embodiment of the present invention, at least one of the following manners may be used to ensure that each acquisition device in the acquisition array 241 synchronously acquires video data streams from corresponding angles in real time:
1. the acquisition devices in the acquisition array are connected through a synchronization line, wherein when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices through the synchronization line, so that the acquisition devices in the acquisition array start to synchronously acquire video data streams from corresponding angles in real time respectively based on the acquisition starting instruction;
2. Each acquisition device in the acquisition array synchronously acquires the video data stream from its corresponding angle in real time based on a preset clock synchronization signal.
So that those skilled in the art can better understand and implement the embodiments of the present invention, a data synchronization system is described in detail below through a specific application scenario. Fig. 25 is a schematic structural diagram of the data synchronization system in this application scenario; the system includes an acquisition array 251 composed of acquisition devices, a data processing device 252, and a cloud server cluster 253.
At least one of the acquisition devices in the acquisition array 251 acquires an acquisition start instruction, and synchronizes the acquired acquisition start instruction to other acquisition devices through a synchronization line 254, so that each acquisition device in the acquisition array starts to synchronously acquire a video data stream from a corresponding angle in real time based on the acquisition start instruction.
The data processing device 252 may send a stream pulling instruction to each acquisition device in the acquisition array 251 through a wireless local area network. Each acquisition device in the acquisition array 251 transmits the obtained video data stream to the data processing device 252 in real time through the switch 255 based on the pull stream command sent by the data processing device 252.
The data processing device 252 determines whether frame level synchronization exists between the video data streams respectively transmitted by the acquisition devices in the acquisition array 251, and re-sends a stream pulling instruction to each acquisition device in the acquisition array 251 when frame level synchronization does not exist between the video data streams respectively transmitted by the acquisition devices in the acquisition array 251 until frame level synchronization exists between the video data streams transmitted by the acquisition devices in the acquisition array 251.
After the data processing device 252 determines that the video data streams transmitted by the acquisition devices in the acquisition array 251 are frame-level synchronized, it determines one of the video data streams received in real time from the acquisition devices in the acquisition array 251 as a reference data stream. Upon receiving a video frame interception instruction, it determines the video frame to be intercepted in the reference data stream according to that instruction, selects in each of the other video data streams the video frame synchronized with the video frame to be intercepted in the reference data stream as the video frame to be intercepted of that stream, then intercepts the video frames to be intercepted in all video data streams and synchronously uploads the intercepted video frames to the cloud.
The cloud server cluster 253 performs subsequent processing on the captured video frames to obtain a multi-angle free-view video for playing.
In a specific implementation, the server cluster 253 in the cloud may include: a first cloud server 2531, a second cloud server 2532, a third cloud server 2533, and a fourth cloud server 2534. The first cloud server 2531 can be used for parameter calculation; the second cloud server 2532 may be configured to perform depth calculations, generating a depth map; the third cloud server 2533 may be configured to perform frame image reconstruction on a preset virtual viewpoint path by using DIBR; the fourth cloud server 2534 may be configured to generate a multi-angle freeview video.
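The division of labor across the four cloud servers can be summarized as a simple pipeline; every stage function below is a trivial placeholder, since the actual parameter, depth, and DIBR algorithms are described elsewhere in this document.

```python
# Placeholder stages; in this application scenario each runs on its own server.
def calculate_parameters(frames):                   # first cloud server 2531
    return {"num_views": len(frames)}

def calculate_depth_maps(frames, params):           # second cloud server 2532
    return [{"view": i, "depth": None} for i in range(len(frames))]

def dibr_reconstruct(frames, depth_maps, params):    # third cloud server 2533
    return [{"virtual_view": d["view"]} for d in depth_maps]

def generate_free_view_video(virtual_frames):         # fourth cloud server 2534
    return {"frames": virtual_frames}

def cloud_post_processing(intercepted_frames):
    """Run the intercepted, frame-level-synchronized frames through the
    four-stage cloud pipeline to obtain a multi-angle free-view video."""
    params = calculate_parameters(intercepted_frames)
    depth_maps = calculate_depth_maps(intercepted_frames, params)
    virtual_frames = dibr_reconstruct(intercepted_frames, depth_maps, params)
    return generate_free_view_video(virtual_frames)
```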
It can be understood that the data processing device may be placed in a non-acquisition area of the site or in the cloud according to the actual situation. In practical applications, the data synchronization system may use at least one of a cloud server, a play control device, or an interactive terminal as the sender of the video frame interception instruction, or may use another device capable of sending such an instruction; the embodiment of the present invention is not limited in this respect.
It should be noted that the data processing system and the like in the foregoing embodiments can all apply the data synchronization system in the embodiments of the present invention.
An embodiment of the present invention further provides an acquisition device corresponding to the data processing method. The acquisition device is adapted to synchronize an acquisition start instruction to other acquisition devices when it acquires the instruction, start to synchronously acquire a video data stream from its corresponding angle in real time, and transmit the obtained video data stream to the data processing device in real time upon receiving the stream pulling instruction sent by the data processing device. So that those skilled in the art can better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the acquisition device shown in fig. 36, in an embodiment of the present invention, the acquisition device 360 includes: a photoelectric conversion camera component 361, a processor 362, an encoder 363, and a transmission component 365, wherein:
a photoelectric conversion camera component 361 adapted to collect an image;
the processor 362 is adapted to synchronize the acquisition start instruction to other acquisition devices through the transmission component 365 when the acquisition start instruction is acquired, start to process the image acquired by the photoelectric conversion camera module 361 in real time to obtain an image data sequence, and transmit the acquired video data stream to the data processing device through the transmission component 365 when the stream pulling instruction is acquired;
the encoder 363 is adapted to encode the image data sequence to obtain a corresponding video data stream.
Alternatively, as shown in fig. 36, the capturing device 360 may further include a recording component 364 adapted to capture sound signals to obtain audio data.
The captured image data sequence and audio data may be processed by the processor 362 and then encoded by the encoder 363 to obtain a corresponding video data stream. When acquiring the acquisition start instruction, the processor 362 may synchronize the acquisition start instruction to other acquisition devices through the transmission component 365; upon receiving the pull instruction, the obtained video data stream is transmitted to the data processing apparatus through the transmission component 365 in real time.
In specific implementation, the collecting devices can be arranged at different positions of a field collecting area according to a preset multi-angle free visual angle range, and the collecting devices can be fixedly arranged at a certain point of the field collecting area and can also move in the field collecting area so as to form a collecting array. Therefore, the acquisition equipment can be fixed equipment or mobile equipment, and therefore the video data stream can be flexibly acquired from multiple angles.
As shown in fig. 37, which is a schematic diagram of an acquisition array in an application scenario according to an embodiment of the present invention, a stage center is used as a core viewpoint, the core viewpoint is used as a circle center, and a sector area where the core viewpoint is located on the same plane is used as a preset multi-angle free viewing angle range. And the acquisition equipment 371-375 in the acquisition array is arranged at different positions of an on-site acquisition area in a fan shape according to the preset multi-angle free visual angle range. The collection device 376 is a movable device that can be moved to a designated location according to instructions for flexible collection. Also, the collecting device may be a handheld device to supplement the collected data when the collecting device fails or in a narrow space area, for example, the handheld device 377 located in the stage audience area in fig. 37 may be added to the collecting array to provide the video data stream of the stage audience area.
As described above, depth map calculation is required to generate multi-angle free view data, but the time for depth map calculation is long at present, and how to reduce the time for depth map generation and increase the rate of depth map generation is a problem to be solved urgently.
In view of the foregoing problem, embodiments of the present invention provide a compute node cluster in which multiple compute nodes generate depth maps concurrently, in parallel and in batches, from texture data synchronously acquired by the same acquisition array. Specifically, the depth map calculation process may be divided into several steps, such as obtaining a coarse depth map through a first depth calculation, and determining unstable regions in the coarse depth map followed by a second depth calculation. In each step, multiple compute nodes in the cluster can perform the first depth calculation on texture data acquired by multiple acquisition devices in parallel to obtain coarse depth maps, and perform the verification and the second depth calculation on the obtained coarse depth maps in parallel, thereby saving depth map calculation time and increasing the depth map generation rate. This is described in further detail below through specific embodiments with reference to the accompanying drawings.
Referring to a flowchart of a depth map generation method shown in fig. 26, in the embodiment of the present invention, a plurality of computing nodes in a computing node cluster are respectively used to generate a depth map, and for convenience of description, any computing node in the computing node cluster is referred to as a first computing node. The method for generating the depth map of the computing node cluster is explained in detail through the following specific steps:
S261, texture data is received, and the texture data are synchronously acquired by a plurality of acquisition devices in the same acquisition array.
In specific implementations, the plurality of acquisition devices can be arranged at different positions of a field acquisition area according to a preset multi-angle free view range; an acquisition device may be fixed at a certain point of the field acquisition area or move within the area, so as to form an acquisition array. The multi-angle free view may refer to the spatial positions and viewing angles of virtual viewpoints that allow the scene to be viewed with free switching, for example a six-degree-of-freedom (6DoF) view. The acquisition devices used in the acquisition array may be ordinary cameras, video recorders, or handheld devices such as mobile phones; for specific implementations, reference may be made to other embodiments of the present invention, which are not repeated here.
The texture data, that is, the pixel data of the two-dimensional image frame acquired by the acquiring device, may be an image at one frame time, or may also be pixel data of a frame image corresponding to a video stream formed by continuous or discontinuous frame images.
And S262, the first computing node performs first depth computation according to the first texture data and the second texture data to obtain a first rough depth map.
Here, for clarity and conciseness of description, texture data satisfying a preset first mapping relationship with the first computing node in the texture data is referred to as first texture data; and the texture data acquired by the acquisition equipment which meets the preset first spatial position relation with the acquisition equipment of the first texture data is called as second texture data.
In a specific implementation, the first mapping relationship may be obtained from a preset first mapping relationship table or through random mapping. For example, the texture data processed by each compute node may be pre-allocated according to the number of compute nodes in the cluster and the number of acquisition devices in the acquisition array that supply the texture data. A dedicated allocation node may be set up to assign the computation tasks of the compute nodes in the cluster; this allocation node may obtain the first mapping relationship from a preset first mapping relationship table or through random mapping. For example, if there are 40 acquisition devices in the acquisition array, 40 compute nodes may be configured to achieve the highest concurrent processing efficiency, with each acquisition device corresponding to one compute node. If there are only 20 compute nodes with the same or roughly equivalent processing capacity, each compute node may be assigned the texture data of two acquisition devices, which meets the requirements of the highest concurrent processing efficiency and load balancing, as sketched below. Specifically, a mapping between the identifier of the acquisition device that supplies the texture data and the identifier of each compute node may be set as the first mapping relationship, and the texture data acquired by the corresponding acquisition device in the acquisition array is distributed directly to the corresponding compute node based on this relationship. Computation tasks may also be allocated randomly, with the texture data acquired by each acquisition device in the acquisition array randomly assigned to compute nodes in the cluster.
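A first mapping relationship of this kind might be built as below; the round-robin assignment and the identifier formats are assumptions used only to illustrate load-balanced allocation.

```python
def build_first_mapping(camera_ids, node_ids):
    """Assign the texture data of each camera to a compute node, balancing
    load when there are fewer nodes than cameras (e.g. 40 cameras on
    20 equally capable nodes -> 2 cameras per node)."""
    mapping = {node: [] for node in node_ids}
    for index, camera in enumerate(camera_ids):
        mapping[node_ids[index % len(node_ids)]].append(camera)
    return mapping

# Example: cameras 1..40 mapped onto compute nodes node01..node20.
mapping = build_first_mapping(list(range(1, 41)),
                              [f"node{k:02d}" for k in range(1, 21)])
assert all(len(cams) == 2 for cams in mapping.values())
```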
As an example, any server in the server cluster may perform the first depth calculation according to the first texture data and the second texture data.
For the preset first spatial position relationship between the first texture data and the second texture data, for example, the second texture data may be texture data acquired by an acquisition device satisfying a preset first distance relationship with an acquisition device of the first texture data, or texture data acquired by an acquisition device satisfying a preset first quantity relationship with an acquisition device of the first texture data, or texture data acquired by an acquisition device satisfying a preset first distance relationship with an acquisition device of the first texture data and satisfying a preset first quantity relationship with the acquisition device of the first texture data.
The first preset number may be any integer value from 1 to N-1, where N is the total number of the collection devices in the collection array. In an embodiment of the present invention, the first predetermined number is 2, so that the image quality as high as possible can be obtained with the least computation. For example, assuming that the computing node 9 corresponds to the camera 9 in the preset first mapping relationship, the rough depth map of the camera 9 can be computed by using the texture data of the camera 9 and the texture data of the cameras 5, 6, 7, 10, 11, and 12 adjacent to the camera 9.
It is to be understood that, in a specific implementation, the second texture data may also be data acquired by an acquisition device that satisfies other types of first spatial position relationships with the acquisition device of the first texture data, for example, the first spatial position relationship may also satisfy a preset angle, satisfy a preset relative position, and so on.
And S263, the first computing node synchronizes the first rough depth map to the rest computing nodes in the computing node cluster to obtain a rough depth map set.
The coarse depth maps obtained from the first depth calculation need to be cross-validated to determine the unstable regions in each coarse depth map, so that a refined solution can be performed in the next step. For any coarse depth map in the coarse depth map set, cross-validation is performed against the coarse depth maps corresponding to multiple acquisition devices around the acquisition device corresponding to that coarse depth map (typically, the coarse depth map to be verified is cross-validated together with the coarse depth maps corresponding to all other acquisition devices). Therefore, the coarse depth map calculated by each compute node needs to be synchronized to the other compute nodes in the cluster. After the synchronization in step S263, each compute node in the cluster obtains the coarse depth maps calculated by the other compute nodes, and every compute node holds an identical coarse depth map set.
And S264, the first computing node verifies a second rough depth map in the rough depth map set by using a third rough depth map to obtain an unstable region in the second rough depth map.
The second rough depth map and the first computing node can meet a preset second mapping relation; the third coarse depth map may be a coarse depth map corresponding to an acquisition apparatus whose acquisition apparatus corresponding to the second coarse depth map satisfies a preset second spatial position relationship.
The second mapping relationship may be obtained based on a preset second mapping relationship table or through random mapping. For example, the texture data processed by each compute node may be pre-allocated according to the number of compute nodes in the compute node cluster and the number of acquisition devices in the acquisition array corresponding to the texture data. In specific implementation, a special allocation node may be set to allocate the computation tasks of the computation nodes in the computation node cluster, and the allocation node may obtain the second mapping relationship based on a preset second mapping relationship table or through random mapping. Specific examples of setting the second mapping relationship can be found in the foregoing implementation examples of the first mapping relationship.
It is understood that, in a specific implementation, the second mapping relationship may or may not completely correspond to the first mapping relationship. For example, in the case that the number of cameras is equal to the number of computing nodes, a one-to-one correspondence second mapping relationship may be established according to the hardware identifier, where the data (including texture data and a coarse depth map) corresponds to the acquisition device and the identifier of the computing node processing the data.
It is to be understood that the descriptions of the first coarse depth map, the second coarse depth map and the third coarse depth map are only for clarity and conciseness. In a specific implementation, the first coarse depth map may be the same as or different from the second coarse depth map; and acquiring equipment corresponding to the third rough depth map and acquiring equipment corresponding to the second rough depth map meet a preset second spatial position relation.
As for the second spatial position relationship, as a specific example, the texture data corresponding to the third coarse depth map may be texture data acquired by an acquisition device that satisfies a preset second distance relationship with an acquisition device corresponding to the second coarse depth map, or the texture data corresponding to the third coarse depth map may be texture data acquired by an acquisition device that satisfies a preset second quantity relationship with an acquisition device corresponding to the second coarse depth map, or the texture data corresponding to the third coarse depth map may be texture data acquired by an acquisition device that satisfies a preset second distance relationship and a second quantity relationship with an acquisition device corresponding to the second coarse depth map.
The second preset number may be any integer value from 1 to N-1, where N is the total number of the collection devices in the collection array. In a specific implementation, the second preset number may be equal to or different from the first preset number. In an embodiment of the present invention, the second predetermined number is 2, so that the image quality as high as possible can be obtained with the least computation.
In a specific implementation, the second spatial position relationship may also be other types of spatial position relationships, for example, a predetermined angle is satisfied, a predetermined relative position is satisfied, and the like.
And S265, performing second depth calculation by the first calculation node according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map and the texture data corresponding to the third rough depth map to obtain a corresponding fine depth map.
It should be noted that the second depth calculation differs from the first in that the depth map candidate values it selects from the second coarse depth map do not include the depth values of the unstable regions. The unstable regions can therefore be excluded from the generated depth map, the depth map becomes more accurate, and the quality of the generated multi-angle free-view image can be improved.
This is illustrated with an application scenario:
a first round of depth calculation (first depth calculation) may be performed by the server S based on the texture data of the assigned camera M and the texture data of the camera satisfying a preset first spatial position relationship with the camera M, so as to obtain a coarse depth map.
After the cross-validation in step S264, the refinement of the depth map can continue on the same server. Specifically, server S may cross-validate the coarse depth map corresponding to the assigned camera M against all other coarse depth maps to obtain the unstable regions in the coarse depth map of camera M, and then perform another round of depth map calculation (the second depth calculation) using those unstable regions, the texture data acquired by camera M, and the texture data of the N cameras around camera M, to obtain the refined depth map corresponding to the first texture data (the texture data acquired by camera M).
Here, the rough depth map corresponding to the camera M is a rough depth map calculated based on texture data acquired by the camera M and texture data acquired by an acquisition device that satisfies a preset first spatial position relationship with the camera M.
S266, taking the fine depth map set formed by the fine depth maps obtained by the compute nodes as the finally generated depth maps.
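One compute node's work in steps S262 to S265 can be outlined as follows; the depth-estimation routines and the broadcast/gather primitives are passed in as callables because this document does not prescribe their implementations.

```python
def node_depth_pipeline(node_camera, textures, neighbor_cameras,
                        coarse_depth, cross_validate, refine_depth,
                        broadcast, gather):
    """Outline of steps S262-S265 executed on a single compute node.

    `textures` maps a camera id to its texture data; the five callables are
    placeholders for the actual depth estimation and node synchronization
    routines.
    """
    own_texture = textures[node_camera]
    neighbor_textures = [textures[c] for c in neighbor_cameras]

    # S262: first depth calculation -> coarse depth map for this camera.
    coarse = coarse_depth(own_texture, neighbor_textures)
    # S263: synchronize the coarse map to the other nodes and collect the set.
    broadcast(node_camera, coarse)
    coarse_set = gather()
    # S264: cross-validate against the other coarse maps to find unstable regions.
    unstable = cross_validate(coarse, coarse_set)
    # S265: second depth calculation excluding the unstable depth candidates.
    return refine_depth(own_texture, neighbor_textures, coarse, unstable)
```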
By adopting the embodiment, the depth map can be generated by a plurality of computing nodes in parallel and in batch mode on the texture data synchronously acquired by the same acquisition array, so that the generation efficiency of the depth map can be greatly improved.
In addition, by adopting the scheme, the unstable area in the generated depth map is eliminated through secondary depth calculation, so that the obtained fine depth map is more accurate, and the quality of the generated multi-angle free visual angle image can be improved.
In a specific implementation, the configuration and the number of the compute nodes in the compute node cluster may be selected as appropriate according to the volume of texture data to be processed and the required depth map generation speed. For example, the compute node cluster may be a server cluster composed of a plurality of servers, and the servers in the cluster may be deployed in a centralized or a distributed manner. In some embodiments of the present invention, some or all of the computing node devices in the computing node cluster may serve as local servers, as edge node devices, or as cloud computing devices.
As another example, the compute node cluster may also be a computing device formed by multiple CPUs or GPUs. An embodiment of the present invention further provides a computing node, which is adapted to form a computing node cluster with at least one other computing node to generate depth maps. Referring to the schematic structural diagram of the computing node shown in fig. 27, the computing node 270 may include:
an input unit 271 adapted to receive texture data, said texture data originating from a plurality of acquisition devices in the same acquisition array being acquired synchronously;
the first depth calculating unit 272 is adapted to perform a first depth calculation according to the first texture data and the second texture data to obtain a first coarse depth map, where: the first texture data and the computing node meet a preset first mapping relation; the second texture data is texture data acquired by acquisition equipment which meets a preset first spatial position relation with the acquisition equipment of the first texture data;
a synchronization unit 273 adapted to synchronize the first coarse depth map to the remaining compute nodes in the cluster of compute nodes, resulting in a coarse depth map set;
the verifying unit 274, adapted to, for a second coarse depth map in the set of coarse depth maps, perform verification using a third coarse depth map to obtain an unstable region in the second coarse depth map, where: the second rough depth map and the computing node meet a preset second mapping relation; and the third rough depth map is a rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map;
A second depth calculating unit 275, adapted to perform a second depth calculation according to the unstable region in the second coarse depth map, the texture data corresponding to the second coarse depth map, and the texture data corresponding to the third coarse depth map, to obtain a corresponding fine depth map, where: the candidate value of the depth map in the second rough depth map selected by the second depth calculation does not contain the depth value of the unstable area;
an output unit 276, adapted to output the fine depth map, so that the computing node cluster obtains a fine depth map set as a finally generated depth map.
With such computing nodes, the depth map computing process can comprise multiple steps, such as obtaining a rough depth map through the first depth calculation, determining an unstable region in the rough depth map, and then performing the second depth calculation.
The embodiment of the invention also provides a computing node cluster which can comprise a plurality of computing nodes, and the plurality of computing nodes in the computing node cluster can simultaneously generate the depth map in parallel and in batch processing mode on the texture data synchronously acquired by the same acquisition array. For convenience of description, any one of the compute nodes in the compute node cluster is referred to as a first compute node.
In some embodiments of the present invention, the first computing node is adapted to perform a first depth computation according to first texture data and second texture data in the received texture data, so as to obtain a first rough depth map; synchronizing the first rough depth map to the rest of the computing nodes in the computing node cluster to obtain a rough depth map set; verifying a second rough depth map in the rough depth map set by using a third rough depth map to obtain an unstable region in the second rough depth map; performing second depth calculation according to an unstable region in the second rough depth map, texture data corresponding to the second rough depth map and texture data corresponding to the third rough depth map to obtain a corresponding fine depth map, and outputting the obtained fine depth map so that the calculation node cluster takes the obtained fine depth map set as a finally generated depth map;
the first texture data and the first computing node meet a preset first mapping relation; the second texture data is texture data acquired by an acquisition device that satisfies a preset first spatial position relationship with the acquisition device of the first texture data; the second rough depth map and the first computing node meet a preset second mapping relation; the third rough depth map is a rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map; and the depth map candidate values in the second coarse depth map selected by the second depth calculation do not include the depth values of the unstable region.
Referring to the schematic diagram of a server cluster for depth map processing shown in fig. 28, texture data acquired by the N cameras in the camera array is respectively input to N servers in the server cluster. First, each server performs a first depth calculation to obtain rough depth maps 1 to N. Then, each server copies the rough depth map obtained by its own calculation to the other servers of the cluster, and synchronization is achieved. Each server then verifies the rough depth map allocated to it and performs a second depth calculation to obtain a refined depth map, which serves as part of the depth maps generated by the server cluster. In this calculation process, each server can perform the first depth calculation on texture data acquired by several cameras in parallel, and can verify its rough depth map within the rough depth map set and perform the second depth calculation in parallel with the other servers. Since multiple servers work in parallel throughout the depth map generation process, the time for calculating the depth maps can be greatly reduced and the generation efficiency improved.
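For illustration only, the per-node workflow described above may be summarized in the following sketch. All type and function names (firstDepthCalc, syncToCluster, crossValidate, secondDepthCalc and the related types) are hypothetical placeholders, not an actual interface of this disclosure:

```cuda
// Hypothetical sketch of the workflow executed by one compute node (e.g. one server).
#include <vector>

struct Texture {};            // texture data from one acquisition device
struct CoarseDepthMap {};     // output of the first depth calculation
struct UnstableRegion {};     // pixels judged unreliable by cross-validation
struct FineDepthMap {};       // output of the second depth calculation

// Step 1: first depth calculation from the assigned camera and the cameras
// satisfying the preset first spatial position relationship with it.
CoarseDepthMap firstDepthCalc(const Texture&, const std::vector<Texture>&) { return {}; }

// Step 2: synchronize the local coarse depth map to the other nodes and
// receive theirs, yielding the coarse depth map set.
std::vector<CoarseDepthMap> syncToCluster(const CoarseDepthMap& own) { return {own}; }

// Step 3: cross-validate the assigned coarse depth map against the set to
// locate its unstable region.
UnstableRegion crossValidate(const CoarseDepthMap&, const std::vector<CoarseDepthMap>&) { return {}; }

// Step 4: second depth calculation, excluding depth candidates that fall in
// the unstable region.
FineDepthMap secondDepthCalc(const UnstableRegion&, const Texture&, const std::vector<Texture>&) { return {}; }

int main() {
    Texture assigned;                    // texture of the camera mapped to this node
    std::vector<Texture> neighbours(2);  // e.g. the two nearest cameras
    CoarseDepthMap coarse = firstDepthCalc(assigned, neighbours);
    std::vector<CoarseDepthMap> all = syncToCluster(coarse);
    UnstableRegion unstable = crossValidate(coarse, all);
    FineDepthMap fine = secondDepthCalc(unstable, assigned, neighbours);
    (void)fine;                          // would be output to form the fine depth map set
    return 0;
}
```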
For specific implementation and beneficial effects of the compute nodes and the compute node clusters in the embodiments of the present invention, reference may be made to the depth map generation method in the foregoing embodiments of the present invention, which is not described herein again.
The server cluster may further store the generated depth map, or output the depth map to the terminal device according to the request, so as to further generate and display the virtual viewpoint image, which is not described herein again.
The embodiment of the present invention further provides a computer-readable storage medium, where computer instructions are stored, and when the computer instructions are executed, the steps of the depth map generation method according to any one of the foregoing embodiments may be executed, which may specifically refer to the steps of the depth map generation method, and details are not described here.
In addition, currently known Depth-Image-Based Rendering (DIBR) virtual viewpoint image generation methods can hardly meet the playing requirements of multi-angle free-view applications.
The inventor has found through research that the conventional DIBR virtual viewpoint image generation method has low concurrency and is generally processed by a CPU (Central Processing Unit); however, generating each virtual viewpoint image involves many steps and each step is relatively complex, so the method is difficult to accelerate through parallel processing.
In order to solve the above problems, embodiments of the present invention provide a method for generating a virtual viewpoint image through parallel processing, which can greatly accelerate the timeliness of the generation of the virtual viewpoint image at multiple angles and free viewing angles, thereby meeting the requirements of low-delay playing and real-time interaction of a video at multiple angles and improving user experience.
To make the objects, features, and advantages of the embodiments of the present invention more comprehensible to those of ordinary skill in the art, a detailed description of the embodiments of the present invention is provided below with reference to the accompanying drawings.
Referring to the flowchart of the virtual viewpoint image generation method shown in fig. 29, in a specific implementation, the virtual viewpoint image may be generated by:
And S291, acquiring an image combination of a multi-angle free view, parameter data of the image combination and preset virtual viewpoint path data, wherein the image combination comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations.
The multi-angle free view may refer to a spatial position and a view of a virtual viewpoint that enables a scene to be freely switched. The multi-angle free view range can be determined according to the requirements of an application scene.
In specific implementation, a collection array consisting of a plurality of collection devices can be arranged on site, each collection device in the collection array can be arranged at different positions of a site collection area according to a preset multi-angle free visual angle range, and each collection device can synchronously collect site images to obtain a plurality of texture maps with synchronous angles. For example, a scene may be captured by multiple cameras, video cameras, etc. at multiple angles in a synchronized image.
The images in the multi-angle free-view image combination may be images of complete free views. In a specific implementation, a viewing angle with 6 Degrees of Freedom (DoF) is possible, that is, the spatial position of the viewpoint and the viewing angle can be freely switched. As previously described, the spatial position of the viewpoint may be represented as coordinates (x, y, z), and the viewing angle may be represented as three rotation angles, and thus may be referred to as 6DoF.
In the process of generating the virtual viewpoint image, an image combination of a multi-angle free view and parameter data of the image combination can be acquired first.
In a specific implementation, the texture map and the depth map in the image combination are in one-to-one correspondence. The texture map may adopt any type of two-dimensional image format, for example, any one of BMP, PNG, JPEG, webp format, and the like. The depth map may represent the distance of points in the scene relative to the capture device, i.e. each pixel value in the depth map represents the distance between a point in the scene and the capture device.
Texture maps in the image combination are a plurality of two-dimensional images that are synchronized. Depth data for each two-dimensional image may be determined based on the plurality of two-dimensional images.
Wherein the depth data may comprise depth values corresponding to pixels of the two-dimensional image. The distances of the acquisition device to the various points in the area to be viewed may be used as the above-mentioned depth values, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, the depth value may be a distance of each point in the area to be viewed along the camera optical axis to the optical center, and the origin of the camera coordinate system may be the optical center. It will be appreciated by those skilled in the art that the distance may be a relative value, as long as the same reference is used for a plurality of images.
The depth data may include depth values corresponding one-to-one to the pixels of the two-dimensional image, or may be a subset of values selected from such a depth value set. It will be understood by those skilled in the art that the depth value set may be stored in the form of a depth map. In a specific implementation, the depth data may be obtained by down-sampling an original depth map, where the original depth map is an image whose pixels are arranged in the same way as the pixels of the two-dimensional image (texture map) and store the corresponding depth values one-to-one.
In the embodiment of the present invention, an image combination of multi-angle free views and parameter data of the image combination can be obtained through the following steps, which are described below by specific application scenarios.
As a specific embodiment of the present invention, the method may include the following steps: the first step is acquisition and depth map calculation, comprising three main stages: Multi-Camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation. For multi-camera acquisition, it is desirable that the videos acquired by the various cameras be frame-level aligned.
Texture images (Texture images), namely a plurality of synchronous images, can be obtained through video acquisition of a plurality of cameras; camera parameters (Camera Parameter), namely Parameter data of image combination, including internal Parameter data and external Parameter data, can be obtained through internal and external Parameter calculation of the Camera; through the Depth Map calculation, a Depth Map (Depth Map) can be obtained.
Multiple groups of synchronous texture maps and depth maps with corresponding relations in image combination can be spliced together to form a frame of spliced image. The stitched image may have a variety of stitching structures. Each frame of stitched images may be combined as one image. And multiple groups of texture maps and depth maps in the image combination can be spliced and combined and arranged according to a preset relationship. Specifically, the texture map and the depth map of the image combination may be divided into a texture map area and a depth map area according to a position relationship, the texture map area stores pixel values of each texture map, and the depth map area stores depth values corresponding to each texture map according to a preset position relationship. The texture map region and the depth map region may be continuous or spaced apart. In the embodiment of the invention, no limitation is made on the position relationship between the texture map and the depth map in the image combination.
In a specific implementation, the parameter data of each image in the image combination can be acquired from the attribute information of the image. The parameter data may include external parameter data and may also include internal parameter data. The external parameter data describes the spatial coordinates, attitude and the like of the shooting device, while the internal parameter data expresses attribute information of the shooting device, such as its optical center and focal length. The internal parameter data may also include distortion parameter data, which includes radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs during the transformation from the coordinate system of the shooting device to the physical coordinate system of the image, while tangential distortion arises in the manufacturing process of the shooting device because the plane of the photosensitive element is not parallel to the lens. Information such as the shooting position and shooting angle of an image can be determined based on the external parameter data. In virtual viewpoint image generation, the determined spatial mapping relationship can be made more accurate by also using the internal parameter data, including the distortion parameter data.
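As a rough illustration of how the above parameter data might be organized, the following sketch groups the internal, distortion and external parameters of one capture device into a single structure; the field names are assumptions rather than the actual data format of the image combination:

```cuda
// Illustrative per-camera parameter layout; field names are assumptions only.
struct CameraParams {
    // Internal (intrinsic) parameters
    float fx, fy;        // focal lengths in the x and y directions
    float cx, cy;        // optical center coordinates
    // Distortion parameters
    float k1, k2, k3;    // radial distortion coefficients
    float p1, p2;        // tangential distortion coefficients
    // External (extrinsic) parameters: pose of the capture device
    float R[3][3];       // rotation matrix (world coordinates -> camera coordinates)
    float T[3];          // translation vector
};
```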
In a specific implementation, the virtual viewpoint path may be set in advance. For example, for a sports game, such as a basketball game or a soccer game, an arc-shaped path may be planned in advance, and a corresponding virtual viewpoint image may be generated according to the arc-shaped path each time a brilliant shot appears, for example.
During a particular application, the virtual viewpoint path may be set based on a particular location or perspective in the scene (e.g., under the basket, around the field, referee perspective, coach perspective, etc.), or based on a particular object (e.g., a player on the field, a presenter in the scene, a spectator, an actor in a movie image, etc.).
The path data corresponding to the virtual viewpoint path may include position data of a series of virtual viewpoints in the path.
And S292, selecting texture maps and depth maps of corresponding groups of each virtual viewpoint in the virtual viewpoint paths from the image combinations according to the preset virtual viewpoint path data and the parameter data of the image combinations.
In a specific implementation, a texture map and a depth map of a corresponding group that satisfies a preset positional relationship and/or a number relationship with each virtual viewpoint position may be selected from the image combination according to the position data of each virtual viewpoint in the virtual viewpoint path data and the parameter data of the image combination. For example, for a virtual viewpoint position area with a high camera density, only texture maps and corresponding depth maps captured by two cameras closest to the virtual viewpoint may be selected, while for a virtual viewpoint position area with a low camera density, texture maps and corresponding depth maps captured by three or four cameras closest to the virtual viewpoint may be selected.
In an embodiment of the present invention, texture maps and depth maps corresponding to the 2 to N acquisition devices closest to each virtual viewpoint position in the virtual viewpoint path may be selected, where N is the number of all acquisition devices in the acquisition array. For example, the texture maps and depth maps corresponding to the two acquisition devices closest to each virtual viewpoint position may be selected by default. In a specific implementation, the user may set the number of selected acquisition devices closest to the virtual viewpoint position, and this number cannot exceed the number of acquisition devices corresponding to the image combination.
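The following sketch illustrates one possible way of picking the acquisition devices closest to a virtual viewpoint position; the function name and the assumption that camera positions come from the external parameter data of the image combination are illustrative only:

```cuda
#include <algorithm>
#include <array>
#include <cmath>
#include <utility>
#include <vector>

// Illustrative only: return the indices of the k acquisition devices whose
// optical centers are closest to the virtual viewpoint position. Camera
// positions would be derived from the external parameter data; k is 2 by
// default and must not exceed the number of devices in the acquisition array.
std::vector<int> selectNearestCameras(const float viewpoint[3],
                                      const std::vector<std::array<float, 3>>& camPos,
                                      int k) {
    std::vector<std::pair<float, int>> dist;
    for (int i = 0; i < static_cast<int>(camPos.size()); ++i) {
        float dx = camPos[i][0] - viewpoint[0];
        float dy = camPos[i][1] - viewpoint[1];
        float dz = camPos[i][2] - viewpoint[2];
        dist.emplace_back(std::sqrt(dx * dx + dy * dy + dz * dz), i);
    }
    k = std::min(k, static_cast<int>(dist.size()));
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::vector<int> selected;
    for (int i = 0; i < k; ++i) selected.push_back(dist[i].second);
    return selected;   // indices of the texture/depth map groups to use
}
```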
With this approach, no special requirement is imposed on the spatial distribution of the acquisition devices in the acquisition array (for example, linear distribution, arc array arrangement, or any irregular arrangement form). The actual distribution of the acquisition devices is determined according to the acquired virtual viewpoint position data and the parameter data corresponding to the image combination, and the texture maps and depth maps of the corresponding groups are then selected from the image combination with an adaptive strategy. This provides higher freedom and flexibility of selection while reducing the amount of data calculation and ensuring the quality of the generated virtual viewpoint image; in addition, it reduces the installation requirements for the acquisition devices in the acquisition array, making it easier to adapt to different site requirements and installation constraints.
In an embodiment of the present invention, a preset number of texture maps and depth maps of a corresponding group closest to the virtual viewpoint position are selected from the image combination according to the virtual viewpoint position data and the parameter data of the image combination.
It will be appreciated that, in particular implementations, other preset rules may also be used to select a corresponding set of texture map and depth map from the image combination. The respective sets of texture maps and depth maps may also be selected from the image combination, for example, according to the processing power of the virtual viewpoint image generation device, or according to the requirements of the user on the generation speed, the requirements on the definition of the generated image (such as normal definition, high definition, or super definition, etc.).
And S293, inputting the texture map and the depth map of the corresponding group of each virtual viewpoint into a graphics processor, and respectively performing combined rendering on the pixel points in the texture map and the depth map of the corresponding group in the selected image combination by a plurality of threads by taking the pixel points as processing units aiming at each virtual viewpoint in a virtual viewpoint path to obtain an image corresponding to the virtual viewpoint.
A Graphics Processing Unit (GPU), also called a display core, a vision processor, a display chip, etc., is a microprocessor dedicated to image and Graphics related operations, and can be configured in a personal computer, a workstation, an electronic game machine, and some electronic devices (e.g., a tablet computer, a smart phone, etc.) having image related operations requirements.
In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, a brief description of the architecture of a GPU employed in some embodiments of the present invention follows. It should be noted that the GPU architecture is only a specific example, and does not limit the GPU applicable to the embodiment of the present invention.
In some embodiments of the present invention, the GPU may adopt the Compute Unified Device Architecture (CUDA) parallel programming architecture to perform combined rendering on pixel points in the texture map and depth map of the corresponding group in the selected image combination. CUDA is a hardware and software architecture for issuing and managing computations on the GPU as a data-parallel computing device without mapping them to a graphics Application Programming Interface (API).
When programmed through CUDA, the GPU may be viewed as a computing device capable of executing a large number of threads in parallel. It operates as a coprocessor to the main CPU, or host; in other words, the data-parallel, computationally intensive parts of the application running on the host are offloaded onto the GPU.
More specifically, portions of an application that are executed many times, independently on different data, can be isolated into a function that runs on the GPU device as many different threads. To this end, such a function is compiled into the instruction set of the GPU device, and the resulting program (called a kernel) is downloaded onto the GPU. The batch of threads that executes a kernel is organized as thread blocks.
Thread blocks are a collection of threads that can cooperate by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses. In particular implementations, a synchronization point may be specified in the kernel, and the threads in its thread block will suspend until they all reach the synchronization point.
In a particular implementation, the maximum number of threads that a thread block can contain is limited. However, blocks of the same dimension and size that execute the same kernel can be batched into a Grid of Blocks (Grid of Thread Blocks) so that the total number of threads that can be started in a single kernel call is much larger.
As can be seen from the above, with the CUDA structure, a large number of threads can be processed concurrently on the GPU, and thus the speed of generating the virtual viewpoint image can be greatly increased.
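As a minimal sketch of the CUDA execution model described above, the following example launches a per-pixel kernel over an image using a two-dimensional grid of thread blocks; the 16 × 16 block size and the placeholder per-pixel work are assumptions:

```cuda
#include <cuda_runtime.h>

// Each thread handles one pixel; threads are grouped into 16x16 blocks and the
// blocks into a grid large enough to cover the whole image.
__global__ void perPixelKernel(float* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // pixel column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // pixel row
    if (x >= width || y >= height) return;
    img[y * width + x] *= 1.0f;                      // placeholder per-pixel work
}

int main() {
    int width = 1920, height = 1080;
    float* d_img = nullptr;
    cudaMalloc(&d_img, width * height * sizeof(float));
    cudaMemset(d_img, 0, width * height * sizeof(float));

    dim3 block(16, 16);                               // one thread block = 16x16 threads
    dim3 grid((width + block.x - 1) / block.x,        // grid of blocks covering the image
              (height + block.y - 1) / block.y);
    perPixelKernel<<<grid, block>>>(d_img, width, height);
    cudaDeviceSynchronize();

    cudaFree(d_img);
    return 0;
}
```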
For better understanding and implementation by those skilled in the art, the following describes in detail the process of processing in units of pixel points for each step of the combined rendering.
In a specific implementation, referring to the flowchart of the method for performing combined rendering by the GPU shown in fig. 30, step S293 may be implemented by the following steps:
S2931, forward mapping the depth maps of the corresponding group in parallel, and mapping the depth maps to the virtual viewpoint.
The forward mapping of the depth map is to map the depth map of the original camera (acquisition device) to the position of the virtual camera through the transformation of the coordinate space position, thereby obtaining the depth map of the virtual camera position. Specifically, the forward mapping of the depth map is an operation of mapping each pixel of the depth map of the original camera (capture device) to a virtual viewpoint according to a preset coordinate mapping relationship.
In a specific implementation, a first Kernel (Kernel) function may be run on the GPU, and pixels in the depth maps of the respective sets may be forward mapped in parallel to the corresponding virtual viewpoint positions.
The inventor finds in research and practice that in the forward mapping process, the occlusion problem of the front background and the mapping gap effect may exist, and the generated image quality is affected. First, for the problem of occlusion of the front background, in the embodiment of the present invention, for a plurality of depth values mapped to the same pixel of the virtual viewpoint, an atomic operation may be adopted to obtain a first depth map of a corresponding virtual viewpoint position by taking a value with a maximum pixel value. Then, in order to improve the influence caused by the mapping gap effect, a second depth map of the virtual viewpoint position may be created based on the first depth map of the virtual viewpoint position, each pixel in the second depth map is processed in parallel, and the maximum value of the pixel points in the preset region around the corresponding pixel position in the first depth map is taken.
In the forward mapping process, each pixel can be processed in parallel, so that the forward mapping processing speed can be greatly increased, and the timeliness of forward mapping is improved.
S2932, performs post-processing on the depth maps after the forward mapping in parallel.
After the forward mapping is finished, post-processing may be performed on the virtual viewpoint depth map. Specifically, a preset second kernel function may be run on the GPU, and a median filtering process may be performed on each pixel in the second depth map obtained by the forward mapping, over a preset area around the pixel position. Because the median filtering can be performed on each pixel in the second depth map in parallel, the post-processing speed can be greatly increased, and the timeliness of the post-processing is improved.
S2933, texture maps of the respective groups are reversely mapped in parallel.
For each pixel of the virtual viewpoint, the coordinate in the texture map of the original camera is calculated according to the value of the depth map, and the corresponding value is obtained by fractional-pixel (sub-pixel) interpolation. On the GPU, the sub-pixel value can be interpolated directly using bilinear interpolation, so in this step the value can be fetched directly from the original camera texture using the coordinate calculated for each pixel. In a specific implementation, a preset third kernel function may be run on the GPU, and the pixels in the texture maps of the selected corresponding group are interpolated in parallel, so that the corresponding virtual texture map can be generated.
By running the third kernel function on the GPU, the selected pixels in the texture maps of the corresponding groups are subjected to interpolation operation in parallel to generate corresponding virtual texture maps, so that the processing speed of reverse mapping can be greatly increased, and the timeliness of the reverse mapping is improved.
S2934, the pixels in the virtual texture maps generated by the reverse mapping are merged in parallel.
In a specific implementation, a fourth kernel function may be run on the GPU, and pixels at the same position in each virtual texture map generated after inverse mapping are weighted and fused in parallel.
And running a fourth kernel function on the GPU, and performing weighted fusion on the pixels at the same position in each virtual texture map generated after reverse mapping in parallel, so that the fusion speed of the virtual texture maps can be greatly increased, and the timeliness of image fusion is improved.
The following is a detailed description with a specific example.
In step S2931, for the forward mapping of the depth map, first, the projection mapping relationship of each pixel point may be calculated through the first Kernel function of the GPU.
Assume a certain pixel point (u, v) in the image of the real camera. First, the image coordinate (u, v) is transformed into the coordinate [X, Y, Z]^T in the camera coordinate system through the perspective projection model of the corresponding camera. It will be appreciated that there are different transformation methods for the perspective projection models of different cameras.
For example, for a perspective projection model:
Z · [u, v, 1]^T = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] · [X, Y, Z]^T    (1)
where [u, v, 1]^T is the homogeneous coordinate of the pixel (u, v), [X, Y, Z]^T is the coordinate, in the camera coordinate system, of the real point corresponding to (u, v), f_x and f_y are the focal lengths in the x and y directions, and c_x and c_y are the optical center coordinates in the x and y directions, respectively.
Therefore, for a certain pixel point (u, v) in the image, given the depth value Z of the pixel and the physical parameters of the corresponding camera lens (f_x, f_y, c_x and c_y, obtained from the parameter data of the aforementioned image combination), the coordinate [X, Y, Z]^T of the corresponding point in the camera coordinate system can be obtained through the aforementioned formula (1).
After the conversion of the image coordinate system into the camera coordinate system, the coordinates of the object in the current camera coordinate system may be transformed into the coordinate system of the camera in which the virtual viewpoint is located according to the coordinate transformation in the three-dimensional space. Specifically, the following transformation formula can be adopted:
[X_1, Y_1, Z_1]^T = R_12 · [X, Y, Z]^T + T_12    (2)
where R_12 is a 3×3 rotation matrix and T_12 is a translation vector.
The transformed three-dimensional coordinate is denoted [X_1, Y_1, Z_1]^T. By applying the inverse of the transformation from the image coordinate system to the camera coordinate system described above, the image coordinate in the virtual camera corresponding to the transformed three-dimensional coordinate can be obtained. Thereby, a point-to-point projection relationship from the real viewpoint image to the virtual viewpoint image is established. Each pixel point in the real viewpoint is transformed and its coordinate is rounded, yielding a projected depth map in the virtual viewpoint image.
After the point-to-point mapping relationship between the original camera depth map and the virtual camera depth map is established, multiple positions in the depth map of the original camera may be mapped to the same position in the virtual camera depth map during the projection, so that a foreground/background occlusion relationship exists in the forward mapping of the depth map. In the embodiment of the present invention, an atomic operation may be adopted, and the smallest depth value is taken as the final result at the mapped position, as shown in formula (3):
Depth(u, v) = min[Depth_i(u, v)], i = 1, …, N    (3)
it should be noted that, since the value with the smallest depth value is also the value with the largest depth image pixel value, the first depth map of the corresponding virtual viewpoint position can be obtained by taking the value with the largest pixel value on the mapped depth map.
In a specific implementation, the operation of taking the maximum or minimum value among multiple mapped points is available in the CUDA parallel environment, and can be performed by calling the atomic operation functions atomicMin or atomicMax provided by CUDA.
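The following is a simplified sketch of how such a forward-mapping kernel might be written, assuming the depth pixels of the virtual viewpoint are stored as unsigned integers so that atomicMax keeps the nearest point (the largest pixel value), in line with the convention above; the parameter structure, the depth encoding and all names are illustrative only:

```cuda
// Each thread maps one source-camera depth pixel to the virtual viewpoint;
// collisions on the same target pixel are resolved with atomicMax, so the
// largest encoded value (the nearest point) wins, as in formula (3).
struct WarpParams {
    float fx, fy, cx, cy;      // source-camera intrinsics
    float vfx, vfy, vcx, vcy;  // virtual-camera intrinsics
    float R[9], T[3];          // rotation/translation: source camera -> virtual camera
};

// Encode metric depth so that a smaller depth (nearer point) gives a larger value.
// The millimetre-style scaling is an assumption.
__device__ unsigned int encodeDepth(float z) {
    return 0xFFFFFFFFu - (unsigned int)(z * 1000.0f);
}

__global__ void forwardMapDepth(const float* srcDepth, unsigned int* dstDepth,
                                int sw, int sh, int vw, int vh, WarpParams p) {
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= sw || v >= sh) return;

    float Z = srcDepth[v * sw + u];
    if (Z <= 0.0f) return;                           // no valid depth at this pixel

    // Image coordinates -> source camera coordinates (formula (1) inverted).
    float X = (u - p.cx) * Z / p.fx;
    float Y = (v - p.cy) * Z / p.fy;

    // Source camera coordinates -> virtual camera coordinates (formula (2)).
    float X1 = p.R[0] * X + p.R[1] * Y + p.R[2] * Z + p.T[0];
    float Y1 = p.R[3] * X + p.R[4] * Y + p.R[5] * Z + p.T[1];
    float Z1 = p.R[6] * X + p.R[7] * Y + p.R[8] * Z + p.T[2];
    if (Z1 <= 0.0f) return;                          // behind the virtual camera

    // Virtual camera coordinates -> virtual image coordinates, rounded.
    int u1 = (int)(p.vfx * X1 / Z1 + p.vcx + 0.5f);
    int v1 = (int)(p.vfy * Y1 / Z1 + p.vcy + 0.5f);
    if (u1 < 0 || u1 >= vw || v1 < 0 || v1 >= vh) return;

    // Occlusion handling: keep the nearest mapped point at each virtual pixel.
    atomicMax(&dstDepth[v1 * vw + u1], encodeDepth(Z1));
}
```

In such a sketch, the destination buffer would be cleared to zero (for example with cudaMemset) before the kernel is launched, so that atomicMax always retains the nearest mapped point.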
In the process of obtaining the first depth map, a gap effect may be generated, that is, a part of the pixel points may not be covered due to the problem of the mapping accuracy. In view of such a problem, the embodiment of the present invention may perform gap masking processing on the obtained first depth map. In an embodiment of the present invention, the first depth map is subjected to a 3 × 3 gap masking process. The specific masking treatment process is as follows:
First, a second depth map of the virtual viewpoint position is created. Then, for each pixel D(x, y) in the second depth map, the existing pixels D_old within the surrounding 3 × 3 range in the first depth map of the virtual viewpoint position are taken, and the maximum value of the pixels within that 3 × 3 range is used. This can be realized by the following kernel function operation:
D(x, y) = Max[D_old(x+i, y+j)], i, j ∈ {-1, 0, 1}    (4)
it is understood that the size range of the surrounding area during the gap masking process may also take other values, such as 5 x 5. In order to obtain better treatment effect, the setting can be specifically carried out according to experience.
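A sketch of the gap-masking kernel corresponding to formula (4) might look as follows; it simply takes, for each pixel of the second depth map, the maximum existing value within the surrounding 3 × 3 window of the first depth map:

```cuda
// Illustrative 3x3 gap masking: closes small mapping gaps by dilating the
// first depth map of the virtual viewpoint into the second depth map.
__global__ void gapMask3x3(const unsigned int* firstDepth, unsigned int* secondDepth,
                           int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    unsigned int best = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            unsigned int val = firstDepth[ny * w + nx];
            if (val > best) best = val;              // keep the largest (nearest) value
        }
    }
    secondDepth[y * w + x] = best;
}
```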
For step S2932, in a specific implementation, the second depth map of the virtual viewpoint position may be subjected to 3 × 3 or 5 × 5 median filtering. For example, for a 3 × 3 median filter, the second kernel function on the GPU may operate according to the following formula:
D(x, y) = Median[D(x+i, y+j)], i, j ∈ {-1, 0, 1}    (5)
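A possible sketch of the second kernel function performing the 3 × 3 median filtering of formula (5) is shown below; clamping neighbours at the image border is an assumption:

```cuda
// Illustrative 3x3 median filter over the second depth map of the virtual
// viewpoint; a fixed insertion sort over 9 values is sufficient here.
__global__ void medianFilter3x3(const unsigned int* in, unsigned int* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    unsigned int win[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = min(max(x + dx, 0), w - 1);     // clamp to the image border
            int ny = min(max(y + dy, 0), h - 1);
            win[n++] = in[ny * w + nx];
        }
    }
    for (int i = 1; i < 9; ++i) {                    // insertion sort of the 9 values
        unsigned int key = win[i];
        int j = i - 1;
        while (j >= 0 && win[j] > key) { win[j + 1] = win[j]; --j; }
        win[j + 1] = key;
    }
    out[y * w + x] = win[4];                         // the median value
}
```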
in step S2933, a third kernel function running on the GPU calculates the virtual viewpoint position from the values of the depth map to the coordinates in the original camera texture map, and the third kernel function may perform the inverse implementation of step S2391.
In step S2934, for the pixel point f(x, y) at the virtual viewpoint position (x, y), the pixel values at the corresponding positions of the virtual texture maps mapped from all the original cameras may be weighted according to their confidences conf_i(x, y), where i indexes the original cameras. The fourth kernel function may be calculated using the following formula:
f(x, y) = Σ_i conf_i(x, y) · f_i(x, y)    (6)
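A sketch of the fourth kernel function corresponding to formula (6) might look as follows; normalizing by the sum of the confidence weights is an assumption added on top of the formula above:

```cuda
// Illustrative weighted fusion of the virtual texture maps generated from the
// different original cameras. virtTex and conf are device-side arrays of
// device pointers, one entry per original camera.
__global__ void fuseTextures(const unsigned char* const* virtTex,
                             const float* const* conf,
                             int numCams, unsigned char* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int idx = y * w + x;
    float sum = 0.0f, wsum = 0.0f;
    for (int i = 0; i < numCams; ++i) {
        float c = conf[i][idx];                      // confidence of camera i at this pixel
        sum += c * virtTex[i][idx];
        wsum += c;
    }
    out[idx] = (wsum > 0.0f) ? (unsigned char)(sum / wsum + 0.5f) : 0;
}
```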
The virtual viewpoint image can be obtained by the steps S2931 to S2934 described above. In specific implementation, the virtual texture map obtained after weighted fusion can be further processed and optimized. For example, the weighted and fused texture map may be subjected to hole filling in parallel to obtain an image corresponding to the virtual viewpoint.
For the hole filling of the virtual texture map, in a specific implementation, for each pixel, a separate windowing method may be adopted to perform parallel operations. For example, for each hole pixel, a window of size N × M may be opened, and then the value of the hole pixel is weighted according to the non-hole pixel values in the window. By the method, the generation of the virtual viewpoint image can be completely calculated on the GPU in parallel, so that the generation process can be greatly accelerated.
As shown in the schematic diagram of the hole filling method shown in fig. 31, for the generated virtual viewpoint view G, there is a hole region F, and rectangular windows a and b are respectively opened for a pixel F1 and a pixel F2 in the hole region F. Then, for the pixel F1, all pixels (or partial pixels obtained by down-sampling) are obtained from the existing non-hole pixels in the rectangular window, and the value of the pixel F1 in the hole area F is obtained by weighting according to the distance (or average weighting). Likewise, for the pixel f2, the same operation is performed, and the value of the pixel f2 can be obtained. In specific implementation, the fifth kernel function can be run on the GPU, parallelized, and accelerate the time for hole filling.
The fifth kernel function may be calculated by using the following formula:
P(x,y)=Average[Window(x,y)] (7)
where P(x, y) is the value of a point in the hole, Window(x, y) denotes the values (or down-sampled values) of the existing non-hole pixels within the window, and Average is the average (or weighted average) of these pixel values.
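The fifth kernel function of formula (7) might be sketched as follows; marking hole pixels by a zero confidence value and plain averaging over the window are assumptions:

```cuda
// Illustrative hole filling: for each hole pixel, open an N x M window and
// average the existing (non-hole) pixels inside it.
__global__ void fillHoles(const unsigned char* in, const float* conf,
                          unsigned char* out, int w, int h, int halfN, int halfM) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int idx = y * w + x;
    if (conf[idx] > 0.0f) { out[idx] = in[idx]; return; }   // not a hole: keep the value

    float sum = 0.0f;
    int cnt = 0;
    for (int dy = -halfM; dy <= halfM; ++dy) {
        for (int dx = -halfN; dx <= halfN; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            if (conf[ny * w + nx] <= 0.0f) continue;         // skip other hole pixels
            sum += in[ny * w + nx];
            ++cnt;
        }
    }
    out[idx] = (cnt > 0) ? (unsigned char)(sum / cnt + 0.5f) : 0;
}
```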
In the embodiment of the present invention, in addition to the generation of the virtual viewpoint images at the respective virtual viewpoint positions in parallel in units of pixels, in order to further increase the generation efficiency of the virtual viewpoint path image, texture maps and depth maps of respective groups of virtual viewpoints in the virtual viewpoint path may be respectively input to the multiple GPUs, and the multiple virtual viewpoint images may be generated in parallel.
In a specific implementation, to further improve the processing efficiency, the above steps may be performed by different block grids respectively.
Referring to the schematic structural diagram of the virtual viewpoint image generation system shown in fig. 32, in an embodiment of the present invention, the virtual viewpoint image generation system 320 may include a CPU321 and a GPU322, where:
the CPU321 is adapted to obtain an image combination of a multi-angle free view, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination includes multiple sets of texture maps and depth maps having a corresponding relationship and being synchronized by multiple angles; selecting a texture map and a depth map of a corresponding group of each virtual viewpoint in the virtual viewpoint path from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination;
And the GPU322 is adapted to call a corresponding core function for each virtual viewpoint in the virtual viewpoint path, and perform combined rendering on pixel points in a texture map and a depth map of a corresponding group in the selected image combination in parallel to obtain an image corresponding to the virtual viewpoint.
In particular, the GPU322 is adapted to forward map respective sets of depth maps in parallel, onto the virtual viewpoints; carrying out post-processing on the depth map subjected to forward mapping in parallel; reversely mapping the texture maps of the corresponding groups in parallel; and merging the pixels in each virtual texture map generated after the reverse mapping in parallel.
The GPU322 may generate the virtual viewpoint images of the virtual viewpoints by using the steps S2931 to S2934 and the like in the virtual viewpoint image generating method and the hole filling step, which may be specifically described in the foregoing embodiments and will not be described herein again.
In a specific implementation, there may be one or more GPUs, as shown in fig. 32.
In specific application, the GPU may be an independent GPU chip, or a GPU core in one GPU chip, or one GPU server, or a GPU chip formed by packaging multiple GPU chips or multiple GPU cores, or a GPU cluster formed by multiple GPU servers.
Accordingly, the texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path may be respectively input to multiple GPU chips, multiple GPU cores, or multiple GPU servers, and multiple virtual viewpoint images may be generated in parallel. For example, if the virtual viewpoint path data corresponding to a certain virtual viewpoint path contains 20 virtual viewpoint position coordinates in total, the data corresponding to these 20 positions can be input in parallel into a plurality of GPU chips, for example 10 GPU chips, so that the data are processed in parallel in two batches. Each GPU chip can generate the virtual viewpoint image corresponding to one virtual viewpoint position in parallel, taking a pixel as the processing unit, thereby greatly increasing the generation speed of the virtual viewpoint images and improving the timeliness of virtual viewpoint image generation.
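For illustration, a host-side sketch of distributing the virtual viewpoints of a path over several GPUs is given below; renderViewpoint is a hypothetical placeholder standing in for the combined-rendering kernels described above:

```cuda
#include <cuda_runtime.h>

// Placeholder for launching the per-pixel combined-rendering kernels of one viewpoint.
void renderViewpoint(int viewpointIndex) { (void)viewpointIndex; }

// Assign viewpoint i to device i % deviceCount, so that e.g. a path of 20
// viewpoints runs in two batches on 10 GPUs; kernels launched on different
// devices can execute concurrently.
void renderPathOnMultipleGpus(int numViewpoints) {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) return;

    for (int i = 0; i < numViewpoints; ++i) {
        cudaSetDevice(i % deviceCount);   // select the GPU for this viewpoint
        renderViewpoint(i);
    }
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();          // wait for all devices to finish
    }
}
```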
Referring to the structural schematic diagram of the electronic device shown in fig. 33, the electronic device 330 may include a memory 331, a CPU332, and a GPU333, where the memory 331 stores computer instructions that can be executed on the CPU332 and the GPU333, and when the CPU332 and the GPU333 cooperatively execute the computer instructions, the method for generating a virtual viewpoint image according to any of the foregoing embodiments of the present invention is suitable for being executed.
In a specific implementation, the electronic device may be one server or a server cluster formed by a plurality of servers.
The above embodiments can be applied to live scenes, and two or more embodiments can be used in combination as needed in an application. Those skilled in the art can understand that the above embodiments are not limited to live broadcast scenes; the schemes in the embodiments of the present invention for video or image acquisition, data processing of video data streams, image generation on the server, and the like may also be applicable to the playing requirements of non-live scenes, such as recorded broadcast, rebroadcast, and other scenes with less stringent requirements on delay.
The specific implementation, working principle, specific action and effect of each device or system in the embodiments of the present invention may be referred to the specific description in the corresponding method embodiments.
The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the steps of the method according to any of the above embodiments of the present invention may be executed.
The computer readable storage medium may be various suitable readable storage media such as an optical disc, a mechanical hard disc, a solid state hard disc, and the like. For the method executed by the instructions stored in the computer-readable storage medium, reference may be made to the embodiments of the foregoing methods, and details are not described again.
The embodiment of the present invention further provides a server, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor, when executing the computer instructions, may perform the steps of the method according to any one of the above embodiments of the present invention. The specific implementation of the method executed when the computer instruction runs may refer to the steps of the method in the above embodiments, and details are not described again.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A virtual viewpoint image generation method, comprising:
acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and preset virtual viewpoint path data, wherein the image combination comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations, and the multi-angle free visual angle refers to a space position and a visual angle of a virtual viewpoint for freely switching scenes;
Selecting a texture map and a depth map of a corresponding group of each virtual viewpoint in the virtual viewpoint path from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination;
and inputting the texture map and the depth map of the corresponding group of each virtual viewpoint into a graphic processor, and respectively combining and rendering the pixel points in the texture map and the depth map of the corresponding group in the selected image combination by a plurality of threads by taking the pixel points as processing units aiming at each virtual viewpoint in a virtual viewpoint path to obtain the image corresponding to the virtual viewpoint.
2. The method for generating a virtual viewpoint image according to claim 1, wherein the rendering, by combining pixel points in a texture map and a depth map of a corresponding group in a selected image combination by a plurality of threads using pixel points as processing units for each virtual viewpoint in a virtual viewpoint path, comprises:
carrying out forward mapping on the depth maps of the corresponding groups in parallel, and mapping the depth maps to the virtual viewpoints;
carrying out post-processing on the depth map subjected to forward mapping in parallel;
reversely mapping the texture maps of the corresponding groups in parallel;
and merging the pixels in each virtual texture map generated after the reverse mapping in parallel.
3. The virtual viewpoint image generating method according to claim 2, wherein the mapping of the respective sets of depth maps onto the virtual viewpoints in parallel by forward mapping comprises:
running a first kernel function on the graphics processor, forward mapping pixels in the respective sets of depth maps in parallel to corresponding virtual viewpoint positions, wherein: adopting atomic operation to obtain a maximum value of a pixel value for a plurality of depth values mapped to the same pixel of the virtual viewpoint to obtain a first depth map of a corresponding virtual viewpoint position; and creating a second depth map of the virtual viewpoint position based on the first depth map of the virtual viewpoint position, performing parallel processing on each pixel in the second depth map, and taking the maximum value of pixel points in a preset area around the corresponding pixel position in the first depth map.
4. The virtual-viewpoint-image generating method according to claim 2, wherein the post-processing of the depth map after the forward mapping in parallel includes:
and running a second kernel function on the graphics processor, and performing median filtering processing on each pixel in a second depth map obtained after forward mapping in a preset area around the position of the pixel.
5. The virtual-viewpoint-image generating method according to claim 2, wherein the inverse mapping of the texture maps of the respective groups in parallel includes:
and running a third kernel function on the graphics processor, and performing interpolation operation on the pixels in the texture maps of the selected corresponding groups in parallel to generate corresponding virtual texture maps.
6. The method for generating a virtual-viewpoint image according to claim 2, wherein the fusing pixels in the respective virtual texture maps generated by inverse mapping in parallel includes:
and running a fourth kernel function on the graphics processor, and performing weighted fusion on the pixels at the same position in each virtual texture map generated after reverse mapping in parallel.
7. The method according to any one of claims 2 to 6, further comprising, after fusing pixels in the respective virtual texture maps generated by inverse mapping in parallel:
and filling holes in parallel for each pixel in the texture map after weighted fusion to obtain an image corresponding to the virtual viewpoint.
8. The method according to claim 1, wherein the inputting of the texture map and the depth map of the corresponding set of each virtual viewpoint into a graphics processor comprises:
And respectively inputting texture maps and depth maps of corresponding groups of virtual viewpoints in the virtual viewpoint path into a plurality of graphics processors, and processing the texture maps and the depth maps in parallel by the graphics processors to generate a plurality of virtual viewpoint images.
9. A virtual visual point image generation system, comprising:
the system comprises a central processing unit, a video processing unit and a video processing unit, wherein the central processing unit is suitable for acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and preset virtual viewpoint path data, the image combination comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations, and the multi-angle free visual angle refers to a spatial position and a visual angle of a virtual viewpoint for freely switching a scene; selecting a texture map and a depth map of a corresponding group of each virtual viewpoint in the virtual viewpoint path from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination;
and the graphics processor is suitable for combining and rendering the pixel points in the texture map and the depth map of the corresponding group in the selected image combination by using the pixel points as processing units through a plurality of threads aiming at each virtual viewpoint in the virtual viewpoint path to obtain the image corresponding to the virtual viewpoint.
10. The virtual viewpoint image generating system according to claim 9, wherein the graphics processor is adapted to forward map respective sets of depth maps in parallel onto the virtual viewpoint; carrying out post-processing on the depth map subjected to forward mapping in parallel; reversely mapping the texture maps of the corresponding groups in parallel; and merging the pixels in each virtual texture map generated after the reverse mapping in parallel.
11. An electronic device, comprising: memory, central processing unit and graphics processing unit, said memory having stored thereon computer instructions executable on said central processing unit and graphics processing unit, wherein said central processing unit and said graphics processing unit when executing said computer instructions in cooperation perform the steps of the method of any one of claims 1 to 8.
12. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions when executed perform the steps of the method of any one of claims 1 to 8.
CN201911032857.0A 2019-10-28 2019-10-28 Virtual viewpoint image generation method, system, electronic device and storage medium Active CN112738495B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911032857.0A CN112738495B (en) 2019-10-28 2019-10-28 Virtual viewpoint image generation method, system, electronic device and storage medium
PCT/CN2020/124272 WO2021083174A1 (en) 2019-10-28 2020-10-28 Virtual viewpoint image generation method, system, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911032857.0A CN112738495B (en) 2019-10-28 2019-10-28 Virtual viewpoint image generation method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112738495A CN112738495A (en) 2021-04-30
CN112738495B true CN112738495B (en) 2023-03-28

Family

ID=75588819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911032857.0A Active CN112738495B (en) 2019-10-28 2019-10-28 Virtual viewpoint image generation method, system, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112738495B (en)
WO (1) WO2021083174A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761096A (en) * 2021-09-03 2023-03-07 华为云计算技术有限公司 Rendering method, far-end device, computing equipment cluster, terminal device and equipment
CN113992679B (en) * 2021-10-26 2023-10-31 广域铭岛数字科技有限公司 Automobile image display method, system and equipment
CN114416365B (en) * 2022-01-18 2022-09-27 北京拙河科技有限公司 Ultra-clear image quality image data processing method and device based on GPU fusion processing
CN114422819A (en) * 2022-01-25 2022-04-29 纵深视觉科技(南京)有限责任公司 Video display method, device, equipment, system and medium
CN114283195B (en) * 2022-03-03 2022-07-26 荣耀终端有限公司 Method for generating dynamic image, electronic device and readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7142209B2 (en) * 2004-08-03 2006-11-28 Microsoft Corporation Real-time rendering system and process for interactive viewpoint video that was generated using overlapping images of a scene captured from viewpoints forming a grid
US8643701B2 (en) * 2009-11-18 2014-02-04 University Of Illinois At Urbana-Champaign System for executing 3D propagation for depth image-based rendering
CN102930593B (en) * 2012-09-28 2016-01-13 上海大学 Based on the real-time drawing method of GPU in a kind of biocular systems
CN103581651B (en) * 2013-10-28 2015-04-29 西安交通大学 Method for synthesizing virtual sight points of vehicle-mounted multi-lens camera looking-around system
CN104822059B (en) * 2015-04-23 2017-07-28 东南大学 A kind of virtual visual point synthesizing method accelerated based on GPU
KR20160135660A (en) * 2015-05-18 2016-11-28 한국전자통신연구원 Method and apparatus for providing 3-dimension image to head mount display
CN106157356B (en) * 2016-07-05 2018-12-28 北京邮电大学 A kind of image processing method and device
CN107509067B (en) * 2016-12-28 2019-07-30 浙江工业大学 A kind of free view-point image composition method of high-speed high-quality amount
CN111669564B (en) * 2019-03-07 2022-07-26 阿里巴巴集团控股有限公司 Image reconstruction method, system, device and computer readable storage medium
CN111667438B (en) * 2019-03-07 2023-05-26 阿里巴巴集团控股有限公司 Video reconstruction method, system, device and computer readable storage medium
CN111385554B (en) * 2020-03-28 2022-07-08 浙江工业大学 High-image-quality virtual viewpoint drawing method of free viewpoint video

Also Published As

Publication number Publication date
WO2021083174A1 (en) 2021-05-06
CN112738495A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112738010B (en) Data interaction method and system, interaction terminal and readable storage medium
CN112738534B (en) Data processing method and system, server and storage medium
CN112738495B (en) Virtual viewpoint image generation method, system, electronic device and storage medium
US11381739B2 (en) Panoramic virtual reality framework providing a dynamic user experience
US11653065B2 (en) Content based stream splitting of video data
WO2022022501A1 (en) Video processing method, apparatus, electronic device, and storage medium
US11282169B2 (en) Method and apparatus for processing and distributing live virtual reality content
Schreer et al. Lessons learned during one year of commercial volumetric video production
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
TW202029742A (en) Image synthesis
CN111669567A (en) Multi-angle free visual angle video data generation method and device, medium and server
US20220053222A1 (en) Apparatus and method for generating an image data stream
CN111669561A (en) Multi-angle free visual angle image data processing method and device, medium and equipment
CN114007059A (en) Video compression method, decompression method, device, electronic equipment and storage medium
US11790601B2 (en) Minimal volumetric 3D on demand for efficient 5G transmission
CN110730340A (en) Lens transformation-based virtual auditorium display method, system and storage medium
CN112738009B (en) Data synchronization method, device, synchronization system, medium and server
CN112738646B (en) Data processing method, device, system, readable storage medium and server
CN111669569A (en) Video generation method and device, medium and terminal
CN112734821B (en) Depth map generation method, computing node cluster and storage medium
US11706375B2 (en) Apparatus and system for virtual camera configuration and selection
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669603A (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
TWI817273B (en) Real-time multiview video conversion method and system
TW202339495A (en) Presentation of multi-view video data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant