WO2021083177A1 - Method for generating depth map, computing nodes, computing node cluster, and storage medium - Google Patents

Method for generating depth map, computing nodes, computing node cluster, and storage medium

Info

Publication number
WO2021083177A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
data
frame
rough
collection
Prior art date
Application number
PCT/CN2020/124277
Other languages
French (fr)
Chinese (zh)
Inventor
盛骁杰
Original Assignee
Alibaba Group Holding Limited
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Publication of WO2021083177A1 publication Critical patent/WO2021083177A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/529 Depth or shape recovery from texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • the embodiment of the present invention relates to the field of image processing technology, and in particular to a depth map generation method, computing nodes, computing node clusters, and storage media.
  • 6 Degrees of Freedom (6DoF) technology is a technology that provides a high-degree-of-freedom viewing experience: users can adjust the viewing angle through interactive operations during viewing, so that they can watch from whatever free viewpoint they choose.
  • the generation of 6DoF data for interaction includes multiple steps such as camera parameter calculation, depth map calculation, and depth image based rendering (DIBR).
  • the problem addressed by the embodiments of the present invention is to increase the speed of depth map calculation in the 6DoF data computation process.
  • the embodiment of the present invention discloses a depth map generation method, which uses multiple computing nodes in a computing node cluster to generate the depth map respectively, and the generation method includes:
  • the first computing node in the computing node cluster performs a first depth calculation according to first texture data and second texture data to obtain a first rough depth map, wherein: the first computing node is any computing node in the computing node cluster; the first texture data and the first computing node satisfy a preset first mapping relationship; and the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data;
  • the first computing node synchronizes the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas;
  • for a second rough depth map in the rough depth atlas, the first computing node uses a third rough depth map for verification to obtain an unstable region in the second rough depth map, wherein: the second rough depth map and the first computing node satisfy a preset second mapping relationship; and the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
  • the first computing node performs a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, wherein the depth candidate values in the second rough depth map selected by the second depth calculation do not include the depth values of the unstable region;
  • the fine depth atlas composed of the fine depth maps obtained by the computing nodes is used as the finally generated depth map.
  • the second texture data is texture data collected by a collection device that satisfies a preset first distance relationship and/or a first quantitative relationship with the collection device of the first texture data;
  • the texture data corresponding to the third rough depth map is texture data collected by a collection device that satisfies a preset second distance relationship and/or a second quantitative relationship with the collection device corresponding to the second rough depth map.
  • the second texture data is texture data collected by a first preset number of collection devices that are closest to the location of the collection device of the first texture data; texture data corresponding to the third rough depth map It is texture data collected by a second preset number of collection devices that are closest to the location of the collection device corresponding to the second rough depth map.
  • the first preset number of collection devices and the second preset number of collection devices both range from 1 to N-1 collection devices, where N is the total number of collection devices in the collection array.
  • the first preset number is equal to the second preset number.
  • the first mapping relationship between the first computing node and the first texture data is obtained in the following manner: the allocation node obtains the first mapping relationship based on a preset first mapping relationship table or through random mapping; the second mapping relationship between the first computing node and the second rough depth map is obtained in the following manner: the allocation node obtains the second mapping relationship based on a preset second mapping relationship table or through random mapping.
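  • For illustration only, a minimal Python sketch of this two-pass flow follows; the helper functions (compute_rough_depth, find_unstable_regions, refine_depth) and the nearest-neighbor rule are hypothetical placeholders, not the patent's algorithms:

```python
import numpy as np

def compute_rough_depth(own_texture, neighbor_textures):
    # First depth calculation: match the node's own texture against textures
    # from collection devices satisfying the first spatial position relationship.
    return np.zeros(own_texture.shape[:2], dtype=np.float32)  # placeholder

def find_unstable_regions(depth, neighbor_depth):
    # Verification: cross-check a rough depth map against a neighboring
    # device's rough depth map; large disagreement marks unstable pixels.
    return np.abs(depth - neighbor_depth) > 0.1

def refine_depth(depth, unstable, texture, neighbor_textures):
    # Second depth calculation: recompute unstable pixels while excluding
    # their previous depth values from the candidate set.
    refined = depth.copy()
    refined[unstable] = 0.0  # placeholder for the recomputed values
    return refined

# One texture image per collection device; node i is mapped to texture i.
textures = [np.random.rand(4, 4, 3).astype(np.float32) for _ in range(4)]

# Pass 1: every node computes a rough depth map, then all maps are
# synchronized across nodes to form the rough depth atlas.
rough = [compute_rough_depth(textures[i], [textures[(i + 1) % 4]])
         for i in range(4)]

# Pass 2: each node verifies the rough depth map mapped to it against a
# neighbor's map and refines it; the fine maps form the final depth atlas.
fine = []
for i in range(4):
    unstable = find_unstable_regions(rough[i], rough[(i + 1) % 4])
    fine.append(refine_depth(rough[i], unstable, textures[i],
                             [textures[(i + 1) % 4]]))
```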
  • the embodiment of the present invention provides a computing node, which is suitable for forming a computing node cluster with at least another computing node to generate a depth map, and the computing node includes:
  • the input unit is adapted to receive texture data, which originates from synchronous collection of multiple collection devices in the same collection array;
  • the first depth calculation unit is adapted to perform a first depth calculation according to the first texture data and the second texture data to obtain a first rough depth map, wherein: the first texture data and the computing node satisfy a preset first mapping relationship; and the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data;
  • a synchronization unit adapted to synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas;
  • the verification unit is adapted to, for the second rough depth map in the rough depth atlas, use a third rough depth map for verification to obtain an unstable region in the second rough depth map, wherein: the second rough depth map and the computing node satisfy a preset second mapping relationship; and the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
  • the second depth calculation unit is adapted to perform the second depth calculation based on the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, wherein the depth candidate values in the second rough depth map selected by the second depth calculation do not include the depth values of the unstable region;
  • the output unit is adapted to output the fine depth map, so that the computing node cluster obtains the fine depth map set as the final generated depth map.
  • the embodiment of the present invention provides another computing node, including a memory and a processor, the memory storing computer instructions that can run on the processor; when the processor executes the computer instructions, the steps of the depth map generation method described in any of the foregoing embodiments of the present invention are performed.
  • the embodiment of the present invention provides a computing node cluster, including multiple computing nodes, the multiple computing nodes include a first computing node, and the first computing node is any computing node in the computing node cluster, wherein:
  • the first computing node is adapted to perform a first depth calculation according to the first texture data and the second texture data in the received texture data to obtain a first rough depth map, and synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas; for the second rough depth map in the rough depth atlas, the third rough depth map is used for verification to obtain the unstable region in the second rough depth map; a second depth calculation is then performed according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, and the obtained fine depth map is output so that the computing node cluster uses the obtained fine depth atlas as the finally generated depth map;
  • the first texture data satisfies a preset first mapping relationship with the first computing node; the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data; the second rough depth map and the first computing node satisfy a preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map; and the depth candidate values in the second rough depth map selected by the second depth calculation do not include the depth values of the unstable region.
  • the embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the depth map generation method according to any of the foregoing embodiments of the present invention are performed.
  • the first depth calculation is performed according to the first texture data and the second texture data in the received texture data to obtain the first rough depth map, and the first rough depth map is synchronized to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas; then, for the second rough depth map in the rough depth atlas, the third rough depth map is used for verification to obtain the unstable region in the second rough depth map, and the second depth calculation is performed to obtain the corresponding fine depth map, where the depth candidate values in the second rough depth map selected by the second depth calculation do not include the depth values of the unstable region.
  • multiple computing nodes can generate depth maps simultaneously, processing the texture data synchronously collected by the same acquisition array in parallel and in batches, which can greatly improve the efficiency of depth map generation.
  • by adopting the above solution, unstable regions in the generated depth map are eliminated through the second depth calculation, so the obtained fine depth map is more accurate and the quality of the generated multi-angle free-view images can be improved.
  • Figure 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention
  • Figure 2 is a flowchart of a data processing method in an embodiment of the present invention.
  • Figure 3 is a schematic structural diagram of a data processing system in an application scenario in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server in an embodiment of the present invention.
  • Figure 6 is a flowchart of a data exchange method in an embodiment of the present invention.
  • Figure 7 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 8 is a schematic structural diagram of a data processing system in another application scenario in an embodiment of the present invention.
  • Figure 9 is a schematic structural diagram of an interactive terminal in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 15 is a flowchart of another data processing method in an embodiment of the present invention.
  • Figure 16 is a flowchart of a method for intercepting synchronized video frames in a compressed video data stream in an embodiment of the present invention.
  • FIG. 17 is a flowchart of another data processing method in an embodiment of the present invention.
  • FIG. 18 is a schematic structural diagram of a data processing device in an embodiment of the present invention.
  • Figure 19 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 20 is a flowchart of a data synchronization method in an embodiment of the present invention.
  • FIG. 21 is a timing diagram of pull-stream synchronization in an embodiment of the present invention.
  • FIG. 22 is a flowchart of another method for intercepting synchronized video frames in a compressed video data stream in an embodiment of the present invention.
  • Figure 23 is a schematic structural diagram of another data processing device in an embodiment of the present invention.
  • Figure 24 is a schematic diagram of the structure of a data synchronization system in an embodiment of the present invention.
  • Figure 25 is a schematic structural diagram of a data synchronization system in an application scenario in an embodiment of the present invention.
  • FIG. 26 is a flowchart of a method for generating a depth map in an embodiment of the present invention.
  • Figure 27 is a schematic structural diagram of a server in an embodiment of the present invention.
  • FIG. 28 is a schematic diagram of depth map processing performed by a server cluster in an embodiment of the present invention.
  • FIG. 29 is a flowchart of a method for generating a virtual viewpoint image in an embodiment of the present invention.
  • FIG. 30 is a flowchart of a method for GPU to perform combined rendering in an embodiment of the present invention.
  • FIG. 31 is a schematic diagram of a method for filling holes in an embodiment of the present invention.
  • Fig. 32 is a schematic structural diagram of a virtual viewpoint image generation system in an embodiment of the present invention.
  • FIG. 33 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • Figure 34 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention.
  • Figure 35 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention.
  • Fig. 36 is a schematic structural diagram of a collection device in an embodiment of the present invention.
  • Fig. 37 is a schematic diagram of a collection array in an application scenario in an embodiment of the present invention.
  • Fig. 38 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 39 is a schematic structural diagram of another interactive terminal in an embodiment of the present invention.
  • Fig. 40 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Figure 41 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 42 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 43 is a schematic diagram of the connection of an interactive terminal in an embodiment of the present invention.
  • FIG. 44 is a schematic diagram of interactive operation of an interactive terminal in an embodiment of the present invention.
  • Fig. 45 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 46 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 47 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 48 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • 6DoF: 6 Degrees of Freedom.
  • users can adjust the viewing angle of the video through interactive means during the viewing process and watch from whatever free viewpoint they choose, thus greatly enhancing the viewing experience.
  • the Free-D playback technology obtains point cloud data of the scene through multi-angle shooting to express 6DoF images.
  • the light field rendering technology obtains depth-of-field information and three-dimensional position information of pixels through focal length and spatial position changes of a dense light field, and then expresses 6DoF images.
  • the depth-map-based 6DoF video generation method performs combined rendering on the texture maps and depth maps of the corresponding group in the image combination of the video frame at the moment of user interaction, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, to reconstruct 6DoF images or videos.
  • when the Free-D playback solution is used in the field, a large number of cameras are needed for raw data collection; the data are collected to the on-site computer room through serial digital interface (SDI) capture cards, and the computing servers of the on-site computer room then process the raw data to obtain point cloud data expressing the three-dimensional positions and pixel information of all points in the space and reconstruct the 6DoF scene.
  • this solution makes the amount of data collected, transmitted, and computed on-site extremely large; especially for broadcast scenarios such as live broadcast and rebroadcast, which place high requirements on the transmission network and computing servers, the implementation cost of 6DoF scene reconstruction is too high and the restrictions are too many.
  • although the depth-map-based 6DoF video reconstruction method can reduce the amount of data computation in the video reconstruction process, it is difficult to meet the need for low-latency playback and real-time interaction of multi-angle free-view videos due to the constraints of network transmission bandwidth and device decoding capabilities.
  • some embodiments of the present invention propose a multi-angle free-view image generation scheme, which adopts a distributed system architecture, in which an acquisition array composed of multiple acquisition devices is set in the field acquisition area to perform synchronous acquisition of frame images from multiple angles.
  • the frame images collected by the acquisition devices are intercepted by the data processing device according to a frame interception instruction; the server uses the frame images of multiple synchronized video frames uploaded by the data processing device as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data; the multi-angle free-view video data is then inserted into the to-be-played data stream of the playback control device for transmission to the playback terminal for playback.
  • the data processing system 10 includes: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14.
  • the data processing device 11 can intercept video frames from the frame images collected by the acquisition array in the acquisition area; by intercepting only the video frames needed to generate multi-angle free-view images, a large amount of data transmission and data processing can be avoided.
  • the server 12 performs the multi-angle free-view image generation, making full use of the powerful computing capability of the server to quickly generate multi-angle free-view video data, which can be inserted in time into the data stream to be played by the playback control device, realizing low-cost playback of multi-angle free-view video data and meeting users' needs for low-latency playback and real-time interaction of multi-angle free-view videos.
  • S21 Receive frame images of multiple synchronized video frames uploaded by the data processing device as an image combination.
  • the multiple synchronized video frames are obtained by the data processing device, based on a video frame interception instruction, by intercepting the video frames at a specified frame time from multiple video data streams synchronously collected and uploaded in real time at different locations in the field collection area; the shooting angles of the multiple synchronized video frames are different.
  • the video frame interception instruction may include information about a specified frame time, and the data processing device intercepts the video frames at the corresponding frame time from the multiple video data streams according to the information about the specified frame time in the video frame interception instruction.
  • the designated frame time may be in units of frames, with the Nth to Mth frames regarded as the designated frame time, where N and M are both integers not less than 1 and N ≤ M; alternatively, the designated frame time may be in units of time, with X to Y seconds as the designated frame time, where X and Y are both positive numbers and X ≤ Y. Therefore, the multiple synchronized video frames may include all frame-level synchronized video frames corresponding to a specified frame moment, and the pixel data of each video frame forms a corresponding frame image.
  • for example, the data processing device can determine from the received video frame interception instruction that the specified frame time is the second frame of the multiple video data streams; the data processing device then intercepts the video frame of the second frame of each video data stream, and the intercepted second frames of the video data streams, which are frame-level synchronized, serve as the obtained multiple synchronized video frames.
  • for another example, according to the received video frame interception instruction, the data processing device can determine that the specified frame time is the first second of the multiple video data streams; at a frame rate of 25 frames per second, the data processing device can intercept the 25 video frames within the first second of each video data stream, and the video frames within the first second of the intercepted video data streams are frame-level synchronized with one another.
  • for yet another example, the data processing device can determine from the received video frame interception instruction that the specified frame time is the second and third frames of the multiple video data streams; the data processing device can then intercept the second and third video frames in each video data stream, and the intercepted second frames and third frames of the video data streams, which are respectively frame-level synchronized, serve as the multiple synchronized video frames.
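  • As an illustrative sketch, the designated frame time can be resolved into concrete frame indices as follows, assuming a known constant frame rate (25 fps, as in the example above); the function name is hypothetical:

```python
def frames_for_designated_time(spec, fps=25):
    # spec is either ("frames", N, M) for the Nth..Mth frames (1-based),
    # or ("seconds", X, Y) for X..Y seconds.
    kind, a, b = spec
    if kind == "frames":
        return list(range(a, b + 1))
    # seconds: 1-based numbers of the frames falling inside (X, Y] seconds
    return list(range(int(a * fps) + 1, int(b * fps) + 1))

print(frames_for_designated_time(("frames", 2, 3)))   # [2, 3]
print(frames_for_designated_time(("seconds", 0, 1)))  # frames 1..25
```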
  • the parameter data corresponding to the image combination can be obtained through a parameter matrix
  • the parameter matrix can include an internal parameter matrix, an external parameter matrix, a rotation matrix and a translation matrix, and the like.
  • a Structure From Motion (SFM) algorithm can be used to perform feature extraction, feature matching, and global optimization on the obtained image combination based on a parameter matrix, and the obtained parameter estimates are used as the parameter data corresponding to the image combination.
  • the algorithm used for feature extraction can include any of the following: the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, and the Features from Accelerated Segment Test (FAST) algorithm.
  • algorithms used for feature matching may include the Euclidean distance calculation method, the Random Sample Consensus (RANSAC) algorithm, and so on.
  • Algorithms for global optimization may include: Bundle Adjustment (BA) and so on.
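  • The following is a hedged sketch of one pairwise step of such a pipeline using OpenCV: SIFT feature extraction, ratio-test matching on Euclidean descriptor distance, and RANSAC-based relative pose estimation; a full SFM implementation would add more views and a bundle adjustment stage:

```python
import cv2
import numpy as np

def relative_pose(gray1, gray2, K):
    # SIFT keypoints and descriptors for two views (8-bit grayscale images).
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(gray1, None)
    k2, d2 = sift.detectAndCompute(gray2, None)
    # Ratio-test matching using Euclidean (L2) distance on the descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = [m for m, n in matcher.knnMatch(d1, d2, k=2)
               if m.distance < 0.75 * n.distance]
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # RANSAC-filtered essential matrix, then the relative rotation R and
    # translation t between the two collection devices.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```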
  • the depth data of each frame image may be determined based on multiple frame images in the image combination.
  • the depth data may include depth values corresponding to pixels of each frame image in the image combination.
  • the distance from the collection point to each point in the scene can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis.
  • the above distance may be a relative value, and multiple frames of images may use the same reference.
  • a binocular stereo vision algorithm may be used to calculate the depth data of each frame of image.
  • the depth data can also be indirectly estimated by analyzing features of the frame image such as photometric and shading features.
  • a multi-view stereo (MVS) algorithm may be used to reconstruct the frame image.
  • all pixels can be used for reconstruction, or the pixels can be down-sampled and only part of the pixels can be used for reconstruction.
  • the pixel points of each frame image can be matched, the three-dimensional coordinates of each pixel point can be reconstructed to obtain points with image consistency, and then the depth data of each frame image can be calculated.
  • the pixel points of the selected frame image may be matched, and the three-dimensional coordinates of the pixel points of each selected frame image may be reconstructed to obtain points with image consistency, and then the depth data of the corresponding frame image may be calculated.
  • the pixel data of the frame image corresponds to the calculated depth data.
  • the method of selecting frame images can be set according to the specific situation; for example, part of the frame images can be selected according to the distance between the frame image whose depth data is to be calculated and the other frame images.
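  • A minimal sketch of the binocular-stereo branch mentioned above, using OpenCV's semi-global block matching on a rectified image pair; the parameter values are illustrative, and depth follows the standard pinhole relation depth = focal_length x baseline / disparity:

```python
import cv2
import numpy as np

def depth_from_rectified_pair(left_gray, right_gray, focal_px, baseline_m):
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                 blockSize=5)
    # SGBM returns fixed-point disparities scaled by 16.
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan  # unmatched pixels carry no depth
    return focal_px * baseline_m / disparity
```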
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the pixel data of the frame image can be any of YUV data or RGB data, or other data that can express the frame image;
  • the depth data can include the depth values corresponding to the pixel data of the frame image; alternatively, a subset may be selected from the set of depth values corresponding to the pixel data of the frame image, with the specific selection method depending on the specific scenario;
  • the virtual viewpoint is selected from a multi-angle free viewing angle range, and the multi-angle free viewing angle range is a range that supports viewpoint switching in the area to be viewed.
  • the preset frame image may be all the frame images in the image combination, or may be a selected partial frame image.
  • the selection method can be set according to the specific situation; for example, part of the frame images at the corresponding positions in the image combination can be selected according to the positional relationship between the collection points, or part of the frame images at the corresponding frame times in the image combination can be selected according to the desired frame time or frame period.
  • each virtual viewpoint in the virtual viewpoint path can be made to correspond to a frame time, and the corresponding frame image can be obtained according to the frame time corresponding to each virtual viewpoint; then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame image corresponding to the frame time of each virtual viewpoint, the frame image of each virtual viewpoint is reconstructed to obtain the corresponding multi-angle free-view video data.
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • for example, based on the pixel data and depth data of the frame images of the synchronized video frames at each frame time, frame image reconstruction is performed on the path composed of the virtual viewpoints, and the corresponding multi-angle free-view video data is finally obtained.
  • the designated frame times and the virtual viewpoints can be divided at a finer granularity, thereby obtaining more synchronized video frames and virtual viewpoints corresponding to different frame times; the above embodiment is only an illustrative example and does not limit the specific implementation.
  • a depth image based rendering (DIBR) algorithm can be used to perform combined rendering on the pixel data and depth data of the preset frame images according to the parameter data corresponding to the image combination and the preset virtual viewpoint path, so as to realize frame image reconstruction based on the preset virtual viewpoint path and obtain the corresponding multi-angle free-view video data.
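  • A simplified forward-mapping step of a DIBR-style reconstruction, assuming pinhole camera models: each source pixel is back-projected with its depth value and re-projected into the virtual viewpoint (z-buffering for collisions and hole filling are omitted):

```python
import numpy as np

def forward_map(color, depth, K_src, K_dst, R, t):
    # color: (h, w, 3); depth: (h, w) metric depth along the optical axis;
    # R, t: assumed source-to-virtual camera transform.
    h, w = depth.shape
    out = np.zeros_like(color)
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # homogeneous pixels
    pts = np.linalg.inv(K_src) @ pix * depth.ravel()        # back-project
    proj = K_dst @ (R @ pts + t.reshape(3, 1))              # into virtual view
    z = proj[2]
    ud = np.round(proj[0] / np.where(z > 0, z, 1.0)).astype(int)
    vd = np.round(proj[1] / np.where(z > 0, z, 1.0)).astype(int)
    ok = (z > 0) & (ud >= 0) & (ud < w) & (vd >= 0) & (vd < h)
    out[vd[ok], ud[ok]] = color.reshape(-1, 3)[ok]
    return out
```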
  • S25 Insert the multi-angle free-view video data into the to-be-played data stream of the playback control device and play it through the playback terminal.
  • the playback control device can take multiple video data streams as input, where the video data stream can come from each collection device in the collection array or from other collection devices.
  • the playback control device can select one input video data stream as the data stream to be played according to needs.
  • the multi-angle free-view video data obtained in step S24 can be inserted into the data stream to be played, or the playback control device can switch to the video data stream of another input interface.
  • the playback control device outputs the selected data stream to be played to the playback terminal, where it can be played, so the user can watch multi-angle free-view video images through the playback terminal.
  • the playback terminal can be a video playback device such as a TV, a mobile phone, a tablet, a computer, or an electronic device that includes a display screen.
  • the multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting can be operations such as pausing, rewinding, or fast-forwarding to the current moment while the user is watching.
  • in the distributed system architecture, the data processing device can handle the interception of the specified video frames, and the server can perform the reconstruction of the multi-angle free-view video based on the preset frame images, which avoids deploying a large number of servers on site.
  • the pixel data and depth data of the preset frame images in the image combination may be respectively mapped to the corresponding virtual viewpoints according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination; then, according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, the frame images are reconstructed to obtain the corresponding multi-angle free-view video data.
  • the virtual parameter data of the virtual viewpoint may include: virtual viewing position data and virtual viewing angle data; the parameter data corresponding to the image combination may include: collecting position data, shooting angle data, and the like.
  • forward mapping can be performed first, and then reverse mapping can be used to obtain the reconstructed image.
  • the collected position data and shooting angle data may be referred to as external parameter data
  • the parameter data may also include internal parameter data
  • the internal parameter data may include attribute data of the collection device, so that the mapping relationship can be determined more accurately.
  • the internal parameter data may include distortion data. Since distortion factors are taken into consideration, the mapping relationship can be further accurately determined spatially.
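  • Following the forward-then-reverse mapping mentioned above, a hedged sketch of the reverse step: the depth map is first forward-warped to the virtual viewpoint, then each target pixel is projected back into the source view and its color is sampled there, avoiding the cracks left by pure forward mapping (R, t here denote the assumed virtual-to-source transform):

```python
import numpy as np

def backward_sample(src_color, warped_depth, K_src, K_dst, R, t):
    # src_color: (h, w, 3) source texture; warped_depth: (h, w) depth already
    # forward-warped into the virtual viewpoint.
    h, w = warped_depth.shape
    out = np.zeros_like(src_color)
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    pts = np.linalg.inv(K_dst) @ pix * warped_depth.ravel()  # back-project target
    proj = K_src @ (R @ pts + t.reshape(3, 1))               # into source view
    z = proj[2]
    us = np.round(proj[0] / np.where(z > 0, z, 1.0)).astype(int)
    vs = np.round(proj[1] / np.where(z > 0, z, 1.0)).astype(int)
    ok = (z > 0) & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    out.reshape(-1, 3)[ok] = src_color[vs[ok], us[ok]]
    return out
```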
  • a spliced image corresponding to the image combination may be generated based on the pixel data and depth data of the image combination; the spliced image may include a first field and a second field, wherein the first field includes the pixel data of the image combination and the second field includes the depth data of the image combination; the stitched image and the parameter data corresponding to the image combination are then stored.
  • a spliced image corresponding to the preset frame image in the image combination may be generated based on the pixel data and depth data of the preset frame image in the image combination.
  • the stitched image corresponding to the frame images may include a first field and a second field, wherein the first field includes the pixel data of the preset frame images and the second field includes the depth data of the preset frame images; then only the spliced image corresponding to the preset frame images and the corresponding parameter data may be stored.
  • the stitched image can be divided into an image area and a depth map area; the pixel fields of the image area store the pixel data of the multiple frame images, and the pixel fields of the depth map area store the depth data of the multiple frame images; the pixel field storing the pixel data of a frame image in the image area serves as the first field, and the pixel field storing the depth data of a frame image in the depth map area serves as the second field; the stitched image of the acquired image combination and the parameter data corresponding to the image combination can be stored in a data file.
  • when the stitched image or the corresponding parameter data needs to be acquired, it can be read from the corresponding storage space based on the storage address included in the header file of the data file.
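  • A sketch of packing such a stitched image: the pixel data of the frame images go into the first (image-area) field and their depth data, quantized here to 8 bits purely for illustration, into the second (depth-map-area) field:

```python
import numpy as np

def stitch(frames, depths):
    # frames: list of (h, w, 3) uint8 images; depths: list of (h, w) float32.
    pixel_row = np.hstack(frames)                               # first field
    depth8 = [np.clip(d / max(d.max(), 1e-6) * 255, 0, 255).astype(np.uint8)
              for d in depths]                                  # quantize depth
    depth_row = np.hstack([np.dstack([d] * 3) for d in depth8]) # second field
    return np.vstack([pixel_row, depth_row])
```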
  • the storage format of the image combination may be a video format
  • the number of image combinations may be multiple
  • each image combination may be a combination of images corresponding to different frame moments after the video is decapsulated and decoded.
  • the interaction frame time information at the interaction moment can be determined, and the stored stitched images of the preset frame images in the image combination corresponding to the interaction frame time, together with the parameter data corresponding to the image combination, can be sent to the interactive terminal, so that the interactive terminal, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to preset rules, performs combined rendering on the selected pixel data and depth data with the parameter data, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted with.
  • the preset rules can be set according to the specific scenario; for example, based on the virtual viewpoint position information determined by the interactive operation, the position information of the W virtual viewpoints closest to the virtual viewpoint at the interaction moment may be selected in order of distance, and the pixel data and depth data corresponding to the W+1 virtual viewpoints in total (including the virtual viewpoint at the interaction moment) that satisfy the interaction frame time information are obtained from the stitched image.
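  • A small illustration of such a preset rule: select the W stored virtual viewpoints nearest to the interaction viewpoint by Euclidean distance (the names and data are hypothetical):

```python
import numpy as np

def select_nearest_viewpoints(interaction_pos, stored_positions, W):
    # stored_positions: (n, 3) candidate viewpoint positions.
    d = np.linalg.norm(stored_positions - interaction_pos, axis=1)
    return np.argsort(d)[:W]  # indices of the W closest viewpoints

positions = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], float)
print(select_nearest_viewpoints(np.array([1.2, 0.0, 0.0]), positions, W=2))  # [1 2]
```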
  • the interactive frame time information is determined based on the trigger operation from the interactive terminal.
  • the trigger operation may be a trigger operation input by the user or a trigger operation automatically generated by the interactive terminal; for example, the trigger operation can be initiated automatically when the interactive terminal detects that a multi-angle free-viewpoint data frame is identified.
  • when the user triggers the interaction manually, the interaction frame time information can be the time at which the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or it can be historical time information at which the interactive terminal receives a user operation triggering the interaction, where the historical time information is time information prior to the current playback moment.
  • the interactive terminal may, in the manner of step S24, perform combined rendering on the pixel data and depth data of the stitched image of the preset frame images in the image combination at the acquired interaction frame time, based on the stitched image of the preset frame images and the corresponding parameter data, the interaction frame time information, and the virtual viewpoint position information at the interaction frame time, to obtain the multi-angle free-view video data corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at the interactive virtual viewpoint position.
  • the multi-angle free-view video data corresponding to the interactive virtual viewpoint position can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user's interactive experience.
  • the data processing system 10 may include: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14, wherein:
  • the data processing device 11 is adapted to intercept multiple synchronized video frames, based on a video frame interception instruction, from multiple video data streams synchronously collected in real time at different locations in the field collection area, and to upload the multiple synchronized video frames at the designated frame time to the server 12, where the multiple video data streams may be video data streams in a compressed format or in a non-compressed format;
  • the server 12 is adapted to use the received frame images of the multiple synchronized video frames at the specified frame time uploaded by the data processing device 11 as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and generate the corresponding multi-angle free-view video data;
  • the playback control device 13 is adapted to insert the multi-angle free-view video data into the data stream to be played;
  • the playback terminal 14 is adapted to receive the to-be-played data stream from the playback control device 13 and perform real-time playback.
  • the playback control device 13 may output the data stream to be played based on a control instruction.
  • the playback control device 13 may select one of the multiple data streams as the data stream to be played, or continuously switch the selection among the multiple data streams to continuously output the data stream to be played.
  • the broadcast director control device may be used as a playback control device in the embodiment of the present invention.
  • the broadcast director control device can be a manual or semi-manual device that performs playback control based on externally input control instructions, or a virtual broadcast director control device that automatically performs directing control based on artificial intelligence, big data learning, or preset algorithms.
  • in the distributed system architecture, the data processing device can handle the interception of the specified video frames, and the server can perform the reconstruction of the multi-angle free-view video based on the preset frame images, which avoids deploying a large number of servers on site.
  • the server 12 is further adapted to generate a spliced image corresponding to the preset frame images in the image combination based on the pixel data and depth data of the preset frame images in the image combination, the spliced image including a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination, and to store the spliced image of the image combination and the parameter data corresponding to the image combination.
  • the data processing system 10 may further include an interactive terminal 15, which is adapted to determine interaction frame time information based on a trigger operation, send an image reconstruction instruction containing the interaction frame time information to the server, receive the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the server, determine the virtual viewpoint position information based on the interactive operation, select the corresponding pixel data and depth data in the stitched image according to preset rules, perform combined rendering based on the selected pixel data and depth data with the parameter data, and reconstruct and play the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
  • the number of the playing terminal 14 may be one or more, the number of the interactive terminal 15 may be one or more, and the playing terminal 14 and the interactive terminal 15 may be the same terminal device.
  • at least one of a server, a playback control device, or an interactive terminal may be used as the transmitting end of the video frame interception instruction, and other devices capable of transmitting the video frame interception instruction may also be used.
  • the locations of the data processing device and the server can be flexibly deployed according to user requirements.
  • the data processing equipment can be placed in a non-collection area or in the cloud.
  • the server can be placed in a non-collection area on site, on the cloud or terminal access side.
  • edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as deployment locations for the server.
  • the server is used to obtain multi-angle free view data.
  • the data processing device and the server can also be centrally arranged and work together as a server cluster to realize rapid generation of multi-angle free-view data, so as to realize low-latency playback and real-time interaction of multi-angle free-view videos.
  • the multi-angle free-view video data corresponding to the position of the virtual viewpoint to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user interaction experience.
  • FIG. 3 is a schematic structural diagram of a data processing system in an application scenario, showing the layout of a data processing system for a basketball game.
  • the data processing system includes a collection array 31 composed of multiple collection devices, a data processing device 32, a cloud server cluster 33, a playback control device 34, a playback terminal 35, and an interactive terminal 36.
  • the basketball hoop on the left is taken as the core point of view, the core point of view is taken as the center of a circle, and a fan-shaped area on the same plane as the core point of view is used as the preset multi-angle free viewing angle range.
  • the collection devices in the collection array 31 can be arranged in a fan shape at different positions in the field collection area according to the preset multi-angle free viewing angle range, and can synchronously collect video data streams in real time from their respective angles.
  • the collection equipment can also be set up in the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of acquisition equipment, the characteristics of the acquisition equipment, and the requirements for imaging effects.
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device 32 can be placed in a non-collection area on site and can be regarded as a site server.
  • the data processing device 32 may send a streaming instruction to each collection device in the collection array 31 through a wireless local area network, and each collection device in the collection array 31, based on the streaming instruction sent by the data processing device 32, transmits the obtained video data stream to the data processing device 32 in real time.
  • each collection device in the collection array 31 can transmit the obtained video data stream to the data processing device 32 in real time through the switch 37.
  • when the data processing device 32 receives the video frame interception instruction, it intercepts the video frames at the specified frame time from the received multiple video data streams to obtain the frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the specified frame time to the server cluster 33 in the cloud.
  • the cloud server cluster 33 uses the received frame images of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data; the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • the server can be placed in the cloud, and in order to process data more quickly in parallel, the cloud server cluster 33 can be composed of multiple different servers or server groups according to the different data to be processed.
  • the cloud server cluster 33 may include: a first cloud server 331, a second cloud server 332, a third cloud server 333, and a fourth cloud server 334.
  • the first cloud server 331 can be used to determine the corresponding parameter data of the image combination
  • the second cloud server 332 can be used to determine the depth data of each frame of the image in the image combination
  • the third cloud server 333 can use a depth image based rendering (DIBR) algorithm to perform frame image reconstruction on the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the image combination;
  • the fourth cloud server 334 may be used to generate a multi-angle free-view video, where the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • the first cloud server 331, the second cloud server 332, the third cloud server 333, and the fourth cloud server 334 may also be server groups composed of server arrays or server sub-clusters, which is not limited in the embodiment of the present invention.
  • the server cluster 33 in the cloud can store the pixel data and depth data of the image combination in the following manner:
  • a stitched image corresponding to the frame time is generated; the stitched image includes a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the preset frame images in the image combination.
  • the playback control device 34 may insert the received multi-angle free-view video data into the data stream to be played, and the playback terminal 35 receives the data stream to be played from the playback control device 34 and plays it in real time.
  • the playback control device 34 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set as a virtual playback control device to control the data source.
  • a broadcast control device such as a broadcast control station can be used as a playback control device in the embodiment of the present invention.
  • the data processing device 32 can be placed in the on-site non-collection area or in the cloud according to the specific scenario, and the server (cluster) and the playback control device can be placed in the on-site non-collection area, in the cloud, or on the terminal access side according to the specific scenario.
  • this embodiment is not used to limit the specific implementation and protection scope of the present invention.
  • in the schematic diagram of the interactive interface of the interactive terminal 40, there is a progress bar 41 on the interactive interface of the interactive terminal 40; the interactive terminal 40 can associate the designated frame times received from the data processing device 32 with the progress bar, and several interactive identifiers, such as interactive identifiers 42 and 43, can be generated on the progress bar 41.
  • the black segment of the progress bar 41 is the played portion 41a
  • the blank segment of the progress bar 41 is the unplayed portion 41b.
  • the interface of the interactive terminal 40 may display interactive prompt information; for example, when the user selects an operation to trigger the current interactive identifier 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the interaction frame time of the interactive identifier 43 and sends the image reconstruction instruction containing the interaction frame time information to the cloud server cluster 33.
  • the interactive terminal 40 can continue to read subsequent video data, and the played portion 41a on the progress bar continues to advance.
  • the user can also choose to trigger the historical interaction mark while watching, for example, to trigger the interaction mark 42 displayed in the played part 41a on the progress bar, and the interactive terminal 40 generates an image reconstruction instruction at the interaction frame time corresponding to the interaction mark 42 after receiving the feedback.
  • when the server cluster 33 in the cloud receives the image reconstruction instruction from the interactive terminal 40, the stitched image of the preset frame images in the corresponding image combination and the parameter data corresponding to the image combination can be extracted and transmitted to the interactive terminal 40.
  • the interactive terminal 40 determines the interaction frame time information based on the trigger operation, sends an image reconstruction instruction containing the interaction frame time information to the server, and receives the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the server cluster 33 in the cloud; it then determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to the preset rules, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
  • each collection device in the collection array can be connected to the data processing device through a switch and/or a local area network, and the number of playback terminals and interactive terminals can be one or more.
  • The playback terminal and the interactive terminal may be the same terminal device; the data processing device may be placed in an on-site non-collection area or in the cloud according to the specific scenario, and the server may be placed in an on-site non-collection area, in the cloud, or on the terminal access side, according to the specific scenario.
  • the embodiments are not used to limit the specific implementation and protection scope of the present invention.
  • the embodiment of the present invention also provides a server corresponding to the above-mentioned data processing method.
  • the server 50 may include:
  • the data receiving unit 51 is adapted to receive frame images of multiple synchronized video frames uploaded by the data processing device as an image combination;
  • the parameter data calculation unit 52 is adapted to determine the parameter data corresponding to the image combination;
  • the depth data calculation unit 53 is adapted to determine the depth data of each frame image in the image combination;
  • the video data acquisition unit 54 is adapted to perform frame image reconstruction for the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, to obtain the corresponding multi-angle free-view video data, wherein the multi-angle free-view video data includes: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the first data transmission unit 55 is adapted to insert the multi-angle free-view video data into the to-be-played data stream of the playback control device and play it through the playback terminal.
  • The plurality of synchronized video frames may be obtained by the data processing device based on the video frame interception instruction, by intercepting, at the specified frame time, the video frames in the multiple real-time synchronized video data streams uploaded from different locations in the field collection area; the shooting angles of the multiple synchronized video frames are different.
  • the server can be placed in a non-collection area on site, in the cloud or on the terminal access side according to specific scenarios.
  • The multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting covers operations such as pausing, rewinding, or fast-forwarding to the current moment while watching.
  • the video data acquiring unit 54 may include:
  • the data mapping subunit 541 is adapted to map the pixel data and depth data of the preset frame images in the image combination to the corresponding virtual viewpoints, according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination;
  • the data reconstruction subunit 542 is adapted to reconstruct frame images according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, to obtain the corresponding multi-angle free-view video data.
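  • The mapping performed by the data mapping subunit 541 can be sketched with a pinhole camera model: each pixel is back-projected using its depth value and the source camera's parameter data, then re-projected into the virtual viewpoint. The (K, R, t) naming below is an assumption; the embodiment does not fix a parameter format:

```python
import numpy as np

def map_to_virtual_viewpoint(depth, K_src, R_src, t_src, K_virt, R_virt, t_virt):
    """Back-project every source pixel with its depth, re-project into the virtual view."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])    # homogeneous pixel grid
    cam_pts = (np.linalg.inv(K_src) @ pix) * depth.ravel()    # points in source camera
    world = R_src.T @ (cam_pts - t_src.reshape(3, 1))         # to world coordinates
    proj = K_virt @ (R_virt @ world + t_virt.reshape(3, 1))   # into the virtual camera
    uv = proj[:2] / np.maximum(proj[2], 1e-6)                 # perspective division
    return uv.reshape(2, h, w), proj[2].reshape(h, w)         # target coords and depth
```

The data reconstruction subunit 542 would then splat or interpolate the mapped pixel data at these target coordinates, typically resolving occlusions with a z-buffer over the returned depths.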
  • the server 50 may further include:
  • the stitched image generating unit 56 is adapted to generate a stitched image corresponding to the image combination based on the pixel data and depth data of the preset frame image in the image combination.
  • The stitched image may include a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the preset frame images in the image combination;
  • the data storage unit 57 is adapted to store the stitched image of the image combination and the parameter data corresponding to the image combination.
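  • A minimal sketch of the two-field stitched image described above, assuming the pixel field is stacked on top of the depth field and that single-channel depth maps are replicated to three channels so both fields share one image container:

```python
import numpy as np

def make_stitched_image(frames, depth_maps):
    """First field: pixel data of preset frame images; second field: their depth data."""
    pixel_field = np.hstack(frames)  # H x (N*W) x 3
    depth_field = np.hstack([np.repeat(d[:, :, None], 3, axis=2) for d in depth_maps])
    return np.vstack([pixel_field, depth_field])

def split_stitched_image(stitched):
    """Recover the two fields on the receiving side before rendering."""
    half = stitched.shape[0] // 2
    return stitched[:half], stitched[half:]  # pixel field, depth field
```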
  • the server 50 may further include:
  • the data extraction unit 58 is adapted to determine the interactive frame time information at the interaction moment based on the image reconstruction instruction received from the interactive terminal, and to extract the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to that image combination;
  • the second data transmission unit 59 is adapted to transmit the stitched image and corresponding parameter data extracted by the data extraction unit 58 to the interactive terminal, so that the interactive terminal determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data, depth data and parameter data from the stitched image according to preset rules, combines the selected pixel data and depth data for rendering, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interactive frame time.
  • the multi-angle free-view video data corresponding to the position of the virtual viewpoint to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user interaction experience.
  • The embodiments of the present invention also provide a data interaction method and data processing system, which can obtain the data stream to be played from the playback control device in real time and perform real-time playback and display. Each interactive identifier in the data stream to be played is associated with a designated frame time of the video data; then, in response to a trigger operation on an interactive identifier, the interaction data corresponding to the designated frame time of that identifier can be obtained. Since the interaction data may include multi-angle free-view data, a multi-angle free view of the designated frame time can be displayed based on the interaction data. In this way, the interaction data can be acquired according to the trigger operation on the interactive identifier and the multi-angle free-view display can be performed, enhancing the user interaction experience.
  • S61 Acquire a data stream to be played from the playback control device in real time and perform real-time playback and display.
  • the data stream to be played includes video data and interactive identifiers, and each interactive identifier is associated with a designated frame moment of the data stream to be played.
  • The designated frame time may be in units of frames: the Nth to Mth frames are regarded as the designated frame time, where N and M are integers not less than 1 and N ≤ M. Alternatively, the designated frame time may be in units of time: the Xth to Yth seconds are taken as the designated frame time, where X and Y are positive numbers and X ≤ Y.
  • the data stream to be played may be associated with a number of designated frame moments, and the playback control device may generate an interactive identifier corresponding to each designated frame moment based on the information of each designated frame moment, so as to play and display in real time.
  • the corresponding interactive logo can be displayed at the specified frame time.
  • each interactive identifier and video data can be associated in different ways according to actual conditions.
  • The data stream to be played may include several frame times corresponding to the video data. Since each interactive identifier also has a corresponding designated frame time, the designated frame time information of each interactive identifier can be matched against the information of each frame time in the data stream to be played, and a frame time can be associated with the interactive identifier carrying the same information, so that when the real-time display of the data stream to be played reaches the corresponding frame time, the corresponding interactive identifier is displayed.
  • For example, the data stream to be played includes N frame times, and the playback control device generates M corresponding interactive identifiers based on the information of M designated frame times. If the information of the i-th frame time is the same as the information of the j-th designated frame time, the i-th frame time can be associated with the j-th interactive identifier, and when the real-time display proceeds to the i-th frame time, the j-th interactive identifier can be displayed, where i is a natural number not greater than N, and j is a natural number not greater than M.
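  • The i/j matching above amounts to a lookup from frame time information to identifier index; a minimal sketch, with hypothetical container types:

```python
def associate_identifiers(frame_infos, designated_infos):
    """Associate the i-th frame time with the j-th interactive identifier
    whenever their frame time information is the same.

    frame_infos: info of the N frame times in the data stream to be played.
    designated_infos: info of the M designated frame times, one per identifier.
    """
    lookup = {info: j for j, info in enumerate(designated_infos)}
    return {i: lookup[info] for i, info in enumerate(frame_infos) if info in lookup}

# Example: the 3rd and 6th frame times carry the 1st and 2nd identifiers.
print(associate_identifiers(["t0", "t1", "t2", "t3", "t4", "t5"], ["t2", "t5"]))
# -> {2: 0, 5: 1}
```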
  • The interaction data corresponding to each designated frame time can be stored in a preset storage device. Since an interactive identifier and a designated frame time have a corresponding relationship, the designated frame time corresponding to a triggered interactive identifier can be obtained according to the trigger operation performed on the identifier displayed by the interactive terminal. In this way, the interaction data at the designated frame time corresponding to the triggered interactive identifier can be acquired.
  • a preset storage device may store M pieces of interactive data, where the M pieces of interactive data respectively correspond to M designated frame moments, and the M designated frame moments correspond to M interactive identifiers.
  • The designated frame time Ti corresponding to the interaction identifier Pi can be obtained according to the triggered interaction identifier Pi, and the interaction data of the designated frame time Ti can then be acquired, where i is a natural number.
  • the trigger operation may be a trigger operation input by the user, or a trigger operation automatically generated by the interactive terminal.
  • the preset storage device can be placed in a non-collection area on site, in the cloud or on the terminal access side.
  • the preset storage device may be a data processing device, server, or interactive terminal in the embodiment of the present invention, or an edge node device located on the side of the interactive terminal, such as a base station, a set-top box, a router, a home data center server, and a hotspot. Equipment, etc.
  • S63 Based on the interactive data, perform image display of a multi-angle free view at the specified frame time.
  • an image reconstruction algorithm may be used to perform image reconstruction on the multi-angle free view data of the interactive data, and then perform the multi-angle free view image display at the specified frame time.
  • If the designated frame time is a single frame time, a static image with a multi-angle free view can be displayed; if the designated frame time corresponds to multiple frame times, a dynamic image with a multi-angle free view can be displayed.
  • the interactive data can be acquired according to the trigger operation on the interactive identifier, and then the multi-angle free perspective display can be performed to enhance the user interaction experience.
  • The multi-angle free-view data may be generated based on the multiple frame images corresponding to the received designated frame time, where the multiple frame images are obtained by the data processing device intercepting, at the designated frame time, the multiple video data streams synchronously collected by the multiple collection devices in the collection array. The multi-angle free-view data may include the pixel data, depth data and parameter data of the multiple frame images, wherein there is an association relationship between the pixel data and the depth data of each frame image.
  • the pixel data of the frame image may be any of YUV data or RGB data, or may also be other data capable of expressing the frame image.
  • The depth data may include depth values corresponding one-to-one to the pixel data of the frame image, or may be a subset of values selected from the set of depth values that corresponds one-to-one to the pixel data of the frame image.
  • the specific selection method of the depth data depends on the specific situation.
  • The parameter data corresponding to the multiple frame images may be obtained through a parameter matrix; the parameter matrix may include an internal parameter matrix, an external parameter matrix, a rotation matrix, a translation matrix, and the like.
  • the SFM algorithm can be used to perform feature extraction, feature matching and global optimization on the acquired multiple frame images based on the parameter matrix, and the obtained parameter estimates are used as the corresponding parameter data of the multiple frame images.
  • the specific algorithms used in the process of feature extraction, feature matching and global optimization can be referred to the previous introduction.
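  • As an illustration only, a two-view OpenCV sketch of the feature extraction, feature matching, and pose estimation steps; a full SFM pipeline adds global optimization (bundle adjustment) over all views, and the internal parameter matrix K is assumed to be known here:

```python
import cv2
import numpy as np

def estimate_relative_pose(img_a, img_b, K):
    """Estimate rotation R and translation t between two grayscale views."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])
    E, _ = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)
    return R, t
```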
  • the depth data of each frame image may be determined based on the multiple frame images.
  • the depth data may include depth values corresponding to pixels of each frame of image.
  • the distance from the collection point to each point in the scene can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis.
  • a binocular stereo vision algorithm may be used to calculate the depth data from each frame of image.
  • The depth data can also be estimated indirectly by analyzing features of the frame image, such as photometric and shading features.
  • The MVS algorithm can be used to reconstruct the frame images: the pixels of each frame image are matched, the three-dimensional coordinates of each pixel point are reconstructed to obtain points with image consistency, and the depth data of each frame image is then calculated.
  • the pixel points of the selected frame image may be matched, and the three-dimensional coordinates of the pixel points of each selected frame image may be reconstructed to obtain points with image consistency, and then the depth data of the corresponding frame image may be calculated.
  • the pixel data of the frame image corresponds to the calculated depth data.
  • The method of selecting frame images can be set according to the specific situation. For example, frame images may be selected according to the distance between the frame image whose depth data is to be calculated and the other frame images.
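  • A minimal binocular-stereo sketch using OpenCV's semi-global matcher, assuming a rectified image pair with known focal length (in pixels) and baseline (in meters); MVS generalizes this two-view matching across many selected views:

```python
import cv2
import numpy as np

def depth_from_stereo_pair(left_gray, right_gray, focal_px, baseline_m):
    """Disparity via semi-global block matching, then depth = f * B / d."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan  # mark unmatched pixels, avoid division by zero
    return focal_px * baseline_m / disparity
```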
  • the data processing device may intercept frame-level synchronized video frames at the specified frame time in the multiple video data streams based on the received video frame interception instruction.
  • The video frame interception instruction may include the frame time information for intercepting a video frame, and the data processing device intercepts, from the multiple video data streams, the video frames at the corresponding frame time according to the frame time information in the video frame interception instruction.
  • The data processing device sends the frame time information in the video frame interception instruction to the playback control device; the playback control device can obtain the corresponding designated frame time according to the received frame time information and generate the corresponding interactive identifier from it.
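  • A file-based sketch of intercepting frame-level synchronized video frames at a designated frame time; the embodiment operates on live synchronized streams, so treating each stream as a seekable file here is a simplification:

```python
import cv2

def intercept_synchronized_frames(stream_paths, frame_index):
    """Grab the frame at the designated frame time from every synchronized stream."""
    frames = []
    for path in stream_paths:
        cap = cv2.VideoCapture(path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)  # seek to the designated frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)  # one frame per collection device, same frame time
        cap.release()
    return frames
```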
  • multiple collection devices in the collection array are placed at different locations in the field collection area according to a preset multi-angle free viewing angle range, and the data processing device can be placed in a field non-collection area or in the cloud.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched.
  • The multi-angle free viewing angle can be a 6 degrees of freedom (6DoF) viewing angle, where the spatial position of the virtual viewpoint can be expressed as (x, y, z) and the viewing angle can be expressed as three rotation directions; together these form six degrees of freedom.
  • the multi-angle free viewing angle range can be determined according to the needs of the application scene.
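  • A sketch of the 6DoF virtual viewpoint representation: three position coordinates plus three rotation directions. The yaw/pitch/roll naming is an assumption, since the text only specifies three rotational degrees of freedom:

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint6DoF:
    """Spatial position (x, y, z) plus three rotation directions."""
    x: float
    y: float
    z: float
    yaw: float    # rotation about the vertical axis
    pitch: float  # rotation about the lateral axis
    roll: float   # rotation about the viewing axis
```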
  • The playback control device may generate an interactive identifier associated with the video frame at the corresponding time in the data stream to be played, based on the frame time information of the video frames intercepted by the data processing device. For example, after receiving the video frame interception instruction, the data processing device sends the frame time information in the instruction to the playback control device, and the playback control device can then generate the corresponding interactive identifier based on each piece of frame time information.
  • corresponding interactive data can be generated according to the objects displayed on site and the associated information of the displayed objects.
  • the interaction data may also include at least one of the following: field analysis data, information data of the collection object, information data of equipment associated with the collection object, information data of items deployed on site, and information data of logos displayed on site. Then, based on the interaction data, a multi-angle free perspective display can be performed to display richer interactive information to the user through a multi-angle free perspective, so that the user interaction experience can be further enhanced.
  • The interaction data can include not only multi-angle free-view data, but also one or more of: analysis data of the ball game, information data of a certain player, information data of the shoes worn by the player, information data of the basketball, information data of the on-site sponsor's logo, and the like.
  • In order to conveniently return to the data stream to be played after the image display is over, continuing to refer to FIG. 6, after step S63 the method may further include: when an instruction to end the interaction is received, switching to the to-be-played data stream obtained in real time from the playback control device and performing real-time playback and display; or, when it is detected that the multi-angle free-view image at the designated frame time has been displayed to the last image, switching to the to-be-played data stream acquired in real time from the playback control device and performing real-time playback and display.
  • In specific implementation, the multi-angle free-view image display based on the interaction data in step S63 may specifically include the following steps:
  • The virtual viewpoint is determined according to the interactive operation; the virtual viewpoint is selected from a multi-angle free viewing angle range, which is a range that supports switching of virtual viewpoints for the area to be viewed. An image for viewing the area to be viewed is then displayed based on the virtual viewpoint, the image being generated based on the interaction data and the virtual viewpoint.
  • A virtual viewpoint path may be preset, and the virtual viewpoint path may include several virtual viewpoints. Since the virtual viewpoint is selected from the multi-angle free viewing angle range, the corresponding first virtual viewpoint can be determined according to the viewing angle of the image being played and displayed at the time of the interactive operation, and then, starting from the first virtual viewpoint, the images corresponding to each virtual viewpoint are displayed in sequence according to the preset virtual viewpoint path.
  • The DIBR algorithm may be used to render, according to the parameter data in the multi-angle free-view data and the preset virtual viewpoint path, the combination of the pixel data and depth data corresponding to the designated frame time of the triggered interactive identifier, thereby realizing image reconstruction based on the preset virtual viewpoint path and obtaining the corresponding multi-angle free-view video data; then, starting from the first virtual viewpoint, the corresponding images are displayed in sequence in the order of the preset virtual viewpoints.
  • If the designated frame time corresponds to a single frame time, the obtained multi-angle free-view video data may include the multi-angle free-view spatial data of the images sorted according to frame time, and a static multi-angle free-view image can be displayed; if the designated frame time corresponds to different frame times, the obtained multi-angle free-view video data can include both multi-angle free-view spatial data and multi-angle free-view time data of the frame images sorted by frame time, and a dynamic multi-angle free-view image can be displayed, that is, the frame images of the video frames are displayed with multi-angle free views.
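  • A preset virtual viewpoint path can be sketched as a simple linear interpolation between a first and a last 6DoF viewpoint; this is only an illustrative assumption, as real paths may be hand-authored and the rotational components may need spherical interpolation:

```python
import numpy as np

def preset_viewpoint_path(first, last, steps):
    """Generate `steps` virtual viewpoints from the first viewpoint to the last.

    first, last: 6DoF viewpoints given as (x, y, z, rot1, rot2, rot3) tuples.
    """
    first, last = np.asarray(first, float), np.asarray(last, float)
    return [first + (last - first) * s for s in np.linspace(0.0, 1.0, steps)]

# One frame image is reconstructed and displayed in sequence per viewpoint.
path = preset_viewpoint_path((0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 30, 0), steps=10)
```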
  • the embodiment of the present invention also provides a system corresponding to the above-mentioned data interaction method.
  • a detailed description will be given below through specific embodiments with reference to the accompanying drawings.
  • the data processing system 70 may include: a collection array 71, a data processing device 72, a server 73, a playback control device 74, and an interactive terminal 75, wherein:
  • the collection array 71 may include a plurality of collection devices, which are placed at different positions in the field collection area according to a preset multi-angle free viewing angle range, and are adapted to synchronously collect multiple video data streams in real time and upload the video data streams in real time to the data processing device 72;
  • the data processing device 72 is adapted to intercept, for the uploaded multiple video data streams and according to the received video frame interception instruction, the multiple video data streams at the designated frame time to obtain the multiple frame images corresponding to the designated frame time and the frame time information corresponding to the designated frame time, to upload the multiple frame images at the designated frame time and the corresponding frame time information to the server 73, and to send the frame time information of the designated frame time to the playback control device 74;
  • the server 73 is adapted to receive the multiple frame images and the frame time information uploaded by the data processing device 72, and to generate interaction data for interaction based on the multiple frame images, where the interaction data includes multi-angle free-view data and is associated with the frame time information;
  • the playback control device 74 is adapted to determine, in the data stream to be played, the designated frame time corresponding to the frame time information uploaded by the data processing device 72, to generate an interactive identifier associated with the designated frame time, and to transmit the to-be-played data stream containing the interactive identifier to the interactive terminal 75;
  • the interactive terminal 75 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, and, based on a trigger operation on the interactive identifier, to obtain the interaction data stored in the server 73 that corresponds to the designated frame time, so that a multi-angle free-view image can be displayed.
  • the locations of the data processing device and the server can be flexibly deployed according to user requirements.
  • the data processing equipment can be placed in a non-collection area or in the cloud.
  • the server can be placed in a non-collection area on site, on the cloud or terminal access side.
  • Edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as deployment locations on the terminal access side.
  • the server is used to obtain multi-angle free view data.
  • The data processing device and the server can also be arranged centrally and work together as a server cluster to realize the rapid generation of multi-angle free-view data, so as to achieve low-latency playback and real-time interaction for multi-angle free-view videos.
  • the interactive data can be acquired according to the trigger operation of the interactive identifier, and then the multi-angle free perspective display can be performed to enhance the user's interactive experience.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched. Moreover, the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • the multi-angle free viewing angle may be a 6-degree-of-freedom (6DoF) viewing angle.
  • the acquisition device itself may have the function of encoding and packaging, so that the original video data collected from the corresponding angle in real time can be encoded and packaged in real time.
  • the acquisition device can have a compression function.
  • The server 73 is adapted to generate the multi-angle free-view data based on the received multiple frame images corresponding to the designated frame time, where the multi-angle free-view data includes the pixel data, depth data and parameter data of the multiple frame images, and there is an association relationship between the pixel data and the depth data of each frame image.
  • the multiple collection devices in the collection array 71 can be placed in different locations in the field collection area according to the preset multi-angle free viewing angle range, and the data processing device 72 can be placed in the field non-collection area or in the cloud, so The server 73 can be placed in a non-collection area on site, in the cloud, or on the terminal access side.
  • The playback control device 74 is adapted to generate, based on the frame time information of the video frames intercepted by the data processing device 72, an interactive identifier associated with the corresponding video frame in the data stream to be played.
  • the interactive terminal 75 is further adapted to switch to the to-be-played data stream obtained in real time from the playback control device 74 and perform real-time playback and display when the interaction end signal is detected.
  • The data processing system 80 may include: a collection array 81 composed of collection devices, a data processing device 82, a server cluster 83 in the cloud, a playback control device 84, and an interactive terminal 85.
  • Each collection device in the collection array 81 can be arranged in a fan shape at different positions in the field collection area according to the preset multi-angle free viewing angle range, and can synchronously collect video data streams from the corresponding angles in real time.
  • the collection equipment can also be set up in the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of acquisition equipment, the characteristics of the acquisition equipment, and the requirements for imaging effects.
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device 82 may be placed in a non-collection area on site.
  • the data processing device 82 may send a streaming instruction to each collection device in the collection array 81 via a wireless local area network.
  • Each collection device in the collection array 81 transmits the obtained video data stream to the data processing device 82 in real time based on the streaming instruction sent by the data processing device 82.
  • each collection device in the collection array 81 can transmit the obtained video data stream to the data processing device 82 in real time through the switch 87.
  • Each collection device can compress the collected original video data in real time and transmit it to the data processing device in real time, so as to further save local area network transmission resources.
  • When the data processing device 82 receives the video frame interception instruction, it intercepts the video frames at the designated frame time from the received multiple video data streams to obtain the frame images corresponding to the multiple video frames and the frame time information corresponding to the designated frame time, uploads the multiple frame images of the designated frame time and the corresponding frame time information to the server cluster 83 in the cloud, and sends the frame time information of the designated frame time to the playback control device 84.
  • the video frame interception instruction may be manually issued by the user, or may be automatically generated by the data processing device.
  • The server can be placed in the cloud, and in order to process data faster in parallel, the server cluster 83 in the cloud can be composed of multiple different servers or server groups according to the different data processing tasks.
  • the server cluster 83 in the cloud may include: a first cloud server 831, a second cloud server 832, a third cloud server 833, and a fourth cloud server 834.
  • the first cloud server 831 may be used to determine the parameter data corresponding to the multiple frame images;
  • the second cloud server 832 may be used to determine the depth data of each frame image in the multiple frame images;
  • the third cloud server 833 may use the DIBR algorithm to perform frame image reconstruction for the preset virtual viewpoint path based on the parameter data corresponding to the multiple frame images, the depth data, and the pixel data of the preset frame images;
  • the fourth cloud server 834 may be used to generate the multi-angle free-view video data.
  • The first cloud server 831, the second cloud server 832, the third cloud server 833, and the fourth cloud server 834 may also each be a server group composed of a server array or server sub-clusters, which is not limited in the embodiment of the present invention.
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the interactive data may include multi-angle free view data
  • the multi-angle free view data may include pixel data, depth data, and parameter data of a plurality of frame images, and there is an association relationship between the pixel data and the depth data of each frame image .
  • the server cluster 83 in the cloud can store the interactive data according to the specified frame time information.
  • the playback control device 84 may generate an interaction identifier associated with the specified frame time according to the frame time information uploaded by the data processing device, and transmit the data stream to be played containing the interaction identifier to the interactive terminal 85.
  • the interactive terminal 85 can play the display video in real time based on the received data stream to be played and display the interactive logo at the corresponding video frame moment.
  • the interactive terminal 85 can obtain the interactive data stored in the cloud server cluster 83 and corresponding to the specified frame time, so as to display images with a multi-angle free view.
  • When the interactive terminal 85 detects the interaction end signal, it can switch to obtaining the data stream to be played from the playback control device 84 in real time and perform real-time playback and display.
  • the data processing system 380 may include: a collection array 381, a data processing device 382, a playback control device 383, and an interactive terminal 384; among them:
  • the collection array 381 includes a plurality of collection devices, which are placed at different locations in the field collection area according to the preset multi-angle free viewing angle range, and are adapted to synchronously collect multiple video data streams in real time and upload the video data streams in real time to the data processing device;
  • the data processing device 382 is adapted to intercept, for the uploaded multiple video data streams and according to the received video frame interception instruction, the multiple video data streams at the designated frame time to obtain the multiple frame images corresponding to the designated frame time and the frame time information corresponding to the designated frame time, and to send the frame time information of the designated frame time to the playback control device 383;
  • the playback control device 383 is adapted to determine, in the data stream to be played, the designated frame time corresponding to the frame time information uploaded by the data processing device 382, to generate an interactive identifier associated with the designated frame time, and to transmit the to-be-played data stream containing the interactive identifier to the interactive terminal 384;
  • the interactive terminal 384 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, and, based on a trigger operation on the interactive identifier, to obtain from the data processing device 382 the multiple frame images at the designated frame time corresponding to the interactive identifier, to generate interaction data for interaction based on the multiple frame images, and then to perform multi-angle free-view image display, where the interaction data includes multi-angle free-view data.
  • the data processing device can be flexibly deployed according to user requirements, for example, the data processing device can be placed in a non-collection area on site or in the cloud.
  • interactive data can be acquired according to the triggering operation of the interactive identifier, and then multi-angle free perspective display can be performed to enhance the user interaction experience.
  • the embodiment of the present invention also provides a terminal corresponding to the above-mentioned data interaction method.
  • the following describes in detail through specific embodiments with reference to the accompanying drawings.
  • the interactive terminal 90 may include:
  • the data stream acquiring unit 91 is adapted to acquire a data stream to be played from the playback control device in real time, the data stream to be played includes video data and an interactive identifier, and the interactive identifier is associated with a specified frame moment of the data stream to be played;
  • the play and display unit 92 is adapted to play and display the video and interactive identification of the data stream to be played in real time;
  • the interactive data obtaining unit 93 is adapted to obtain interactive data corresponding to the specified frame time in response to a trigger operation on the interactive identifier, and the interactive data includes multi-angle free view data;
  • the interactive display unit 94 is adapted to perform multi-angle free-view image display at the specified frame time based on the interactive data
  • the switching unit 95 is adapted to trigger, when the interaction end signal is detected, switching to the data stream to be played acquired in real time by the data stream acquiring unit 91 from the playback control device, with the play and display unit 92 performing real-time playback and display.
  • the interactive data may be generated by the server and transmitted to the interactive terminal, or may be generated by the interactive terminal.
  • the interactive terminal can obtain the data stream to be played from the playing control device in real time, and can display the corresponding interactive identifier at the corresponding frame time.
  • FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention.
  • The interactive terminal 40 obtains the data stream to be played in real time from the playback control device. When the real-time playback display progresses to the first frame time T1, the first interactive indicator 42 can be displayed on the progress bar 41; when playback progresses to the second frame time T2, the second interactive indicator 43 can be displayed on the progress bar.
  • The black part of the progress bar is the played part, and the white part is the unplayed part.
  • the trigger operation may be a trigger operation input by a user, or a trigger operation automatically generated by the interactive terminal.
  • the interactive terminal may automatically initiate a trigger operation when it detects the presence of an identifier of a multi-angle free viewpoint data frame.
  • When the user triggers manually, the trigger may occur at the time at which the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or at historical time information at which the interactive terminal receives a user operation that triggers the interaction, where the historical time information may be time information prior to the current playback time.
  • When the interactive terminal system reads the corresponding interactive mark 43 on the progress bar 41, the interactive prompt information can be displayed, and the interactive terminal 40 can continue to read subsequent video data while the played part of the progress bar 41 continues to advance.
  • When the user selects the trigger, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction at the designated frame time corresponding to the interactive identifier and sends it to the server 73. For example, when the user chooses to trigger the current interactive indicator 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the designated frame time T2 of the interactive indicator 43 and sends it to the server 73.
  • the server may send the interaction data corresponding to the designated frame time T2 according to the image reconstruction instruction.
  • The user can also choose to trigger a historical interactive mark while watching, for example, the interactive mark 42 displayed in the played part 41a of the progress bar. After receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction corresponding to the designated frame time T1 of the interactive mark 42 and sends it to the server 73, and the server may send the interaction data corresponding to the designated frame time T1.
  • The interactive terminal 40 may use an image reconstruction algorithm to process the multi-angle free-view data in the interaction data, and then perform multi-angle free-view image display at the designated frame time: if the designated frame time is a single frame time, a static image with a multi-angle free view is displayed; if the designated frame time corresponds to multiple frame times, a dynamic image with a multi-angle free view is displayed.
  • Likewise, when the interactive terminal system reads the corresponding interactive mark 43 on the progress bar 41, the interactive prompt information can be displayed, and the interactive terminal 40 can continue to read subsequent video data while the played part of the progress bar 41 continues to advance. When the user selects the trigger, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction at the designated frame time corresponding to the interactive identifier and sends it to the data processing device 382. For example, when the user chooses to trigger the current interactive indicator 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the designated frame time T2 of the interactive indicator 43 and sends it to the data processing device.
  • the data processing device 382 can send multiple frame images corresponding to the specified frame time T2 according to the image reconstruction instruction.
  • The user can also choose to trigger a historical interactive mark while watching, for example, the interactive mark 42 displayed in the played part 41a of the progress bar. After receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction corresponding to the designated frame time T1 of the interactive mark 42 and sends it to the data processing device, and the data processing device can send the multiple frame images corresponding to the designated frame time T1 according to the image reconstruction instruction.
  • The interactive terminal 40 may generate interaction data for interaction based on the multiple frame images, may use an image reconstruction algorithm to process the multi-angle free-view data in the interaction data, and may then perform multi-angle free-view image display at the designated frame time: if the designated frame time is a single frame time, a static image with a multi-angle free view is displayed; if the designated frame time corresponds to multiple frame times, a dynamic image with a multi-angle free view is displayed.
  • The interactive terminal of the embodiment of the present invention may be an electronic device with a touch screen function, a head-mounted virtual reality (VR) terminal, an edge node device connected to a display, or an Internet of Things (IoT) device.
  • FIG. 40 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • The interactive terminal is an electronic device 400 with a touch screen function, and the interface of the electronic device 400 may display an interactive prompt message box 403.
  • the user can make a selection according to the content of the interactive prompt information box 403.
  • When the user selects to trigger, the electronic device 400 can generate an image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 402 after receiving the feedback; otherwise, the electronic device 400 can continue to read subsequent video data.
  • FIG. 41 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • The interactive terminal is a head-mounted VR terminal 410, and the interface of the head-mounted VR terminal 410 may display an interactive prompt information box 413.
  • the user can make a selection according to the content of the interactive prompt information box 413.
  • The head-mounted VR terminal 410 can generate the image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 412 after receiving the feedback; when the user makes a non-triggering operation of selecting "No" (for example, shaking the head), the head-mounted VR terminal 410 can continue to read subsequent video data.
  • FIG. 42 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • the interactive terminal is an edge node device 421 connected to a display 420.
  • the display 420 may display an interactive prompt message box 424.
  • the user can make a selection according to the content of the interactive prompt information box 424.
  • the edge node device 421 can generate the image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 423 after receiving the feedback.
  • the edge node device 421 may continue to read subsequent video data.
  • the interactive terminal may establish a communication connection with at least one of the above-mentioned data processing device and server, and may adopt a wired connection or a wireless connection.
  • FIG. 43 is a schematic diagram of the connection of an interactive terminal in an embodiment of the present invention.
  • the edge node device 430 establishes a wireless connection with the interactive devices 431, 432, and 433 through the Internet of Things.
  • The interactive terminal can display images of a multi-angle free view at the designated frame time corresponding to the triggered interactive identifier, and determine the virtual viewpoint position information based on the interactive operation, as shown in FIG. 44, which is a schematic diagram of the interactive operation of an interactive terminal in an embodiment of the present invention.
  • the user can operate horizontally or vertically on the interactive operation interface, and the operation track can be a straight line or a curve.
  • FIG. 45 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention. After the user clicks on the interactive identifier, the interactive terminal obtains the interaction data at the designated frame time of the interactive identifier.
  • The triggering operation is itself an interactive operation, and the corresponding first virtual viewpoint can be determined according to the viewing angle of the image displayed at the time of the interactive operation; if the user performs a new operation, that new operation is likewise an interactive operation, and the corresponding first virtual viewpoint can be determined according to the viewing angle of the image displayed at that time.
  • If the designated frame time corresponds to a single frame time, the obtained multi-angle free-view video data may include the multi-angle free-view spatial data of the images sorted according to frame time, and a static multi-angle free-view image can be displayed; if the designated frame time corresponds to different frame times, the obtained multi-angle free-view video data can include both multi-angle free-view spatial data and multi-angle free-view time data of the frame images sorted by frame time, and a dynamic multi-angle free-view image can be displayed, that is, the frame images of the video frames are displayed with multi-angle free views.
  • the multi-angle free-view video data obtained by the interactive terminal may include multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • The user swipes horizontally to the right to generate an interactive operation, and the corresponding first virtual viewpoint is determined. Because different virtual viewpoints can correspond to different multi-angle free-view spatial data and multi-angle free-view time data, as shown in FIG. 46, the frame images displayed in the interactive interface change in time and space with the interactive operation: the displayed content changes from the athlete heading toward the finish line in FIG. 45 to the athlete about to cross the finish line in FIG. 46, and, with the athlete as the target object, the perspective of the frame image changes from the left view to the front view.
  • Comparing FIG. 45 and FIG. 47, the content displayed in the frame images changes from the athlete heading toward the finish line in FIG. 45 to the athlete having crossed the finish line in FIG. 47, and the perspective of the frame image changes from the left view to the right view.
  • Similarly, comparing FIG. 45 and FIG. 48, the user slides vertically upward to generate an interactive operation; the content of the frame image changes from the athlete heading toward the finish line in FIG. 45 to the athlete having crossed the finish line in FIG. 48, with the athlete as the target object, and the viewing angle of the frame image changes from the left view to the top view.
  • the interactive data may also include at least one of the following: field analysis data, information data of the collection object, information data of the equipment associated with the collection object, information data of the items deployed on site, and logo information displayed on site. Information data.
  • FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • When the interactive terminal 100 triggers the interactive identifier, it can display images with a multi-angle free view at the designated frame time corresponding to the triggered identifier, and can superimpose on-site analysis data on the image, such as the field analysis data 101 shown in FIG. 10.
  • FIG. 11 is a schematic diagram of another interactive interface of the interactive terminal in the embodiment of the present invention.
  • The interactive terminal 110 can display images with a multi-angle free view at the designated frame time corresponding to the triggered interactive identifier, and can superimpose the information data of the collection object on the image, such as the information data 111 of the collection object shown in FIG. 11.
  • FIG. 12 is a schematic diagram of another interactive interface of the interactive terminal in the embodiment of the present invention.
  • The interactive terminal 120 can display images with a multi-angle free view at the designated frame time corresponding to the triggered interactive identifier, and can superimpose the information data of the collection objects on the image, such as the information data 121-123 of the collection objects shown in FIG. 12.
  • FIG. 13 is a schematic diagram of another interactive interface of the terminal in the embodiment of the present invention.
  • The interactive terminal 130 can display images with a multi-angle free view at the designated frame time corresponding to the triggered interactive identifier, and can superimpose the information data of the items deployed on site on the image, such as the item information data 131 shown in FIG. 13.
  • FIG. 14 is a schematic diagram of another interactive interface of the terminal in the embodiment of the present invention.
  • When the interactive terminal 140 triggers the interactive identifier, it can display images with a multi-angle free view at the designated frame time corresponding to the triggered identifier, and can superimpose the information data of the logo displayed on site on the image, such as the logo information data 141 shown in FIG. 14.
  • the user can obtain more relevant interactive information through the interactive data, and have a more in-depth, comprehensive, and professional understanding of the content being watched, thereby further enhancing the user's interactive experience.
  • the interactive terminal 390 may include: a processor 391, a network component 392, a memory 393, and a display component 394; wherein:
  • the processor 391 is adapted to obtain the data stream to be played in real time through the network component 392 and, in response to a trigger operation on an interactive identifier, to obtain the interaction data corresponding to the designated frame time of that identifier, where the data stream to be played includes video data and interactive identifiers, each interactive identifier is associated with a designated frame time of the data stream to be played, and the interaction data includes multi-angle free-view data;
  • the memory 393 is suitable for storing the data stream to be played obtained in real time
  • the display component 394 is adapted to play and display, in real time, the video and interactive identifiers of the data stream to be played based on the data stream acquired in real time, and to perform multi-angle free-view image display at the designated frame time based on the interaction data.
  • The interactive terminal 390 may obtain the interaction data at the designated frame time from the server that stores the interaction data, or obtain the multiple frame images corresponding to the designated frame time from the data processing device that stores the frame images and then generate the corresponding interaction data.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched. Moreover, the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • the preset bandwidth threshold can be determined according to the transmission capacity of the transmission network where each collection device in the collection array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth value may be 1000 Mbps.
  • S152 Receive the compressed video data streams transmitted in real time by each acquisition device in the acquisition array based on the pull instruction, where the compressed video data streams are obtained by each acquisition device in the acquisition array through real-time synchronous acquisition and data compression from its corresponding angle.
  • the capture device itself can have the function of encoding and packaging, so that the original video data collected from the corresponding angle can be encoded and packaged in real time.
  • The packaging format used by the capture device can be AVI, QuickTime File Format, MPEG, WMV, RealVideo, Flash Video, Matroska, etc., or other packaging formats; the encoding format used by the capture device can be H.261, H.263, H.264, H.265, MPEG, AVS and other encoding formats, or other encoding formats.
  • The acquisition device can have a compression function. The higher the compression rate, the smaller the amount of compressed data for the same amount of pre-compression data, which can relieve the bandwidth pressure of real-time synchronous transmission. Therefore, the acquisition device can use predictive coding, transform coding and entropy coding techniques to improve the compression rate of the video.
  • The sum of the bit rates of the compressed video data streams to be transmitted by the collection devices in the collection array can be made not greater than the preset bandwidth threshold by setting the values of the parameters of the collection devices. The parameters of the acquisition devices may include acquisition parameters and compression parameters; each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from its corresponding angle according to the set parameter values, so that the sum of the bit rates of the compressed video data streams is not greater than the preset bandwidth threshold.
  • The acquisition parameters and compression parameters are complementary to each other: when the value of the compression parameter is unchanged, the data size of the original video data can be reduced by setting the value of the acquisition parameter, so that the time for data compression processing is shortened; when the value of the acquisition parameter is unchanged, setting the value of the compression parameter can correspondingly reduce the amount of compressed data, so that the data transmission time is shortened. For another example, setting a higher compression rate can save transmission bandwidth, and setting a lower sampling rate can also save transmission bandwidth. Therefore, the acquisition parameters and/or compression parameters can be set according to the actual situation.
  • the acquisition parameters may include focal length parameters, exposure parameters, resolution parameters, encoding rate parameters, and encoding format parameters, etc.
  • the compression parameters may include compression rate parameters, compression format parameters, etc. By setting the values of the different parameters, values best suited to the transmission network where each collection device is located can be obtained.
  • before the pull-stream instruction is sent to each collection device in the collection array, it can be determined, according to the parameter values that have been set, whether the sum of the code rates of the compressed video data streams obtained when the collection devices perform collection and data compression is greater than the preset bandwidth threshold; when that sum is greater than the preset bandwidth threshold, the values of the parameters of each collection device in the collection array can be reset. It is understandable that, in specific implementations, the values of the acquisition parameters and compression parameters can also be set according to imaging quality requirements such as the resolution of the multi-angle free-view image to be displayed.
  • the process from transmission to writing of the compressed video data streams obtained by the acquisition devices occurs continuously. Therefore, before the pull-stream instruction is sent to each acquisition device in the acquisition array, it can also be determined whether the sum of the code rates of the compressed video data streams to be transmitted by the acquisition devices is greater than a preset writing speed threshold; when that sum is greater than the preset writing speed threshold, the values of the parameters of each acquisition device can be set so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from the corresponding angles is not greater than the preset writing speed threshold.
  • the preset writing speed threshold may be determined according to the data storage writing speed of the storage medium. For example, if the upper limit of the data storage writing speed of the solid-state drive (Solid State Disk or Solid State Drive, SSD) of the data processing device is 100 Mbps, the preset writing speed threshold may be 100 Mbps.
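  • the same check can be extended to storage, as in the following sketch: since transmission and writing occur continuously, the binding constraint is the smaller of the bandwidth threshold and the write-speed threshold (the 1000 Mbps and 100 Mbps defaults echo the examples above; the min-based formulation is an assumption, since the patent states the two checks separately):

```python
# Sketch: the aggregate compressed bitrate must satisfy both the bandwidth
# threshold and the storage write-speed threshold; checking against the
# smaller of the two is equivalent to performing both checks.

def fits_transport_and_storage(total_bitrate_mbps,
                               bandwidth_threshold_mbps=1000,
                               write_speed_threshold_mbps=100):
    effective_limit = min(bandwidth_threshold_mbps, write_speed_threshold_mbps)
    return total_bitrate_mbps <= effective_limit

print(fits_transport_and_storage(299))  # False: exceeds the 100 Mbps write limit
```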
  • the compressed video data stream obtained by each collection device can be stored.
  • the frame-level synchronized video frames in each compressed video data stream can be intercepted according to the received video frame interception instruction, and the intercepted video frames are synchronously uploaded to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
  • the intercepted video frames may first be encapsulated and uploaded to the designated target terminal through a network transmission protocol, and then parsed at the target terminal to recover the frame-level synchronized video frames of the corresponding compressed video data streams (a packing sketch follows).
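  • a minimal packing sketch follows; the header layout (camera id, frame index, timestamp, payload size) and field types are assumptions for illustration, not the patent's wire format:

```python
# Sketch: encapsulate an intercepted video frame with identifying metadata
# before upload, and parse it back at the designated target terminal.
import struct

HEADER = struct.Struct(">IIdI")  # camera id, frame index, timestamp, payload size

def encapsulate(camera_id, frame_index, timestamp, payload: bytes) -> bytes:
    return HEADER.pack(camera_id, frame_index, timestamp, len(payload)) + payload

def parse(packet: bytes):
    camera_id, frame_index, timestamp, size = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + size]
    return camera_id, frame_index, timestamp, payload

pkt = encapsulate(camera_id=1, frame_index=300, timestamp=10.0012, payload=b"\x00\x01")
print(parse(pkt)[:3])  # (1, 300, 10.0012)
```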
  • handing the subsequent processing of the intercepted video frames over to the designated target end can save network transmission resources, reduce the pressure and difficulty of deploying large numbers of server resources on site, greatly reduce the data processing load, and shorten the transmission delay of multi-angle free-view video frames.
  • S161 Determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream;
  • S162 Based on the received video frame interception instruction, determine the video frame to be intercepted in the reference data stream, and select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams;
  • S163 Intercept the video frames to be intercepted in each compressed video data stream.
  • the acquisition array may include 40 acquisition devices, so 40 channels of compressed video data streams can be received in real time. Suppose that, among the compressed video data streams received in real time, the compressed video data stream A1 corresponding to acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of the object indicated in the received video frame interception instruction, the video frame a1 in the reference data stream whose object feature information matches X is determined as the video frame to be intercepted. Next, according to the feature information x1 of the object in video frame a1, the video frames a2-a40 consistent with x1 are selected from the remaining compressed video data streams A2-A40 as the video frames to be intercepted in those streams.
  • the feature information of the object may include at least one of shape feature information, color feature information, and position feature information.
  • the feature information X of the object in the video frame indicated to be intercepted in the video frame interception instruction may be the same as the feature information x1 of the object in the video frame a1 to be intercepted in the reference data stream.
  • for example, the feature information X and x1 of the object may both be two-dimensional feature information; X and x1 can also be different representations of the feature information of the same object, for example, X can be two-dimensional feature information while x1 is three-dimensional feature information.
  • a similarity threshold can be preset; when the similarity threshold is met, the feature information X of the object can be considered consistent with x1, or the feature information x1 of the object can be considered consistent with the feature information x2-x40 of the object in the remaining compressed video data streams A2-A40.
  • the specific representation method and similarity threshold of the feature information of the object may be determined according to the preset multi-angle free viewing angle range and the on-site scene, which is not limited in the embodiment of the present invention (see the sketch below).
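  • as a concrete (assumed) instance of feature-information matching, the sketch below approximates the object's feature information with a normalized color histogram and uses cosine similarity against a preset threshold; the patent does not prescribe a specific descriptor or metric:

```python
# Sketch: select, from candidate frames of another stream, a frame whose
# object feature information is "consistent" with the reference frame's,
# i.e. whose similarity meets the preset similarity threshold.
import numpy as np

def color_histogram(frame_rgb: np.ndarray, bins=16) -> np.ndarray:
    """L2-normalized 3-D color histogram used here as a stand-in feature."""
    hist, _ = np.histogramdd(frame_rgb.reshape(-1, 3),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def find_matching_frame(reference_feature, candidate_frames, threshold=0.95):
    """Return the first candidate whose cosine similarity to the reference
    feature meets the threshold, else None."""
    for frame in candidate_frames:
        if float(color_histogram(frame) @ reference_feature) >= threshold:
            return frame
    return None
```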
  • the acquisition array may contain 40 acquisition devices, so 40 channels of compressed video data streams can be received in real time. Suppose that, among the compressed video data streams received in real time, the compressed video data stream B1 corresponding to acquisition device B1' is determined as the reference data stream. Then, based on the time stamp information Y indicated in the received video frame interception instruction, the video frame b1 corresponding to Y in the reference data stream is determined as the video frame to be intercepted. Next, according to the time stamp information y1 of video frame b1, the video frames b2-b40 consistent with y1 are selected from the remaining compressed video data streams B2-B40 as the video frames to be intercepted in those streams.
  • the time stamp information Y of the video frame indicated in the video frame interception instruction may differ slightly from the time stamp information y1 of the video frame b1 to be intercepted in the reference data stream; that is, the time stamp information of the selected frame in the reference data stream may be inconsistent with Y. For this, an error range can be preset. For example, if the error range is ±1 ms, a 0.1 ms difference falls within the error range, so the video frame b1 whose time stamp information y1 differs from Y by 0.1 ms can be selected as the video frame to be intercepted in the reference data stream.
  • the specific error range and the selection rule of the time stamp information y1 in the reference data stream can be determined according to the on-site collection equipment and transmission network, which is not limited in this embodiment.
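  • a minimal sketch of the timestamp-based selection with a preset error range (the ±1 ms tolerance follows the example above):

```python
# Sketch: pick the frame whose timestamp is closest to the indicated
# timestamp Y, accepting it only if the difference is within the error range.

def select_by_timestamp(frames, target_ts, tolerance=0.001):
    """frames: iterable of (timestamp_seconds, frame); returns the frame
    closest to target_ts within tolerance, or None."""
    best = min(frames, key=lambda f: abs(f[0] - target_ts), default=None)
    if best is not None and abs(best[0] - target_ts) <= tolerance:
        return best[1]
    return None

print(select_by_timestamp([(10.0001, "b1"), (10.0401, "b2")], target_ts=10.0))  # b1
```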
  • the data processing device can thus pull the data collected and compressed by each collection device smoothly and without congestion.
  • the collection devices, placed at different positions in the field collection area according to the preset multi-angle free viewing angle range, synchronously collect raw video data in real time from their corresponding angles and perform real-time data compression on the collected raw video data to obtain the corresponding compressed video data streams.
  • the preset bandwidth threshold can be determined according to the transmission capacity of the transmission network where each collection device in the collection array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold can be 1000 Mbps.
  • each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time.
  • the data processing device can be deployed according to the actual scenario. For example, when there is suitable space on site, the data processing device can be placed in a non-collection area on site as a field server; when there is no suitable space on site, it can be placed in the cloud as a cloud server.
  • when the data processing device connected to the collection array via a link determines that the sum of the code rates of the compressed video data streams to be transmitted by the collection devices in the collection array is not greater than the preset bandwidth threshold, it sends a pull-stream instruction to each collection device in the collection array so that the data collected and compressed by each collection device can be transmitted synchronously in real time. Real-time streaming can thus be performed over the existing transmission network while avoiding data transmission congestion during the streaming process. Each acquisition device in the acquisition array then transmits the obtained compressed video data stream to the data processing device in real time based on the pull-stream instruction. Since the data transmitted by each acquisition device is compressed, the bandwidth pressure of real-time synchronous transmission is relieved and the processing of multi-angle free-view video data is sped up.
  • the data processing device may first determine, according to the set parameter values, whether the sum of the code rates of the compressed video data streams obtained by the collection devices in the collection array through data collection and data compression is greater than the preset bandwidth threshold; when that sum is greater than the preset bandwidth threshold, the data processing device can reset the values of the parameters of each acquisition device in the acquisition array and then send the pull-stream instruction to each acquisition device.
  • the process from transmission to writing of the compressed video data streams obtained by the acquisition devices occurs continuously, so it is also necessary to ensure that the data processing device remains unblocked when writing the compressed video data streams obtained by the acquisition devices.
  • the data processing device may also determine whether the sum of the code rates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is greater than a preset writing speed threshold; when that sum is greater than the preset writing speed threshold, the data processing device may set the values of the parameters of each acquisition device so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from the corresponding angles is not greater than the preset writing speed threshold.
  • the preset writing speed threshold may be determined according to the data storage writing speed of the data processing device.
  • data can be transmitted between each collection device in the collection array and the data processing device through at least one of the following methods:
  • Each collection device in the collection array is connected to the data processing device through a switch.
  • the switch can aggregate the compressed video data streams of many collection devices and transmit them uniformly to the data processing device, which can reduce the number of ports the data processing device must support.
  • for example, if the switch supports 40 inputs, the data processing device can simultaneously receive, through the switch, the video streams of a collection array composed of 40 collection devices, thereby reducing the number of data processing devices required.
  • Each acquisition device in the acquisition array is connected to the data processing device through a local area network.
  • the local area network can transmit the compressed video data streams of the acquisition devices to the data processing device in real time, reducing the number of ports the data processing device must support and thereby reducing the number of data processing devices required.
  • the data processing device may store (e.g., buffer) the compressed video data streams obtained from the collection devices; when a video frame interception instruction is received, the data processing device intercepts the frame-level synchronized video frames in each compressed video data stream according to the instruction and synchronously uploads the intercepted video frames to the designated target terminal.
  • the data processing device may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • the data processing device may first encapsulate the intercepted video frames and upload them to the designated target terminal through a network transmission protocol; the target terminal then parses them to recover the frame-level synchronized video frames of the corresponding compressed video data streams.
  • the compressed video data stream obtained by the real-time synchronous collection and data compression of each collection device in the collection array can be uniformly transmitted to the data processing device.
  • after receiving the video frame interception instruction, the data processing device intercepts the frame-level synchronized video frames of each compressed video data stream and synchronously uploads them to the designated target end. The subsequent processing of the intercepted video frames is handed over to the designated target end, which saves network transmission resources, reduces the pressure and difficulty of on-site deployment, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free-view video frames.
  • the data processing device may first determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream; then, based on the received video frame interception instruction, it determines the video frame to be intercepted in the reference data stream and selects the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams; finally, the data processing device intercepts the video frames to be intercepted in each compressed video data stream.
  • for the specific frame interception method, refer to the examples in the foregoing embodiments, which will not be repeated here.
  • the embodiment of the present invention also provides a data processing device corresponding to the data processing method in the above-mentioned embodiment.
  • the data processing device 180 may include:
  • the first transmission matching unit 181 is adapted to determine whether the sum of the code rates of the compressed video data streams to be transmitted by the collection devices in the collection array is not greater than a preset bandwidth threshold, where the collection devices in the collection array are placed at different locations in the field collection area according to a preset multi-angle free viewing angle range;
  • the instruction sending unit 182 is adapted to send a pull-stream instruction to each acquisition device in the acquisition array when it is determined that the sum of the code rates of the compressed video data streams to be transmitted by the acquisition devices is not greater than the preset bandwidth threshold;
  • the data stream receiving unit 183 is adapted to receive the compressed video data streams transmitted in real time by the acquisition devices in the acquisition array based on the pull-stream instruction, where each compressed video data stream is obtained by the corresponding acquisition device through real-time synchronous acquisition and data compression from its corresponding angle.
  • matching the transmission bandwidth can avoid data transmission congestion during the streaming process, so that the data collected and compressed by each acquisition device can be transmitted synchronously in real time. This speeds up the processing of multi-angle free-view video data, realizes multi-angle free-view video with limited bandwidth resources and data processing resources, and reduces implementation costs.
  • the data processing device 180 may further include:
  • the first parameter setting unit 184 is adapted to set the values of the parameters of each collection device in the collection array before the pull-stream instruction is sent to each collection device;
  • the parameters of the acquisition device may include acquisition parameters and compression parameters; when each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from its corresponding angle according to the set values of these parameters, the sum of the bit rates of the resulting compressed video data streams is not greater than the preset bandwidth threshold.
  • the data processing device 180 may further include:
  • the second transmission matching unit 185 is adapted to determine, before the values of the parameters of the collection devices in the collection array are set, whether the sum of the bit rates of the compressed video data streams obtained by the collection devices through collection and data compression according to the currently set parameter values is not greater than the preset bandwidth threshold.
  • the data processing device 180 may further include:
  • the writing matching unit 186 is adapted to determine whether the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold;
  • the second parameter setting unit 187 is adapted to set the values of the parameters of the acquisition devices in the acquisition array when the sum of the bit rates of the compressed video data streams to be transmitted by the acquisition devices is greater than the preset writing speed threshold, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from the corresponding angles, according to the set parameter values, is not greater than the preset writing speed threshold.
  • the data processing device 180 may further include:
  • the frame interception processing unit 188 is adapted to intercept the frame-level synchronized video frames in each compressed video data stream according to the received video frame interception instruction;
  • the uploading unit 189 is adapted to synchronously upload the captured video frames to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
  • the frame interception processing unit 188 may include:
  • the reference data stream selection subunit 1881 is adapted to determine one of the compressed video data streams of each collection device in the collection array received in real time as a reference data stream;
  • the video frame selection subunit 1882 is adapted to determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and to select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams;
  • the video frame interception subunit 1883 is adapted to intercept the video frames to be intercepted in each compressed video data stream.
  • the video frame selection subunit 1882 may include at least one of the following:
  • the first video frame selection module 18821 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames consistent with that feature information in the remaining compressed video data streams, as the video frames to be intercepted in those streams;
  • the second video frame selection module 18822 is adapted to select, according to the time stamp information of the video frame to be intercepted in the reference data stream, the video frames consistent with that time stamp information in the remaining compressed video data streams, as the video frames to be intercepted in those streams.
  • the embodiment of the present invention also provides a data processing system corresponding to the above-mentioned data processing method.
  • the above-mentioned data processing device is used to realize real-time reception of multiple compressed video data streams.
  • the data processing system 190 may include: an acquisition array 191 and a data processing device 192.
  • the acquisition array 191 includes multiple collection devices located at different positions in the field collection area according to a preset multi-angle free viewing angle range, wherein:
  • each collection device in the collection array 191 is adapted to synchronously collect raw video data in real time from its corresponding angle, perform real-time data compression on the collected raw video data to obtain a compressed video data stream, and, based on the pull-stream instruction sent by the data processing device 192, transmit the obtained compressed video data stream to the data processing device 192 in real time;
  • the data processing device 192 is adapted to send a pull-stream instruction to each collection device in the collection array 191 when it determines that the sum of the code rates of the compressed video data streams to be transmitted by the collection devices is not greater than a preset bandwidth threshold, and to receive the compressed video data streams transmitted in real time by the collection devices in the collection array 191.
  • data transmission and streaming can thus be performed over a common transmission network instead of dedicated cables and SDI interfaces. With limited bandwidth resources and data processing resources, low-latency playback of multi-angle free-view video is realized and implementation costs are reduced.
  • the data processing device 192 is further adapted to set the values of the parameters of each collection device in the collection array before sending the pull-stream instruction to each collection device in the collection array 191;
  • the parameters of the acquisition device include acquisition parameters and compression parameters; when each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from its corresponding angle according to the set values of these parameters, the sum of the bit rates of the resulting compressed video data streams is not greater than the preset bandwidth threshold.
  • the data processing device can set the values of the parameters of the collection devices in the collection array to ensure that these values are unified across the array, so that each collection device can synchronously collect and compress data in real time from its corresponding angle with the sum of the bit rates of the resulting compressed video data streams not greater than the preset bandwidth threshold. Network congestion can thus be avoided, and low-latency playback of multi-angle free-view video can be achieved even when bandwidth resources are limited.
  • before sending the pull-stream instruction to each acquisition device in the acquisition array 191, the data processing device 192 determines whether the sum of the code rates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array 191 is greater than the preset writing speed threshold; when it is, the data processing device sets the values of the parameters of the collection devices in the collection array 191 so that the sum of the bit rates of the compressed video data streams obtained by real-time synchronous collection and compression from the corresponding angles, according to the set parameter values, is not greater than the preset writing speed threshold.
  • this avoids congestion when the device writes data, ensuring that the compressed video data streams remain unblocked throughout collection, transmission, and writing, so that the compressed video streams uploaded by the collection devices can be processed in real time and playback of multi-angle free-view video can be realized.
  • each collection device in the collection array and the data processing device are adapted to be connected through a switch and/or a local area network.
  • the data processing system 190 may further include a designated target terminal 193.
  • the data processing device 192 is adapted to intercept frame-level synchronized video frames in each compressed video stream according to the received video frame interception instruction, and synchronously upload the intercepted video frames to the designated target terminal 193;
  • the designated target terminal 193 is adapted to receive the video frame obtained by the data processing device 192 based on the video frame interception instruction.
  • the data processing device may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • the compressed video data stream obtained by real-time synchronous collection and data compression of each collection device in the collection array can be uniformly transmitted to the data processing device.
  • after receiving the video frame interception instruction, the data processing device intercepts the frame-level synchronized video frames in each compressed video data stream and synchronously uploads them to the specified target end. The subsequent processing of the intercepted video frames is handed over to the specified target end, which saves network transmission resources, reduces the pressure and difficulty of on-site deployment, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free-view video frames.
  • the data processing device 192 is adapted to determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array 191 as a reference data stream; to determine, based on the received video frame interception instruction, the video frame to be intercepted in the reference data stream; and to select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams.
  • S201 Send a pull-stream instruction to each collection device in the collection array, where the collection devices in the collection array are placed at different positions in the field collection area according to a preset multi-angle free viewing angle range and are set to synchronously collect video data streams in real time from their corresponding angles.
  • to achieve pull-stream synchronization, there may be multiple implementations. For example, the pull-stream instruction can be sent to each acquisition device in the acquisition array at the same time; alternatively, the instruction can be sent only to the main acquisition device in the acquisition array to trigger its stream pulling, after which the main acquisition device synchronizes the instruction to all slave acquisition devices and triggers them to pull streams (a relay sketch follows).
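  • the relay variant might look like the following sketch; the transport (UDP), addresses, and message format are assumptions for illustration, since the patent does not specify the protocol:

```python
# Sketch: the main acquisition device receives the pull-stream instruction
# and relays it to the slave acquisition devices so that all devices begin
# streaming together.
import socket

SLAVE_ADDRS = [("192.168.1.%d" % i, 9000) for i in range(11, 51)]  # hypothetical

def relay_pull_instruction(instruction: bytes = b"PULL"):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for addr in SLAVE_ADDRS:
            s.sendto(instruction, addr)  # one datagram per slave device
```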
  • S202 Receive in real time the video data streams respectively transmitted by the acquisition devices in the acquisition array based on the pull-stream instruction, and determine whether those video data streams are frame-level synchronized.
  • the acquisition device itself may have the function of encoding and packaging, so that the original video data collected from the corresponding angle in real time can be encoded and packaged in real time.
  • each collection device can also have a compression function. With a higher compression rate, the same amount of pre-compression data yields a smaller amount of compressed data, which relieves the bandwidth pressure of real-time synchronous transmission. Therefore, the collection device can use predictive coding, transform coding, entropy coding, and other techniques to improve the compression rate of the video.
  • with the above data synchronization method, determining whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized ensures the synchronous transmission of multiple channels of data, avoiding missed or duplicated frames, improving data processing speed, and meeting the needs of low-latency playback of multi-angle free-view video.
  • when the collection devices in the collection array are started manually, there is a start-up time error, and the video data streams may not begin to be collected at the same time. Therefore, at least one of the following methods can be adopted to ensure that the collection devices in the collection array synchronously collect video data streams in real time from their corresponding angles:
  • the collection devices in the collection array are connected through a synchronization line; when at least one collection device acquires the collection start instruction, it synchronizes the instruction to the other collection devices, so that each collection device in the collection array starts to synchronously collect the video data stream in real time from its corresponding angle based on the collection start instruction.
  • for example, the acquisition array may contain 40 acquisition devices; when acquisition device A1 acquires the collection start instruction, it synchronously sends the instruction to the other acquisition devices A2-A40. After all the collection devices receive the collection start instruction, each device starts to synchronously collect the video data stream in real time from its corresponding angle. Since data transmission between the collection devices is much faster than manual start-up, the start time error caused by manual start-up is reduced.
  • Each acquisition device in the acquisition array synchronously acquires a video data stream in real time from a corresponding angle based on a preset clock synchronization signal.
  • a clock signal synchronization device can be set, and each collection device can be connected to the clock signal synchronization device.
  • when the clock signal synchronization device receives a trigger signal (such as a synchronous acquisition start instruction), it transmits a clock synchronization signal to each collection device, and each collection device starts to synchronously collect the video data stream in real time from its corresponding angle based on the clock synchronization signal.
  • the clock signal synchronization device can transmit the clock synchronization signal to each collection device based on a preset trigger signal, so that the collection devices collect synchronously without being susceptible to interference from external conditions or manual operation, improving the synchronization accuracy and efficiency of the collection devices (see the sketch below).
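  • as a single-process analogy of the clock-synchronization behaviour (a sketch, not the hardware mechanism), every capture worker below blocks on one shared event and is released simultaneously when the trigger fires:

```python
# Sketch: capture threads start only when the shared "clock" signal fires,
# mimicking devices released together by a clock synchronization signal.
import threading
import time

start_signal = threading.Event()

def capture_worker(device_id):
    start_signal.wait()                 # all devices released together
    print(f"device {device_id} started at {time.monotonic():.6f}")

threads = [threading.Thread(target=capture_worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
time.sleep(0.1)                         # the trigger signal arrives
start_signal.set()
for t in threads:
    t.join()
```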
  • the collection devices in the collection array may not receive the pull-stream instruction at the same time; there may be a time difference of several milliseconds or less between devices, causing the video data streams transmitted in real time by the devices to be out of synchronization.
  • suppose the acquisition array contains acquisition devices 1 and 2 with identical acquisition parameter settings, an acquisition frame rate of X fps, and synchronized video frame capture, so the acquisition interval of each frame is T = 1/X. Suppose that at time t0 the data processing device sends a pull-stream instruction r, acquisition device 1 receives it at t1, and acquisition device 2 receives it at t2. If devices 1 and 2 both receive the instruction within the same acquisition interval T, they can be considered to have received it at the same time and can each transmit frame-level synchronized video data streams; if they do not receive it within the same acquisition interval, they cannot be considered to have received it at the same time and cannot achieve frame-level synchronized transmission of the video data streams.
  • frame-level synchronization of video data stream transmission can also be referred to as pull-stream synchronization; once achieved, it continues automatically until stream pulling stops (see the sketch below).
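  • the interval test from the example above can be written directly, as in this sketch: with frame rate X fps the acquisition interval is T = 1/X seconds, and two devices are pull-synchronized if their receipt times fall in the same interval:

```python
# Sketch: devices 1 and 2 received the pull-stream instruction "at the same
# time" if floor(t1 / T) == floor(t2 / T), where T = 1 / fps.
import math

def pull_synchronized(t1, t2, fps):
    T = 1.0 / fps
    return math.floor(t1 / T) == math.floor(t2 / T)

print(pull_synchronized(10.0012, 10.0018, fps=30))  # True: same 1/30 s interval
print(pull_synchronized(10.0300, 10.0350, fps=30))  # False: straddles a boundary
```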
  • At least one of the following methods can be used to determine whether the video data streams respectively transmitted by the collection devices in the collection array are frame-level synchronized:
  • one way: when the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is acquired, the feature information of the object in the Nth frame of each video data stream can be matched; when the feature information of the object in the Nth frames meets the preset similarity threshold, it is determined that the feature information of the object in the Nth frame of each transmitted video data stream is consistent, and hence that the video data streams transmitted by the collection devices are frame-level synchronized.
  • the feature information of the object of the Nth frame of each video data stream may include at least one of shape feature information, color feature information, and position feature information.
  • another way: when the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is acquired, the time stamp information of the Nth frame of each video data stream can be matched, where N is an integer not less than 1; when the time stamps are consistent, it is determined that the video data streams transmitted by the collection devices are frame-level synchronized (a sketch follows).
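  • a minimal sketch of the Nth-frame timestamp test (the ±1 ms tolerance is an assumed error range, consistent with the examples above):

```python
# Sketch: the streams are considered frame-level synchronized if the
# timestamps of their Nth frames all agree within the preset error range.

def frame_level_synchronized(streams, n, tolerance=0.001):
    """streams: list of per-stream frame-timestamp lists (seconds); n >= 1."""
    timestamps = [s[n - 1] for s in streams]
    return max(timestamps) - min(timestamps) <= tolerance

print(frame_level_synchronized([[0.0, 0.0333], [0.0004, 0.0337]], n=2))  # True
```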
  • the video frames in the video data stream of each acquisition device can also be intercepted and transmitted to the designated destination.
  • the following steps can be included:
  • S221 Determine one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream.
  • S222 Based on the received video frame interception instruction, determine the video frame to be intercepted in the reference data stream, and select the video frames in the remaining video data streams that are synchronized with it as the video frames to be intercepted in those streams.
  • S223 Intercept video frames to be intercepted in each video data stream.
  • S224 Synchronously upload the captured video frames to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
  • with the above-mentioned solution, frame interception synchronization can be achieved and frame interception efficiency improved, further improving the display effect of the generated multi-angle free-view video and enhancing the user experience.
  • the coupling between the process of selecting and intercepting video frames and the process of generating multi-angle free-view videos can be reduced, and the independence between each process can be enhanced, which is convenient for later maintenance.
  • the intercepted video frames can be synchronously uploaded to the designated target end, which saves network transmission resources, reduces the data processing load, and increases the speed at which multi-angle free-view videos are generated.
  • one way is to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames consistent with that feature information in the remaining video data streams, as the video frames to be intercepted in those streams.
  • for example, the acquisition array contains 40 acquisition devices, so 40 video data streams can be received in real time. Suppose that, among the video data streams received in real time, the video data stream A1 corresponding to acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of the object indicated in the received video frame interception instruction, the video frame a1 in the reference data stream consistent with X is determined as the video frame to be intercepted. Next, according to the feature information x1 of the object in video frame a1, the video frames a2-a40 consistent with x1 are selected from the remaining video data streams A2-A40 as the video frames to be intercepted in those streams.
  • the feature information of the object may include shape feature information, color feature information, position feature information, etc. The feature information X of the object indicated in the video frame interception instruction and the feature information x1 of the object in the video frame a1 to be intercepted in the reference data stream can be the same representation of the feature information of the same object, for example, both two-dimensional feature information; X and x1 can also be different representations of the feature information of the same object, for example, X can be two-dimensional feature information while x1 is three-dimensional feature information.
  • a similarity threshold can be preset; when the similarity threshold is met, the feature information X of the object can be considered consistent with x1, or the feature information x1 of the object can be considered consistent with the feature information x2-x40 of the object in the remaining video data streams A2-A40.
  • the specific representation method and similarity threshold of the feature information of the object can be determined according to the preset multi-angle free viewing angle range and the scene of the scene, which is not limited in this embodiment.
  • another way is to select, according to the time stamp information of the video frame to be intercepted in the reference data stream, the video frames consistent with that time stamp information in the remaining video data streams, as the video frames to be intercepted in those streams.
  • for example, the acquisition array can contain 40 acquisition devices, so 40 video data streams can be received in real time. Suppose that, among the video data streams received in real time, the video data stream B1 corresponding to acquisition device B1 is determined as the reference data stream. Then, based on the time stamp information Y of the video frame indicated in the received video frame interception instruction, the video frame b1 corresponding to Y in the reference data stream is determined as the video frame to be intercepted. Next, according to the time stamp information y1 of video frame b1, the video frames b2-b40 consistent with y1 are selected from the remaining video data streams B2-B40 as the video frames to be intercepted in those streams.
  • the time stamp information Y of the video frame indicated in the video frame interception instruction may differ slightly from the time stamp information y1 of the video frame b1 to be intercepted in the reference data stream; that is, the time stamp information of the selected frame in the reference data stream may be inconsistent with Y. For this, an error range can be preset. For example, if the error range is ±1 ms, a 0.1 ms difference falls within the error range, so the video frame b1 whose time stamp information y1 differs from Y by 0.1 ms can be selected as the video frame to be intercepted in the reference data stream.
  • the specific error range and the selection rule of the time stamp information y1 in the reference data stream can be determined according to the on-site collection equipment and transmission network, which is not limited in this embodiment.
  • the embodiment of the present invention also provides a data processing device corresponding to the above-mentioned data processing method.
  • the data processing device 230 may include:
  • the instruction sending unit 231 is adapted to send a streaming instruction to each collection device in the collection array, wherein each collection device in the collection array is placed in a different position of the field collection area according to a preset multi-angle free viewing angle range, and Each acquisition device in the acquisition array is set to acquire the video data stream synchronously in real time from the corresponding angle;
  • the data stream receiving unit 232 is adapted to receive in real time the video data streams respectively transmitted by each acquisition device in the acquisition array based on the pull instruction;
  • the first synchronization judging unit 233 is adapted to determine whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized, and, when they are not, to trigger the instruction sending unit 231 again until the video data streams respectively transmitted by the acquisition devices are frame-level synchronized.
  • the data processing device can be set according to actual scenarios. For example, when there is free space on site, the data processing device can be placed in a non-collection area on site and serve as a site server; when there is no free space on site, the data processing device can be placed in the cloud and serve as a cloud server.
  • the data processing device 230 may further include:
  • the reference video stream determining unit 234 is adapted to determine one of the video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
  • the video frame selection unit 235 is adapted to determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and to select the video frames in the remaining video data streams that are synchronized with it as the video frames to be intercepted in those streams;
  • the video frame interception unit 236 is adapted to intercept the video frames to be intercepted in each video data stream;
  • the uploading unit 237 is adapted to synchronously upload the captured video frames to the designated target terminal.
  • the data processing device 230 may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • the video frame selection unit 235 includes at least one of the following:
  • the first video frame selection module 2351 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames consistent with that feature information in the remaining video data streams, as the video frames to be intercepted in those streams;
  • the second video frame selection module 2352 is adapted to select, according to the time stamp information of the video frame to be intercepted in the reference data stream, the video frames consistent with that time stamp information in the remaining video data streams, as the video frames to be intercepted in those streams.
  • the embodiment of the present invention also provides a data synchronization system corresponding to the above-mentioned data processing method.
  • the above-mentioned data processing device is used to realize real-time reception of multiple video data streams.
  • the data synchronization system 240 may include: a collection array 241 placed in the field collection area and a data processing device 242 placed in a link connected to the collection array.
  • the collection array 241 includes a plurality of collection devices, and each collection device in the collection array 241 is located at different locations in the field collection area according to a preset multi-angle free viewing angle range, wherein:
  • each collection device in the collection array 241 is adapted to synchronously collect the video data stream in real time from its corresponding angle and, based on the pull-stream instruction sent by the data processing device 242, transmit the obtained video data stream to the data processing device 242 in real time;
  • the data processing device 242 is adapted to send a pull-stream instruction to each acquisition device in the acquisition array 241, receive in real time the video data streams respectively transmitted by the acquisition devices based on the instruction, and, when those video data streams are not frame-level synchronized, send the pull-stream instruction to the acquisition devices again until the video data streams transmitted by the acquisition devices are frame-level synchronized.
  • with the data synchronization system of the embodiment of the present invention, determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized ensures the synchronous transmission of multiple channels of data, avoiding missed or duplicated frames and increasing the data processing speed to meet the needs of low-latency playback of multi-angle free-view video.
  • the data processing device 242 is further adapted to determine one of the video data streams received in real time from the acquisition devices in the acquisition array 241 as a reference data stream; to determine, based on the received video frame interception instruction, the video frame to be intercepted in the reference data stream, and to select the video frames in the remaining video data streams that are synchronized with it as the video frames to be intercepted in those streams; and to intercept the video frames to be intercepted in each video data stream and synchronously upload them to the designated target terminal 243.
  • the data processing device 242 may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction.
  • the data synchronization system 240 may also include a cloud server, which is suitable for serving as the designated target terminal 243.
  • the data synchronization system 240 may further include a playback control device 341, which is suitable for serving as a designated target terminal 243.
  • the data synchronization system 240 may further include an interactive terminal 351, which is suitable for serving as a designated target terminal 243.
  • At least one of the following methods may be adopted to ensure that each collection device in the collection array 241 is set to collect the video data stream synchronously in real time from a corresponding angle:
  • the collection devices in the collection array are connected through a synchronization line; when at least one collection device acquires the collection start instruction, it synchronizes the instruction through the synchronization line to the other collection devices, so that each collection device in the collection array starts to synchronously collect the video data stream in real time from its corresponding angle based on the collection start instruction;
  • Each acquisition device in the acquisition array synchronously acquires a video data stream in real time from a corresponding angle based on a preset clock synchronization signal.
  • the synchronization system includes a collection array 251 composed of collection devices, a data processing device 252, and a server cluster 253 in the cloud.
  • at least one of the acquisition devices in the acquisition array 251 acquires the acquisition start instruction and synchronizes it to the other acquisition devices through the synchronization line 254, so that each acquisition device in the acquisition array starts to synchronously acquire the video data stream in real time from its corresponding angle based on the instruction.
  • the data processing device 252 may send a streaming instruction to each collection device in the collection array 251 through a wireless local area network. Based on the streaming instruction sent by the data processing device 252, each collection device in the collection array 251 transmits the obtained video data stream to the data processing device 252 in real time through the switch 255.
  • the data processing device 252 determines whether the video data streams respectively transmitted by the collection devices in the collection array 251 are frame-level synchronized; when they are not, the pull-stream instruction is sent to each acquisition device in the acquisition array 251 again until the video data streams transmitted by the acquisition devices are frame-level synchronized.
  • when the data processing device 252 determines that the video data streams transmitted by the acquisition devices in the acquisition array 251 are frame-level synchronized, it determines one of the video data streams received in real time as the reference data stream; after receiving a video frame interception instruction, it determines the video frame to be intercepted in the reference data stream according to the instruction, selects the video frames in the remaining video data streams that are synchronized with it as the video frames to be intercepted in those streams, intercepts the video frames to be intercepted in each video data stream, and synchronously uploads the intercepted video frames to the cloud.
  • the server cluster 253 in the cloud performs subsequent processing on the intercepted video frames to obtain a multi-angle free-view video for playback.
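  • The text does not specify how the frame-level synchronization check is implemented; the following is a minimal host-side sketch under the assumption that each stream carries per-frame capture timestamps. All names (StreamState, framesAreSynchronized, toleranceUs) are illustrative assumptions, not the patent's API.

```cuda
// Host-side sketch: the data processing device keeps the timestamp of the
// newest frame received from every collection device and treats the streams
// as frame-level synchronized when all timestamps agree within a tolerance
// (e.g. one frame period). If the check fails, the streaming instruction
// can be re-sent as described above.
#include <algorithm>
#include <cstdint>
#include <vector>

struct StreamState {
    uint64_t latestFrameTimestampUs;  // capture timestamp of the newest frame
};

bool framesAreSynchronized(const std::vector<StreamState>& streams,
                           uint64_t toleranceUs) {
    if (streams.empty()) return true;
    uint64_t lo = streams[0].latestFrameTimestampUs;
    uint64_t hi = lo;
    for (const StreamState& s : streams) {
        lo = std::min(lo, s.latestFrameTimestampUs);
        hi = std::max(hi, s.latestFrameTimestampUs);
    }
    return hi - lo <= toleranceUs;  // frame-level synchronization
}
```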
  • the cloud server cluster 253 may include: a first cloud server 2531, a second cloud server 2532, a third cloud server 2533, and a fourth cloud server 2534.
  • the first cloud server 2531 can be used for parameter calculation
  • the second cloud server 2532 can be used for depth calculation to generate a depth map
  • the third cloud server 2533 can be used for depth image based rendering (DIBR) to perform frame image reconstruction for a preset virtual viewpoint path
  • the fourth cloud server 2534 can be used to generate multi-angle free-view videos.
  • the data processing device can be placed in a non-collection area on site or in the cloud according to the actual scenario, and in practical applications the designated target terminal of the data synchronization system can be at least one of a cloud server, a playback control device, or an interactive terminal.
  • the transmitting end of the video frame interception instruction may also adopt other devices capable of transmitting the video frame interception instruction, which is not limited in the embodiment of the present invention.
  • the embodiment of the present invention also provides a collection device corresponding to the above-mentioned data processing method.
  • The collection device is adapted to synchronize the collection start instruction to other collection devices when it acquires the collection start instruction, to start synchronously collecting the video data stream in real time from its corresponding angle, and, when receiving a streaming instruction sent by the data processing device, to transmit the obtained video data stream to the data processing device in real time.
  • the collection device 360 includes: a photoelectric conversion camera component 361, a processor 362, an encoder 363, and a transmission component 365, in which:
  • the photoelectric conversion camera component 361 is adapted to collect images;
  • the processor 362 is adapted to synchronize the collection start instruction to other collection devices through the transmission component 365 when the collection start instruction is acquired, to process the images collected by the photoelectric conversion camera component 361 in real time to obtain an image data sequence, and, when a streaming instruction is obtained, to transmit the obtained video data stream to the data processing device in real time through the transmission component 365;
  • the encoder 363 is adapted to encode the image data sequence to obtain a corresponding video data stream.
  • the collection device 360 may further include a recording component 364, which is adapted to collect sound signals and obtain audio data.
  • the collected image data sequence and audio data can be processed by the processor 362, and then the collected image data sequence and audio data can be encoded by the encoder 363 to obtain a corresponding video data stream.
  • the processor 362 can synchronize the acquisition start instruction to other acquisition devices through the transmission component 365; when receiving the streaming instruction, the processor 362 transmits the acquired video data stream to the data processing device in real time through the transmission component 365.
  • The collection devices can be placed at different positions in the field collection area according to the preset multi-angle free-view range, and a collection device can be fixed at a certain point in the field collection area or move within the field collection area, so as to form a collection array. Therefore, the collection device may be a fixed device or a mobile device, so that the video data stream can be flexibly collected from multiple angles.
  • FIG. 37 is a schematic diagram of the collection array in an application scenario in an embodiment of the present invention.
  • The center of the stage is taken as the core point of view; with the core point of view as the center of a circle, the fan-shaped area located on the same plane as the core point of view is used as the preset multi-angle free viewing angle range.
  • The collection devices 371-375 in the collection array are placed in a fan shape at different positions of the field collection area according to the preset multi-angle free viewing angle range.
  • the collection device 376 is a movable device, which can be moved to a designated location according to instructions for flexible collection.
  • the acquisition device can be a handheld device to supplement the acquisition data when the acquisition device fails or in a small space.
  • the handheld device 377 located in the stage audience area in Figure 37 can be added to the collection array to provide the video data stream of the stage audience region.
  • the embodiment of the present invention provides a computing node cluster, and multiple computing nodes can simultaneously generate depth maps in parallel and batch-wise on the texture data synchronously collected by the same collection array.
  • the depth map calculation process can be divided into multiple steps such as obtaining a rough depth map through the first depth calculation, determining the unstable region in the rough depth map, and then calculating the second depth.
  • Multiple computing nodes in the computing node cluster can perform the first depth calculation on the texture data collected by multiple collection devices in parallel to obtain rough depth maps, and can verify the obtained rough depth maps and perform the second depth calculation in parallel, thereby saving depth map calculation time and increasing the rate of depth map generation.
  • Referring to the flowchart of a method for generating a depth map shown in FIG. 26, in an embodiment of the present invention, multiple computing nodes in a computing node cluster are used to generate the depth map.
  • Any computing node in the computing node cluster is called the first computing node.
  • the method for generating the depth map of the computing node cluster is described in detail below through specific steps:
  • S261 Receive texture data, where the texture data is synchronously collected by multiple collection devices in the same collection array.
  • the multiple collection devices can be placed at different locations in the field collection area according to the preset multi-angle free view range, and the collection devices can be fixedly set at a certain point in the field collection area, or can be located in the field collection area. Move inside to form a collection array.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched.
  • the multi-angle free angle of view can be a 6-degree-of-freedom (6DoF) angle of view
  • the collection devices used in the collection array can be general cameras, cameras, video recorders, handheld devices such as mobile phones, etc.
  • the texture data is the pixel data of the two-dimensional image frame collected by the aforementioned acquisition device, which may be an image at one frame time, or may be the pixel data of a frame image corresponding to a video stream formed by continuous or non-continuous frame images.
  • S262 The first computing node performs a first depth calculation according to the first texture data and the second texture data to obtain a first rough depth map.
  • Among the received texture data, the texture data that meets the preset first mapping relationship with the first computing node is called the first texture data; the texture data collected by a collection device that meets the preset first spatial position relationship with the collection device of the first texture data is called the second texture data.
  • the first mapping relationship may be obtained based on a preset first mapping relationship table or through random mapping.
  • the texture data processed by each computing node can be pre-allocated according to the number of computing nodes in the computing node cluster and the number of collection devices in the collection array corresponding to the texture data.
  • a special distribution node may be set to distribute the computing tasks of each computing node in the computing node cluster, and the distribution node may obtain the first mapping relationship based on a preset first mapping relationship table or through random mapping. For example, if there are a total of 40 collection devices in the collection array, in order to achieve the highest concurrent processing efficiency, 40 computing nodes can be configured, and each collection device corresponds to a computing node.
  • Alternatively, fewer computing nodes can be configured, with each computing node corresponding to the data collected by two collection devices, for example.
  • Specifically, the mapping relationship between the identification of the collection device corresponding to the texture data and the identification of each computing node can be set as the first mapping relationship, and based on the first mapping relationship, the texture data collected by the corresponding collection device in the collection array can be directly distributed to the corresponding computing node.
  • Computing tasks can also be randomly assigned, with the texture data collected by each collection device in the collection array randomly assigned to the computing nodes in the computing node cluster. In this case, to improve processing efficiency, all texture data collected by the collection array can be copied to each computing node in the computing node cluster in advance. A sketch of both assignment strategies follows.
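  • As a hedged illustration of the first mapping relationship (table-based or random), the following sketch assigns collection devices to computing nodes; the names buildFirstMapping and nodeOfDevice are assumptions of this sketch, not terms from the patent.

```cuda
// Host-side sketch: build a device -> node table. With 40 devices and 40
// nodes the assignment is one-to-one; with fewer nodes it falls back to
// round-robin. Passing random = true shuffles the devices first, which
// corresponds to the random-mapping variant described above.
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

std::vector<int> buildFirstMapping(int numDevices, int numNodes, bool random) {
    std::vector<int> deviceIds(numDevices);
    std::iota(deviceIds.begin(), deviceIds.end(), 0);
    if (random) {
        std::mt19937 rng(std::random_device{}());
        std::shuffle(deviceIds.begin(), deviceIds.end(), rng);
    }
    std::vector<int> nodeOfDevice(numDevices);
    for (int i = 0; i < numDevices; ++i)
        nodeOfDevice[deviceIds[i]] = i % numNodes;  // round-robin assignment
    return nodeOfDevice;
}
```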
  • any server in the server cluster can perform the first depth calculation according to the first texture data and the second texture data.
  • The second texture data may be texture data collected by a collection device that meets a preset first distance relationship with the collection device of the first texture data, or texture data collected by a collection device that meets a preset first quantity relationship with the collection device of the first texture data, or texture data collected by a collection device that meets both the preset first distance relationship and the preset first quantity relationship with the collection device of the first texture data.
  • the first preset number can take any integer value from 1 to N-1, and N is the total number of collection devices in the collection array.
  • In an embodiment of the present invention, the first preset number is set to 2, so that the highest possible image quality can be obtained with the least amount of calculation. For example, assuming that the computing node 9 corresponds to the camera 9 in the preset first mapping relationship, the rough depth map of the camera 9 can be obtained by calculation using the texture data of the camera 9 and the texture data of the cameras 5, 6, 7, 10, 11, and 12 adjacent to the camera 9.
  • The second texture data may also be data collected by a collection device that meets another type of first spatial position relationship with the collection device of the first texture data; for example, the spatial position relationship may be satisfying a preset angle, satisfying a preset relative position, and so on. A sketch of one possible nearest-neighbor selection strategy follows.
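  • A minimal sketch of the nearest-neighbor variant of the first spatial position relationship follows; the DevicePos structure and nearestDevices function are assumptions of this sketch, and k plays the role of the first preset number.

```cuda
// Host-side sketch: pick the k collection devices whose 3D positions are
// closest to the device that produced the first texture data. Angle- or
// relative-position-based relationships would replace the distance metric.
#include <algorithm>
#include <cmath>
#include <vector>

struct DevicePos { int id; float x, y, z; };

std::vector<int> nearestDevices(const std::vector<DevicePos>& all,
                                int selfId, int k) {
    const DevicePos* self = nullptr;
    for (const DevicePos& d : all)
        if (d.id == selfId) self = &d;
    std::vector<int> ids;
    if (self == nullptr) return ids;            // unknown device id
    std::vector<std::pair<float, int>> byDist;  // (distance, device id)
    for (const DevicePos& d : all) {
        if (d.id == selfId) continue;
        float dx = d.x - self->x, dy = d.y - self->y, dz = d.z - self->z;
        byDist.push_back({std::sqrt(dx * dx + dy * dy + dz * dz), d.id});
    }
    std::sort(byDist.begin(), byDist.end());
    for (int i = 0; i < k && i < (int)byDist.size(); ++i)
        ids.push_back(byDist[i].second);
    return ids;
}
```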
  • S263 The first computing node synchronizes the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set.
  • the rough depth map obtained after the rough calculation of the depth map needs to be cross-validated to determine the unstable region in each rough depth map, so as to perform a refined solution in the next step.
  • Cross-validation needs to be performed using the rough depth maps corresponding to multiple collection devices around the collection device corresponding to the rough depth map to be verified (typically, the rough depth map to be verified is cross-validated together with the rough depth maps corresponding to all other collection devices). Therefore, the rough depth map calculated by each computing node needs to be synchronized to the remaining computing nodes in the computing node cluster. After the synchronization in step S263, each computing node in the computing node cluster obtains the rough depth maps calculated by the remaining computing nodes, and every computing node holds exactly the same rough depth map set. A minimal sketch of this synchronization follows.
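  • The transport used for the synchronization is not specified in the text; the sketch below assumes a hypothetical sendToNode/receiveFromNode transport (RPC, MPI, shared storage, or similar) and simply has every node exchange its rough depth map with all others.

```cuda
// Host-side sketch of step S263: an all-to-all exchange after which every
// node holds the identical rough depth map set. The transport functions are
// placeholders for whatever mechanism the cluster actually uses.
#include <vector>

using DepthMap = std::vector<float>;

void sendToNode(int node, const DepthMap& map) { /* transport-specific */ }
DepthMap receiveFromNode(int node) { return DepthMap(); /* transport-specific */ }

std::vector<DepthMap> synchronizeRoughMaps(int selfNode, int numNodes,
                                           const DepthMap& ownMap) {
    std::vector<DepthMap> roughSet(numNodes);
    roughSet[selfNode] = ownMap;
    for (int n = 0; n < numNodes; ++n) {
        if (n == selfNode) continue;
        sendToNode(n, ownMap);             // publish our rough depth map
        roughSet[n] = receiveFromNode(n);  // collect everyone else's
    }
    return roughSet;  // identical on every node after the exchange
}
```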
  • S264 The first computing node uses a third rough depth map to verify the second rough depth map in the rough depth map set, and obtains the unstable region in the second rough depth map.
  • the second rough depth map and the first computing node may satisfy a preset second mapping relationship;
  • the third rough depth map may be the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map.
  • the second mapping relationship may be obtained based on a preset second mapping relationship table or through random mapping.
  • the texture data processed by each computing node can be pre-allocated according to the number of computing nodes in the computing node cluster and the number of collection devices in the collection array corresponding to the texture data.
  • a special distribution node may be set to distribute the computing tasks of each computing node in the computing node cluster, and the distribution node may obtain the second mapping relationship based on a preset second mapping relationship table or through random mapping.
  • For the setting of the second mapping relationship, refer to the foregoing implementation example of the first mapping relationship.
  • the second mapping relationship may completely correspond to the first mapping relationship, or may not correspond to the first mapping relationship.
  • For example, a one-to-one second mapping relationship can be established between the identification of the collection device corresponding to the data and each computing node according to the hardware identification.
  • the descriptions of the first rough depth map, the second rough depth map, and the third rough depth map are only for clear and concise description.
  • The first rough depth map may be the same as or different from the second rough depth map; it suffices that the collection device corresponding to the third rough depth map and the collection device corresponding to the second rough depth map satisfy the preset second spatial position relationship.
  • The texture data corresponding to the third rough depth map may be texture data collected by a collection device that satisfies a preset second distance relationship with the collection device corresponding to the second rough depth map, or texture data collected by a collection device that satisfies a preset second quantity relationship with the collection device corresponding to the second rough depth map, or texture data collected by a collection device that satisfies both the preset second distance relationship and the preset second quantity relationship.
  • the second preset number can take any integer value from 1 to N-1, and N is the total number of collection devices in the collection array.
  • the second preset number may be equal to or different from the first preset number.
  • the second preset number is taken as 2, so that the highest possible image quality can be obtained with the least amount of calculation.
  • the second spatial position relationship may also be another type of spatial position relationship, such as satisfying a preset angle, satisfying a preset relative position, and so on.
  • S265 The first computing node performs a second depth calculation based on the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain the corresponding fine depth map.
  • The difference between the second depth calculation and the first depth calculation is that the depth map candidate values in the second rough depth map selected by the second depth calculation do not include the depth value of the unstable region, as sketched below. In this way, unstable regions in the generated depth map can be eliminated, so that the generated depth map is more accurate, and the quality of the generated multi-angle free-view image can be improved.
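  • The cost function used in the depth calculations is not given in the text; the sketch below abstracts it as a placeholder photoCost and only illustrates the candidate-exclusion rule, with all names being assumptions.

```cuda
// Host-side sketch: for a pixel inside the unstable region, the rough depth
// value is excluded from the candidate set before the cost is re-evaluated.
#include <cfloat>
#include <vector>

// Placeholder photo-consistency cost; a real implementation would compare
// re-projected texture patches between the cameras involved.
float photoCost(int x, int y, float depth) { return 0.0f; }

float refinePixel(int x, int y, bool unstable, float roughDepth,
                  const std::vector<float>& candidates) {
    float best = roughDepth;
    float bestCost = FLT_MAX;
    for (float d : candidates) {
        if (unstable && d == roughDepth) continue;  // drop the unstable value
        float c = photoCost(x, y, d);
        if (c < bestCost) { bestCost = c; best = d; }
    }
    return best;
}
```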
  • the server S may perform the first round of depth calculation (first depth calculation) based on the allocated texture data of the camera M and the texture data of the camera that meets the preset first spatial position relationship with the camera M to obtain a rough depth map .
  • the depth map can be continuously refined and solved on the same server.
  • the server S can cross-validate the assigned rough depth map corresponding to the camera M with the results of all other rough depth maps, and can obtain the unstable region in the rough depth map corresponding to the camera M.
  • The server S can then perform another round of depth map calculation (second depth calculation) based on the unstable region in the rough depth map corresponding to the assigned camera M, the texture data collected by the camera M, and the texture data of the N cameras around the camera M, thereby obtaining a refined depth map corresponding to the first texture data (the texture data collected by the camera M).
  • the rough depth map corresponding to the camera M is a rough depth map calculated based on the texture data collected by the camera M and the texture data collected by a collection device that satisfies the preset first spatial position relationship with the camera M.
  • S266 Use the fine depth atlas composed of the fine depth maps obtained by each computing node as the finally generated depth map.
  • multiple computing nodes can simultaneously generate the depth map in parallel and batch processing on the texture data synchronously collected by the same acquisition array, thereby greatly improving the efficiency of generating the depth map.
  • the above solution is adopted to eliminate unstable regions in the generated depth map through secondary depth calculation, so the obtained fine depth map is more accurate, and the quality of the generated multi-angle free-view image can be improved.
  • the computing node cluster may be a server cluster composed of multiple servers, and multiple servers in the server cluster may be deployed in a centralized manner or in a distributed deployment.
  • some or all of the computing node devices in the computing node cluster may be used as local servers, or may be used as edge node devices, or as cloud computing devices.
  • the computing node cluster may also be a computing device formed by multiple CPUs or GPUs.
  • the embodiment of the present invention also provides a computing node, which is suitable for forming a computing node cluster with at least another computing node to generate a depth map.
  • the computing node 270 may include:
  • the input unit 271 is adapted to receive texture data, which originates from the simultaneous collection of multiple collection devices in the same collection array;
  • the first depth calculation unit 272 is adapted to perform a first depth calculation according to the first texture data and the second texture data to obtain a first rough depth map, wherein: the first texture data and the calculation node meet a preset A first mapping relationship; the second texture data is texture data collected by a collection device that meets a preset first spatial position relationship with the collection device of the first texture data;
  • the synchronization unit 273 is adapted to synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas;
  • the verification unit 274 is adapted to use a third rough depth map to verify the second rough depth map in the rough depth map set, to obtain the unstable region in the second rough depth map, wherein: the second rough depth map and the computing node meet the preset second mapping relationship; and the third rough depth map is the rough depth map corresponding to a collection device that meets the preset second spatial position relationship with the collection device corresponding to the second rough depth map;
  • the second depth calculation unit 275 is adapted to perform a second operation based on the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map. Depth calculation to obtain a corresponding fine depth map, where: the depth map candidate value in the second rough depth map selected by the second depth calculation does not include the depth value of the unstable region;
  • the output unit 276 is adapted to output the fine depth map, so that the computing node cluster obtains the fine depth atlas as the final generated depth map.
  • The depth map calculation process can include multiple steps such as obtaining a rough depth map through the first depth calculation, determining the unstable region in the rough depth map, and the subsequent second depth calculation. Performing the depth map calculation through the above steps facilitates separate calculation by multiple computing nodes, thereby improving the efficiency of generating the depth map.
  • The embodiment of the present invention also provides a computing node cluster. The computing node cluster may include multiple computing nodes, and the multiple computing nodes in the computing node cluster can simultaneously perform depth map generation in parallel and in batch mode on the texture data synchronously collected by the same collection array.
  • any computing node in the computing node cluster is referred to as the first computing node.
  • the first computing node is adapted to perform a first depth calculation according to the first texture data and the second texture data in the received texture data to obtain a first rough depth map;
  • the first rough depth map is synchronized to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas; for the second rough depth map in the rough depth atlas, a third rough depth map is used for verification to obtain the unstable region in the second rough depth map; and according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, a second depth calculation is performed to obtain the corresponding fine depth map, which is output so that the computing node cluster uses the obtained fine depth atlas as the finally generated depth map;
  • wherein: the first texture data meets a preset first mapping relationship with the first computing node; the second texture data is the texture data collected by a collection device that meets a preset first spatial position relationship with the collection device of the first texture data; the second rough depth map and the first computing node meet the preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that satisfies the preset second spatial position relationship with the collection device corresponding to the second rough depth map; and the depth map candidate values in the second rough depth map selected by the second depth calculation do not include the depth value of the unstable region.
  • The texture data collected by the N cameras in the camera array are respectively input to the N servers in the server cluster, and the first depth calculation is performed respectively to obtain rough depth maps 1 to N.
  • Each server copies the rough depth map calculated by itself to the other servers in the server cluster, and synchronization is achieved.
  • Each server then verifies the rough depth maps assigned to it and performs the second depth calculation.
  • The depth maps after fine calculation are obtained as the depth maps generated by the server cluster. From the above calculation process, it can be seen that each server in the server cluster can perform the first depth calculation on the texture data collected by multiple cameras in parallel, and can perform verification and the second depth calculation on the rough depth maps in the rough depth atlas in parallel.
  • the entire depth map generation process is performed by multiple servers in parallel, which can greatly save the time of depth map calculation and improve the efficiency of depth map generation.
  • the server cluster can then store the generated depth map or output it to the terminal device according to the request, so as to further generate and display the virtual viewpoint image, which will not be repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored.
  • When the computer instructions are run, the steps of the depth map generation method described in any of the foregoing embodiments can be executed, which will not be repeated here.
  • The embodiments of the present invention provide a method for generating virtual viewpoint images through parallel processing, which can greatly improve the time-efficiency of generating virtual viewpoint images with multi-angle free viewing angles, thereby meeting the low-latency playback and real-time interaction requirements of multi-angle free-view videos and improving user experience.
  • the virtual view point image can be generated through the following steps:
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be switched freely.
  • the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • a collection array composed of multiple collection devices can be arranged on site. Each collection device in the collection array can be placed in a different position of the on-site collection area according to a preset multi-angle free viewing angle range.
  • The live images can be collected synchronously, and synchronized texture maps from multiple angles can be obtained. For example, multiple cameras, video cameras, etc. can be used to perform synchronized image collection of a certain scene from multiple angles.
  • the images in the multi-angle free-view image combination may be completely free-view images.
  • It can be a 6-degree-of-freedom (6DoF) viewing angle, that is, the spatial position and viewing angle of the viewpoint can be freely switched.
  • The spatial position of the viewpoint can be expressed as coordinates (x, y, z), and the viewing angle can be expressed as three rotation directions, so it can be called 6DoF.
  • the image combination of multiple angles and free viewing angles and the parameter data of the image combination can be acquired first.
  • the texture map and the depth map in the image combination correspond one-to-one.
  • the texture map can adopt any type of two-dimensional image format, for example, it can be any of BMP, PNG, JPEG, webp format, and so on.
  • the depth map can represent the distance of each point in the scene relative to the shooting device, that is, each pixel value in the depth map represents the distance between a certain point in the scene and the shooting device.
  • the texture map in the image combination is a plurality of synchronized two-dimensional images.
  • the depth data of each two-dimensional image may be determined based on the plurality of two-dimensional images.
  • the depth data may include depth values corresponding to pixels of the two-dimensional image.
  • the distance from the collection device to each point in the area to be viewed can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the area to be viewed along the optical axis of the camera to the optical center, and the origin of the camera coordinate system may be used as the optical center.
  • the distance may be a relative value, and the same reference may be used for multiple images.
  • the depth data may include depth values corresponding to the pixels of the two-dimensional image one-to-one, or may be a partial value selected from a set of depth values corresponding to the pixels of the two-dimensional image one-to-one.
  • the depth value set can be stored in the form of a depth map.
  • The depth value set corresponding one-to-one to the pixels of the two-dimensional image (texture map), stored in image form according to the pixel arrangement of the two-dimensional image (texture map), is the original depth map; the depth data can be the data obtained after down-sampling the original depth map.
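  • The text does not fix a particular encoding of depth values into depth-map pixel values; one common convention, consistent with the later statement that the smallest depth corresponds to the largest pixel value, is quantized inverse depth over a working range $[Z_{near}, Z_{far}]$ (an assumption here, not mandated by the text):

$$d(x,y)=\operatorname{round}\left(255\cdot\frac{1/Z(x,y)-1/Z_{far}}{1/Z_{near}-1/Z_{far}}\right)$$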
  • the image combination of multiple angles and free viewing angles and the parameter data of the image combination can be obtained through the following steps, which will be described below through specific application scenarios.
  • The first step is the acquisition and calculation of the depth map, which includes three main steps, namely: multi-camera video capturing (Multi-camera Video Capturing), camera internal and external parameter calculation (Camera Parameter Estimation), and depth map calculation (Depth Map Calculation).
  • the video captured by each camera is required to be frame-level aligned.
  • Through multi-camera video capturing, texture maps can be obtained, that is, multiple synchronized images; through camera internal and external parameter calculation, the camera parameters, that is, the parameter data of the image combination, including internal parameter data and external parameter data, can be obtained; and through depth map calculation, the depth map (Depth Map) can be obtained.
  • multiple sets of texture maps and depth maps with corresponding relationships can be spliced together to form a frame of spliced image.
  • the stitched image can have a variety of stitching structures.
  • Each frame of stitched image can be used as an image combination.
  • Multiple sets of texture maps and depth maps in the image combination can be spliced and combined according to a preset relationship.
  • the texture map and depth map of the image combination can be divided into a texture map area and a depth map area according to the position relationship.
  • the texture map area stores the pixel values of each texture map
  • the depth map area stores each depth map according to a preset position relationship.
  • the texture map area and the depth map area can be continuous or spaced apart. In the embodiment of the present invention, there is no restriction on the positional relationship between the texture map and the depth map in the image combination.
  • the parameter data of each image in the image combination can be obtained from the attribute information of the image.
  • the parameter data may include external parameter data, and may also include internal parameter data.
  • the external parameter data is used to describe the spatial coordinates and posture of the shooting device
  • the internal parameter data is used to express the property information of the shooting device such as the optical center and focal length of the shooting device.
  • the internal parameter data may also include distortion parameter data.
  • the distortion parameter data includes radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs in the process of converting the coordinate system of the shooting device to the physical coordinate system of the image.
  • the tangential distortion occurs in the manufacturing process of the shooting equipment, which is due to the fact that the plane of the photosensitive element is not parallel to the lens.
  • Based on the external parameter data, information such as the shooting position and shooting angle of the image can be determined.
  • the combination of internal parameter data including distortion parameter data can make the determined spatial mapping relationship more accurate.
  • The virtual viewpoint path can be preset. For a sports game such as a basketball game or a football game, an arc-shaped path can be planned in advance; for example, whenever a wonderful shot appears, a corresponding virtual viewpoint image is generated according to the arc-shaped path.
  • In the specific application process, the virtual viewpoint path can be set based on a specific location or perspective in the scene (such as the basket, the sidelines, the referee's perspective, the coach's perspective, etc.), or based on specific objects (such as the players on the court, the host on the scene, the audience, as well as actors in film and television footage, etc.).
  • the path data corresponding to the virtual viewpoint path may include position data of a series of virtual viewpoints in the path.
  • For example, for a virtual viewpoint location area with a higher camera density, only the texture maps and corresponding depth maps captured by the two cameras closest to the virtual viewpoint may be selected, while for a virtual viewpoint location area with a lower camera density, the texture maps and corresponding depth maps captured by the three or four cameras closest to the virtual viewpoint may be selected.
  • texture maps and depth maps corresponding to 2 to N acquisition devices closest to each virtual viewpoint position in the virtual viewpoint path can be selected respectively, where N is the number of all acquisition devices in the acquisition array.
  • the texture map and the depth map corresponding to the two acquisition devices closest to each virtual viewpoint position can be selected by default.
  • The user can also set the number of selected collection devices closest to the virtual viewpoint position, up to the number of collection devices corresponding to the image combination.
  • There is no special requirement for the spatial position distribution of the collection devices in the collection array (for example, it can be a linear distribution, an arc-shaped array arrangement, or any irregular arrangement).
  • The actual distribution of the collection devices can be determined from the virtual viewpoint position data and the parameter data corresponding to the image combination, and an adaptive strategy can then be adopted to select the texture maps and depth maps of the corresponding groups in the image combination. This reduces the amount of data calculation while ensuring the quality of the generated virtual viewpoint image, provides a higher degree of freedom and flexibility in selection, and also reduces the installation requirements for the collection devices in the collection array, making it convenient to adapt to different sites and easy to install and operate.
  • a preset number of corresponding sets of texture maps and depth maps that are closest to the virtual viewpoint position are selected from the image combination.
  • S293 Input the texture maps and depth maps of the corresponding groups of each virtual viewpoint into the graphics processor, and for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, have multiple threads respectively perform combined rendering on the pixels in the texture maps and depth maps of the corresponding groups in the selected image combination, to obtain the image corresponding to the virtual viewpoint.
  • A graphics processing unit (GPU), also known as a display core, visual processor, or display chip, is a microprocessor that specializes in image- and graphics-related operations. It can be configured in personal computers, workstations, game consoles, and some mobile terminals (such as tablet computers and smart phones) that have image-related computing requirements.
  • In an embodiment of the present invention, the GPU may adopt the Compute Unified Device Architecture (CUDA) parallel programming architecture to perform combined rendering on the pixels in the corresponding sets of texture maps and depth maps in the selected image combination.
  • CUDA is a new hardware and software architecture that is used to allocate and manage calculations on GPUs as data parallel computing devices without mapping them to a graphical application programming interface (API).
  • When programming with CUDA, the GPU can be regarded as a computing device capable of executing a large number of threads in parallel, running as a coprocessor of the host CPU. In other words, the data-parallel and computationally intensive parts of the application running on the host are delegated to the GPU.
  • A part of an application that is executed many times, independently on different data, can be isolated into a function that runs on the GPU device as many different threads.
  • functions can be compiled into the instruction set of the GPU device, and the generated program (called a kernel (Kernel)) can be downloaded to the GPU.
  • the thread batches that execute the kernel are organized into thread blocks.
  • A thread block is a batch of threads that can effectively share data through fast shared memory and synchronize their execution to coordinate memory accesses.
  • a synchronization point can be specified in the kernel, and the threads in the thread block will be suspended until they all reach the synchronization point.
  • the maximum number of threads that can be included in a thread block is limited.
  • Blocks of the same dimension and size that execute the same kernel can be batched into a grid of thread blocks (Grid of Thread Blocks), so that the total number of threads that can be launched in a single kernel call is much larger, as in the sketch below.
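  • As a minimal sketch of this organization (perPixelKernel and the 16x16 block size are illustrative choices, not values from the text), a per-pixel kernel can be launched over a grid that covers the whole image:

```cuda
#include <cuda_runtime.h>

// Trivial per-pixel kernel: one thread handles one pixel.
__global__ void perPixelKernel(const float* src, float* dst, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        dst[y * w + x] = src[y * w + x];  // placeholder per-pixel work
}

void launchPerPixel(const float* dSrc, float* dDst, int w, int h) {
    dim3 block(16, 16);                     // 256 threads per thread block
    dim3 grid((w + block.x - 1) / block.x,  // blocks covering the width
              (h + block.y - 1) / block.y); // blocks covering the height
    perPixelKernel<<<grid, block>>>(dSrc, dDst, w, h);
    cudaDeviceSynchronize();                // wait for the kernel to finish
}
```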
  • step S293 may be implemented by the following steps:
  • S2931 Perform forward mapping on the depth maps of the corresponding group in parallel, and map them to the virtual viewpoint.
  • the forward mapping of the depth map is to map the depth map of the original camera (acquisition device) to the position of the virtual camera through the conversion of the coordinate space position, so as to obtain the depth map of the virtual camera position.
  • the forward mapping of the depth map is an operation of mapping each pixel of the depth map of the original camera (acquisition device) to a virtual viewpoint according to a preset coordinate mapping relationship.
  • the first kernel (Kernel) function can be run on the GPU, and the pixels in the depth map of the corresponding group can be forward mapped in parallel to the corresponding virtual view point position.
  • A second depth map of the virtual viewpoint position may be created based on the first depth map of the virtual viewpoint position, and for each pixel in the second depth map in parallel, the maximum value of the pixels in the preset area around the corresponding pixel position in the first depth map is taken.
  • S2932 Perform post-processing in parallel on the depth map after forward mapping.
  • the virtual viewpoint depth map can be post-processed.
  • The preset second kernel function can be run on the GPU to perform median filtering, for each pixel in the second depth map obtained by the forward mapping, in a preset area around the pixel position. Since the median filtering can be performed on all pixels of the second depth map in parallel, the post-processing speed can be greatly accelerated, and the time-efficiency of the post-processing can be improved. A minimal sketch of such a kernel follows.
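  • A minimal sketch of such a second kernel follows, assuming a 3x3 window (the preset area) and border clamping; the kernel name is illustrative.

```cuda
#include <cuda_runtime.h>

// 3x3 median filter over the forward-mapped (second) depth map; each thread
// filters one pixel independently, so the whole image is processed in parallel.
__global__ void medianFilter3x3(const float* src, float* dst, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float win[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = min(max(x + dx, 0), w - 1);  // clamp at the borders
            int sy = min(max(y + dy, 0), h - 1);
            win[n++] = src[sy * w + sx];
        }
    // Insertion-sort the 9 samples and take the middle one as the median.
    for (int i = 1; i < 9; ++i) {
        float v = win[i];
        int j = i - 1;
        while (j >= 0 && win[j] > v) { win[j + 1] = win[j]; --j; }
        win[j + 1] = v;
    }
    dst[y * w + x] = win[4];
}
```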
  • S2933 Perform reverse mapping on the texture maps of the corresponding groups in parallel. This step calculates, according to the value of the depth map, the coordinates of each pixel of the virtual viewpoint position in the original camera texture map, and computes the corresponding value through sub-pixel interpolation.
  • The sub-pixel value can be obtained directly by bilinear interpolation; therefore, in this step, it is only necessary to fetch the value from the original camera texture according to the coordinate calculated for each pixel.
  • the preset third core function can be run on the GPU, and the pixels in the selected corresponding group of texture maps can be interpolated in parallel to generate the corresponding virtual texture map.
  • The pixels in the texture maps of the selected corresponding groups are interpolated in parallel to generate the corresponding virtual texture maps, which can greatly accelerate the processing speed of the reverse mapping and improve its time-efficiency. A sketch of the bilinear sampling step follows.
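  • A minimal sketch of the bilinear sampling step follows; it assumes the per-pixel source coordinates have already been computed (by the reverse of formulas (1)-(2)) and stored in a coordinate map, with out-of-range or negative coordinates marking invalid pixels. All names are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Reverse mapping: each thread fetches the texture value for one virtual-view
// pixel by bilinear interpolation at a precomputed sub-pixel source coordinate.
__global__ void bilinearSample(const uchar3* srcTex, int sw, int sh,
                               const float2* srcCoord,  // per-pixel (x, y) in the source texture
                               uchar3* dstTex, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float2 c = srcCoord[y * w + x];
    if (c.x < 0.f || c.y < 0.f || c.x > sw - 1.f || c.y > sh - 1.f)
        return;  // invalid or out-of-range coordinate
    int x0 = (int)floorf(c.x), y0 = (int)floorf(c.y);
    int x1 = min(x0 + 1, sw - 1), y1 = min(y0 + 1, sh - 1);
    float ax = c.x - x0, ay = c.y - y0;
    uchar3 p00 = srcTex[y0 * sw + x0], p10 = srcTex[y0 * sw + x1];
    uchar3 p01 = srcTex[y1 * sw + x0], p11 = srcTex[y1 * sw + x1];
    // Blend the four neighbours channel by channel.
    float r = (1 - ax) * (1 - ay) * p00.x + ax * (1 - ay) * p10.x
            + (1 - ax) * ay * p01.x + ax * ay * p11.x;
    float g = (1 - ax) * (1 - ay) * p00.y + ax * (1 - ay) * p10.y
            + (1 - ax) * ay * p01.y + ax * ay * p11.y;
    float b = (1 - ax) * (1 - ay) * p00.z + ax * (1 - ay) * p10.z
            + (1 - ax) * ay * p01.z + ax * ay * p11.z;
    dstTex[y * w + x] = make_uchar3((unsigned char)r, (unsigned char)g,
                                    (unsigned char)b);
}
```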
  • S2934 The fourth kernel function can be run on the GPU to perform weighted fusion, in parallel, of the pixels at the same position in each virtual texture map generated after the reverse mapping.
  • In step S2931, for the forward mapping of the depth map, the projection mapping relationship of each pixel can first be calculated through the first kernel function on the GPU, using the following formula (1):
$$Z\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}\tag{1}$$
  • where $[u,v,1]^T$ is the homogeneous coordinate of the pixel $(u,v)$; $[X,Y,Z]^T$ is the coordinate, in the camera coordinate system, of the real object point corresponding to $(u,v)$; $f_x$ and $f_y$ are the focal lengths in the x and y directions; and $c_x$ and $c_y$ are the optical center coordinates in the x and y directions, respectively.
  • With the depth value $Z$ of a pixel known and the physical parameters of the corresponding camera lens ($f_x$, $f_y$, $c_x$, $c_y$, obtainable from the parameter data of the aforementioned image combination), the coordinates $[X,Y,Z]^T$ of the corresponding point in the camera coordinate system can be obtained through the above formula (1).
  • the coordinates of the object in the current camera coordinate system can be transformed to the coordinate system of the camera where the virtual viewpoint is located according to the coordinate transformation in the three-dimensional space.
  • the following transformation formula (2) can be used:
$$\begin{bmatrix}X_1\\ Y_1\\ Z_1\end{bmatrix}=R_{12}\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}+T_{12}\tag{2}$$
  • where $R_{12}$ is a 3x3 rotation matrix, $T_{12}$ is a translation vector, and $[X_1, Y_1, Z_1]^T$ are the transformed three-dimensional coordinates.
  • By applying formula (1) with the virtual camera's parameters, the correspondence between the transformed three-dimensional coordinates and the virtual camera image coordinates can be obtained.
  • the projection relationship of the points from the real viewpoint image to the virtual viewpoint image is established.
  • the projection depth map in the virtual viewpoint image can be obtained.
  • When multiple points map to the same pixel, the point with the smallest depth value is the one with the largest depth-map pixel value. Therefore, by taking the largest pixel value among the values mapped to each position of the depth map, the first depth map corresponding to the virtual viewpoint position can be obtained.
  • The operation of taking the maximum or minimum value over multiple point mappings is provided in the CUDA parallel environment, and can be performed by calling the atomic operation functions atomicMin or atomicMax provided by CUDA, as in the sketch below.
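  • A minimal sketch of such a first kernel follows. The camera-parameter layout, the inverse-depth pixel encoding (so that nearer points get larger values), and all names are assumptions of this sketch; for brevity the same intrinsics and image size are used for the source and virtual cameras. The projection follows formulas (1) and (2) above.

```cuda
#include <cuda_runtime.h>

struct Cam {
    float fx, fy, cx, cy;  // intrinsics of formula (1)
    float R[9], T[3];      // rotation/translation of formula (2), source -> virtual
};

// Forward mapping: back-project each source pixel with its depth, transform
// it into the virtual camera, re-project, and keep the largest encoded depth
// (the nearest surface) per target pixel with atomicMax.
__global__ void forwardMapDepth(const float* srcZ, unsigned int* dstPix,
                                int w, int h, Cam c) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float Z = srcZ[y * w + x];
    if (Z <= 0.f) return;                    // no depth at this pixel
    float X = (x - c.cx) * Z / c.fx;         // formula (1), inverted
    float Y = (y - c.cy) * Z / c.fy;
    float X1 = c.R[0] * X + c.R[1] * Y + c.R[2] * Z + c.T[0];  // formula (2)
    float Y1 = c.R[3] * X + c.R[4] * Y + c.R[5] * Z + c.T[1];
    float Z1 = c.R[6] * X + c.R[7] * Y + c.R[8] * Z + c.T[2];
    if (Z1 <= 0.f) return;                   // behind the virtual camera
    int u1 = (int)lrintf(c.fx * X1 / Z1 + c.cx);  // formula (1), virtual view
    int v1 = (int)lrintf(c.fy * Y1 / Z1 + c.cy);
    if (u1 < 0 || v1 < 0 || u1 >= w || v1 >= h) return;
    unsigned int pix = (unsigned int)(65535.f / Z1);  // assumed inverse-depth encoding
    atomicMax(&dstPix[v1 * w + u1], pix);    // the nearest point wins
}
```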
  • the embodiment of the present invention may perform gap concealment processing on the obtained first depth map.
  • a 3*3 gap concealment process is performed on the first depth map.
  • the size range of the surrounding area during the gap concealment process can also take other values, such as 5*5. In order to obtain a better processing effect, it can be set according to experience.
  • a 3*3 or 5*5 median filter may be performed on the second depth map of the virtual view point position.
  • the second kernel function of the GPU may operate according to the following formula, where $D_2'$ is the filtered depth map, $D_2$ is the second depth map obtained by the forward mapping, and $\Omega$ is the preset window around the pixel position:
$$D_2'(x,y)=\operatorname{median}\{\,D_2(x+i,\,y+j):(i,j)\in\Omega\,\}$$
  • In step S2933, the third kernel function running on the GPU calculates, according to the value of the depth map, the coordinates of each pixel of the virtual viewpoint position in the original camera texture map; the third kernel function can be implemented by performing the reverse process of step S2931.
  • In step S2934, for the pixel f(x, y) at the virtual viewpoint position (x, y), the pixel values at the corresponding positions of the texture maps obtained by mapping from all the original cameras can be weighted according to the confidence level conf(x, y).
  • the fourth kernel function can be calculated using the following formula, where $f_i$ and $\mathrm{conf}_i$ denote the mapped texture value and the confidence contributed by the $i$-th original camera:
$$f(x,y)=\frac{\sum_i \mathrm{conf}_i(x,y)\,f_i(x,y)}{\sum_i \mathrm{conf}_i(x,y)}$$
  • a virtual viewpoint image can be obtained.
  • the virtual texture map obtained after weighted fusion can also be further processed and optimized. For example, holes can be filled in parallel for each pixel in the texture map after the weighted fusion to obtain the image corresponding to the virtual viewpoint.
  • a separate windowing method can be used to perform parallel operations. For example, for each hole pixel, a window of size N*M can be opened, and then the value of the hole pixel is weighted according to the value of the non-hole pixel in the window.
  • the pixels f1 and f2 in the hole area F are respectively opened with rectangular windows a and b.
  • For the pixel f1, the reference pixels are taken from the existing non-hole pixels in the rectangular window (some pixels may also be obtained by down-sampling), and the value of the pixel f1 in the hole area F is obtained by weighting according to distance (or by average weighting).
  • the same operation can be used to obtain the value of the pixel f2.
  • the fifth core function can be run on the GPU to parallelize the processing and speed up the time for filling holes.
  • the fifth kernel function can be calculated using the following formula:
$$P(x,y)=\operatorname{Average}\big(\operatorname{Window}(x,y)\big)$$
  • where $P(x,y)$ is the value of a point in the hole, $\operatorname{Window}(x,y)$ is the set of values (or down-sampled values) of all existing pixels in the window around the hole, and $\operatorname{Average}$ is the average (or weighted average) of these pixel values.
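  • A minimal sketch of such a fifth kernel follows, using a plain (unweighted) average over an N x M window; the hole mask and all names are assumptions of this sketch.

```cuda
#include <cuda_runtime.h>

// Hole filling: each thread handles one pixel; hole pixels take the average
// of the non-hole pixels inside a (2*halfW+1) x (2*halfH+1) window.
__global__ void fillHoles(const uchar3* tex, const unsigned char* holeMask,
                          uchar3* out, int w, int h, int halfW, int halfH) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int idx = y * w + x;
    if (!holeMask[idx]) { out[idx] = tex[idx]; return; }  // not a hole
    float r = 0.f, g = 0.f, b = 0.f;
    int n = 0;
    for (int dy = -halfH; dy <= halfH; ++dy)
        for (int dx = -halfW; dx <= halfW; ++dx) {
            int sx = x + dx, sy = y + dy;
            if (sx < 0 || sy < 0 || sx >= w || sy >= h) continue;
            int si = sy * w + sx;
            if (holeMask[si]) continue;  // only existing (non-hole) pixels
            r += tex[si].x; g += tex[si].y; b += tex[si].z;
            ++n;
        }
    out[idx] = (n > 0)
        ? make_uchar3((unsigned char)(r / n), (unsigned char)(g / n),
                      (unsigned char)(b / n))
        : tex[idx];  // no reference pixels in the window: leave unchanged
}
```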
  • The texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path may be grouped and input to multiple GPUs, and multiple virtual viewpoint images are generated in parallel.
  • each of the above-mentioned steps may be executed by different block grids.
  • the virtual viewpoint image generation system 320 may include a CPU 321 and a GPU 322, where:
  • the CPU 321 is adapted to acquire image combinations of multiple angles and free viewing angles, parameter data of the image combinations, and preset virtual viewpoint path data, where the image combination includes multiple sets of texture maps that are synchronized at multiple angles and have corresponding relationships. And a depth map; according to the preset virtual view path data and the parameter data of the image combination, select from the image combination the texture map and the depth map of the corresponding group of each virtual view point in the virtual view path;
  • the GPU 322 is adapted to call, for each virtual viewpoint in the virtual viewpoint path, the corresponding kernel functions, and to perform combined rendering in parallel on the pixels in the texture maps and depth maps of the corresponding groups in the selected image combination, to obtain the image corresponding to the virtual viewpoint.
  • The GPU 322 is adapted to: perform forward mapping of the depth maps of the corresponding groups to the virtual viewpoint in parallel; perform post-processing on the forward-mapped depth maps in parallel; perform reverse mapping on the texture maps of the corresponding groups in parallel; and fuse, in parallel, the pixels in each virtual texture map generated after the reverse mapping.
  • Specifically, the GPU 322 may use steps S2931 to S2934 in the aforementioned virtual viewpoint image generation method, together with the hole filling step, to generate the virtual viewpoint image of each virtual viewpoint.
  • The GPU can be an independent GPU chip, a GPU core in a GPU chip, or a GPU server; it can also be a GPU device packaged from multiple GPU chips or multiple GPU cores, or a GPU cluster composed of multiple GPU servers.
  • The texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path can be input into multiple GPU chips, multiple GPU cores, or multiple GPU servers, respectively, to generate multiple virtual viewpoint images in parallel.
  • For example, if the virtual viewpoint path data corresponding to a certain virtual viewpoint path contains a total of 20 virtual viewpoint position coordinates, the data corresponding to the 20 virtual viewpoint position coordinates can be input into multiple GPU chips in parallel. With 10 GPU chips in total, for example, the data corresponding to the 20 virtual viewpoint position coordinates can be processed in parallel in two batches, and each GPU chip can generate the virtual viewpoint image corresponding to a virtual viewpoint position in parallel in units of pixels, thus greatly speeding up the generation of virtual viewpoint images and improving the time-efficiency of virtual viewpoint image generation.
  • An embodiment of the present invention also provides an electronic device.
  • The electronic device 330 may include a memory 331, a CPU 332, and a GPU 333.
  • The memory 331 stores computer instructions, and the CPU 332 and the GPU 333 are adapted, when cooperatively running the computer instructions, to execute the steps of the method for generating virtual viewpoint images according to any of the foregoing embodiments of the present invention.
  • the electronic device may be one server or a server cluster composed of multiple servers.
  • Each of the above embodiments can be applied to a live broadcast scenario, and two or more of the embodiments can be used in combination as needed in the application process.
  • the solutions in the above embodiments are not limited to live broadcast scenarios.
  • The solutions in the embodiments of the present invention for video or image collection, data processing of video data streams, and server-side image generation can also be applied to the playback requirements of non-live scenes, such as recorded-broadcast scenes and other scenes with low latency requirements.
  • the embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are run, the steps of the method in any of the foregoing embodiments of the present invention can be executed.
  • the computer-readable storage medium may be various suitable readable storage media such as an optical disk, a mechanical hard disk, and a solid-state hard disk.
  • An embodiment of the present invention also provides a server, including a memory and a processor, where the memory stores computer instructions that can run on the processor, and the processor, when running the computer instructions, can execute the steps of the method in any of the above embodiments of the present invention.


Abstract

A method for generating a depth map, computing nodes, a computing node cluster, and a storage medium, the method comprising: texture data are received; a first computing node in the computing node cluster carries out a first depth calculation according to first texture data and second texture data so as to obtain a first rough depth map; the first computing node synchronizes the first rough depth map to other computing nodes so as to obtain a rough depth map set; the first computing node uses a third rough depth map to verify a second rough depth map in the rough depth map set so as to obtain an unstable region in the second rough depth map; and the first computing node carries out a second depth calculation according to the unstable region in the second rough depth map, texture data corresponding to the second rough depth map and texture data corresponding to the third rough depth map, so as to obtain a corresponding fine depth map as a final generated depth map. The described solution can increase the rate of depth map generation.

Description

深度图生成方法、计算节点及计算节点集群、存储介质Depth map generation method, computing node and computing node cluster, storage medium
本申请要求2019年10月28日递交的申请号为201911033628.0、发明名称为“深度图生成方法、计算节点及计算节点集群、存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed on October 28, 2019 with the application number 201911033628.0 and the invention title "Depth Map Generation Method, Computing Nodes and Computing Node Clusters, and Storage Media", the entire contents of which are incorporated by reference In this application.
技术领域Technical field
本发明实施例涉及图像处理技术领域,尤其涉及一种深度图生成方法、计算节点及计算节点集群、存储介质。The embodiment of the present invention relates to the field of image processing technology, and in particular to a depth map generation method, computing nodes, computing node clusters, and storage media.
Background
Six degrees of freedom (6DoF) technology provides a viewing experience with a high degree of freedom: users can adjust the viewing angle through interactive operations while watching, so as to watch from whichever free viewpoint they choose.
Generating the 6DoF data used for interaction involves multiple steps, including camera parameter calculation, depth map calculation, and depth image based rendering (DIBR).
However, depth map calculation currently takes a long time. How to reduce the calculation time and increase the speed of depth map calculation has become an urgent problem to be solved.
Summary of the Invention
The problem addressed by the embodiments of the present invention is how to increase the speed of depth map calculation in the 6DoF data computation process.
To solve the above problem, an embodiment of the present invention discloses a depth map generation method in which multiple computing nodes in a computing node cluster each perform depth map generation. The method includes:
receiving texture data, the texture data being synchronously collected by multiple collection devices in the same collection array;
performing, by a first computing node in the computing node cluster, a first depth calculation according to first texture data and second texture data to obtain a first rough depth map, where the first computing node is any computing node in the computing node cluster, the first texture data and the first computing node satisfy a preset first mapping relationship, and the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data;
synchronizing, by the first computing node, the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set;
verifying, by the first computing node, a second rough depth map in the rough depth map set using a third rough depth map to obtain an unstable region in the second rough depth map, where the second rough depth map and the first computing node satisfy a preset second mapping relationship, and the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
performing, by the first computing node, a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, where the depth map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable region;
taking the set of fine depth maps obtained by the computing nodes as the finally generated depth map.
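For illustration only, the following Python sketch outlines one node's role in the workflow described above. The helper functions (stereo_depth, broadcast_rough_map, cross_check, second_mapping) are hypothetical placeholders introduced for this sketch; they are not part of the claimed method or of any real library.

    # Sketch of one node's role in the cluster; all helpers are hypothetical.
    def node_pipeline(node_id, textures, neighbors, cluster):
        # First depth calculation: this node's texture (first texture data)
        # against its spatial neighbors' textures (second texture data).
        rough = stereo_depth(textures[node_id],
                             [textures[n] for n in neighbors[node_id]])

        # Synchronization: every node shares its rough map with the others,
        # so each node ends up holding the full rough depth map set.
        rough_set = broadcast_rough_map(cluster, node_id, rough)

        # Verification: cross-check the rough map this node is responsible
        # for (second rough depth map, via the second mapping relationship)
        # against a spatially neighboring one (third rough depth map).
        target = second_mapping(node_id)
        unstable = cross_check(rough_set[target],
                               rough_set[neighbors[target][0]])

        # Second depth calculation: recompute depth for the target view,
        # excluding the unstable region's values from the candidate set.
        fine = stereo_depth(textures[target],
                            [textures[n] for n in neighbors[target]],
                            exclude_region=unstable)
        return fine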
Optionally, the second texture data is texture data collected by collection devices that satisfy a preset first distance relationship and/or first quantity relationship with the collection device of the first texture data; the texture data corresponding to the third rough depth map is texture data collected by collection devices that satisfy a preset second distance relationship and/or second quantity relationship with the collection device corresponding to the second rough depth map.
Optionally, the second texture data is texture data collected by a first preset number of collection devices closest to the position of the collection device of the first texture data; the texture data corresponding to the third rough depth map is texture data collected by a second preset number of collection devices closest to the position of the collection device corresponding to the second rough depth map.
Optionally, the first preset number of collection devices and the second preset number of collection devices each range from 1 to N-1 collection devices, where N is the total number of collection devices in the collection array.
Optionally, the first preset number is equal to the second preset number.
Optionally, the first mapping relationship between the first computing node and the first texture data is obtained as follows: an allocation node obtains the first mapping relationship based on a preset first mapping relationship table or through random mapping. The second mapping relationship between the first computing node and the second rough depth map is obtained as follows: the allocation node obtains the second mapping relationship based on a preset second mapping relationship table or through random mapping.
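As an illustration of how an allocation node might establish such mappings, the following snippet shows both a preset mapping table and a random mapping; the node and camera identifiers are assumptions made for this sketch.

    import random

    nodes = ["node0", "node1", "node2", "node3"]
    cameras = ["cam0", "cam1", "cam2", "cam3"]

    # Option 1: a preset first mapping relationship table.
    mapping_table = {"node0": "cam0", "node1": "cam1",
                     "node2": "cam2", "node3": "cam3"}

    # Option 2: random mapping, pairing each node with one camera's texture data.
    shuffled = cameras[:]
    random.shuffle(shuffled)
    random_mapping = dict(zip(nodes, shuffled))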
An embodiment of the present invention provides a computing node, suitable for forming a computing node cluster with at least one other computing node to generate depth maps. The computing node includes:
an input unit, adapted to receive texture data, the texture data originating from synchronous collection by multiple collection devices in the same collection array;
a first depth calculation unit, adapted to perform a first depth calculation according to first texture data and second texture data to obtain a first rough depth map, where the first texture data and the computing node satisfy a preset first mapping relationship, and the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data;
a synchronization unit, adapted to synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set;
a verification unit, adapted to verify a second rough depth map in the rough depth map set using a third rough depth map, to obtain an unstable region in the second rough depth map, where the second rough depth map and the computing node satisfy a preset second mapping relationship, and the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
a second depth calculation unit, adapted to perform a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, where the depth map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable region;
an output unit, adapted to output the fine depth map, so that the computing node cluster obtains a set of fine depth maps as the finally generated depth map.
An embodiment of the present invention provides another computing node, including a memory and a processor. The memory stores computer instructions that can run on the processor, and the processor executes the steps of the depth map generation method described in any of the foregoing embodiments of the present invention when running the computer instructions.
An embodiment of the present invention provides a computing node cluster, including multiple computing nodes. The multiple computing nodes include a first computing node, the first computing node being any computing node in the computing node cluster, where:
the first computing node is adapted to: perform a first depth calculation according to first texture data and second texture data in the received texture data to obtain a first rough depth map; synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set; verify a second rough depth map in the rough depth map set using a third rough depth map, to obtain an unstable region in the second rough depth map; perform a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map; and output the obtained fine depth map, so that the computing node cluster takes the obtained set of fine depth maps as the finally generated depth map;
where the first texture data and the first computing node satisfy a preset first mapping relationship; the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data; the second rough depth map and the first computing node satisfy a preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map; and the depth map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable region.
An embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, where the computer instructions, when run, execute the steps of the depth map generation method described in any of the foregoing embodiments of the present invention.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
In the embodiments of the present invention, the first computing node, which is any computing node in the computing node cluster, performs a first depth calculation according to the first texture data and second texture data in the received texture data to obtain a first rough depth map, and synchronizes the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set. Then, for a second rough depth map in the rough depth map set, it performs verification using a third rough depth map to obtain an unstable region in the second rough depth map, and performs a second depth calculation to obtain a corresponding fine depth map, where the depth map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable region. On the one hand, multiple computing nodes can generate depth maps in parallel, in batches, on the texture data synchronously collected by the same collection array, which can greatly improve the efficiency of depth map generation. In addition, the above solution eliminates unstable regions from the generated depth map through a second depth calculation, so the resulting fine depth map is more accurate, which in turn can improve the quality of the generated multi-angle free-view images.
Description of the Drawings
Figure 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention;
Figure 2 is a flowchart of a data processing method in an embodiment of the present invention;
Figure 3 is a schematic structural diagram of a data processing system in an application scenario in an embodiment of the present invention;
Figure 4 is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention;
Figure 5 is a schematic structural diagram of a server in an embodiment of the present invention;
Figure 6 is a flowchart of a data interaction method in an embodiment of the present invention;
Figure 7 is a schematic structural diagram of another data processing system in an embodiment of the present invention;
Figure 8 is a schematic structural diagram of a data processing system in another application scenario in an embodiment of the present invention;
Figure 9 is a schematic structural diagram of an interactive terminal in an embodiment of the present invention;
Figure 10 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 11 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 12 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 13 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 14 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 15 is a flowchart of another data processing method in an embodiment of the present invention;
Figure 16 is a flowchart of a method for intercepting synchronized video frames from compressed video data in an embodiment of the present invention;
Figure 17 is a flowchart of another data processing method in an embodiment of the present invention;
Figure 18 is a schematic structural diagram of a data processing device in an embodiment of the present invention;
Figure 19 is a schematic structural diagram of another data processing system in an embodiment of the present invention;
Figure 20 is a flowchart of a data synchronization method in an embodiment of the present invention;
Figure 21 is a timing diagram of stream-pulling synchronization in an embodiment of the present invention;
Figure 22 is a flowchart of another method for intercepting synchronized video frames from compressed video data in an embodiment of the present invention;
Figure 23 is a schematic structural diagram of another data processing device in an embodiment of the present invention;
Figure 24 is a schematic structural diagram of a data synchronization system in an embodiment of the present invention;
Figure 25 is a schematic structural diagram of a data synchronization system in an application scenario in an embodiment of the present invention;
Figure 26 is a flowchart of a depth map generation method in an embodiment of the present invention;
Figure 27 is a schematic structural diagram of a server in an embodiment of the present invention;
Figure 28 is a schematic diagram of depth map processing by a server cluster in an embodiment of the present invention;
Figure 29 is a flowchart of a virtual viewpoint image generation method in an embodiment of the present invention;
Figure 30 is a flowchart of a method for combined rendering by a GPU in an embodiment of the present invention;
Figure 31 is a schematic diagram of a hole filling method in an embodiment of the present invention;
Figure 32 is a schematic structural diagram of a virtual viewpoint image generation system in an embodiment of the present invention;
Figure 33 is a schematic structural diagram of an electronic device in an embodiment of the present invention;
Figure 34 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention;
Figure 35 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention;
Figure 36 is a schematic structural diagram of a collection device in an embodiment of the present invention;
Figure 37 is a schematic diagram of a collection array in an application scenario in an embodiment of the present invention;
Figure 38 is a schematic structural diagram of another data processing system in an embodiment of the present invention;
Figure 39 is a schematic structural diagram of another interactive terminal in an embodiment of the present invention;
Figure 40 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 41 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 42 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 43 is a schematic diagram of the connection of an interactive terminal in an embodiment of the present invention;
Figure 44 is a schematic diagram of an interactive operation of an interactive terminal in an embodiment of the present invention;
Figure 45 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 46 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 47 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention;
Figure 48 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
Detailed Description
In traditional playback scenarios such as live broadcast, rebroadcast, and recorded broadcast, as mentioned above, users can usually only watch a game from a single viewpoint position while viewing, and cannot freely switch viewpoint positions by themselves to watch the game pictures or game process from different viewing angles; they therefore cannot experience the feeling of watching the game on site while moving their viewpoint.
Using six degrees of freedom (6DoF) technology can provide a viewing experience with a high degree of freedom: users can adjust the viewing angle of the video through interactive means during viewing and watch from whichever free viewpoint they choose, thereby greatly improving the viewing experience.
To realize 6DoF scenes, there are currently Free-D playback technology, light field rendering technology, and depth-map-based 6DoF video generation technology, among others. Free-D playback technology expresses 6DoF images through point cloud data of the scene obtained by multi-angle shooting, while light field rendering technology obtains the depth-of-field information and three-dimensional position information of pixels through changes in the focal length and spatial position of a dense light field, and then expresses the 6DoF image. The depth-map-based 6DoF video generation method performs combined rendering of the texture maps and depth maps of the corresponding groups in the image combination of the video frames at the moment of user interaction, based on the virtual viewpoint position and the parameter data corresponding to the texture maps and depth maps of the corresponding groups, to reconstruct 6DoF images or videos.
For example, when the Free-D playback solution is used on site, a large number of cameras are needed for raw data collection, which is aggregated to the on-site computer room through Serial Digital Interface (SDI) capture cards; computing servers in the on-site computer room then process the raw data, obtain point cloud data expressing the three-dimensional positions and pixel information of all points in the space, and reconstruct the 6DoF scene. This solution makes the amount of data collected, transmitted, and computed on site extremely large; especially for playback scenarios such as live broadcast and rebroadcast, which place high demands on the transmission network and computing servers, the implementation cost of 6DoF scene reconstruction is too high and there are too many constraints. Moreover, there are currently no good technical standards or industrial-grade software and hardware supporting point cloud data. Therefore, from the on-site raw data collection to the final 6DoF scene reconstruction, a long data processing time is required, which cannot meet the demands for low-latency playback and real-time interaction of multi-angle free-view video.
For another example, when the light field rendering solution is used on site, the depth-of-field information and three-dimensional position information of pixels need to be obtained through changes in the focal length and spatial position of a dense light field. Since the resolution of the light field images obtained from a dense light field is too large, they often need to be decomposed into hundreds of conventional two-dimensional pictures. Therefore, this solution also makes the amount of data collected, transmitted, and computed on site extremely large, places high demands on the on-site transmission network and computing servers, has excessively high implementation costs and too many constraints, and cannot process data quickly. Moreover, the technical means of reconstructing 6DoF scenes from light field images are still under experimental exploration and currently cannot effectively meet the demands for low-latency playback and real-time interaction of multi-angle free-view video.
In summary, both Free-D playback technology and light field rendering technology have very large storage and computation requirements, and therefore require a large number of servers to be deployed on site for processing, resulting in excessively high implementation costs and too many constraints and an inability to process data quickly, so the demands of viewing and interaction cannot be met, which is not conducive to popularization.
Although the depth-map-based 6DoF video reconstruction method can reduce the amount of data computation in the video reconstruction process, it is still difficult to meet the demands for low-latency playback and real-time interaction of multi-angle free-view video due to constraints such as network transmission bandwidth and device decoding capabilities.
In view of the above problems, some embodiments of the present invention propose a multi-angle free-view image generation solution using a distributed system architecture, in which a collection array composed of multiple collection devices is arranged in the on-site collection area to synchronously collect frame images from multiple angles; a data processing device intercepts video frames from the frame images collected by the collection devices according to a frame interception instruction; a server takes the frame images of the multiple synchronized video frames uploaded by the data processing device as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, performs frame image reconstruction for a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of preset frame images in the image combination, obtains the corresponding multi-angle free-view video data, and inserts the multi-angle free-view video data into the to-be-played data stream of the playback control device for transmission to the playback terminal for playback.
Referring to the schematic structural diagram of a data processing system for an application scenario in an embodiment of the present invention, the data processing system 10 includes: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14. The data processing device 11 can intercept video frames from the frame images collected by the collection array in the on-site collection area; by intercepting only the video frames for which multi-angle free-view images are to be generated, a large amount of data transmission and data processing can be avoided. Afterwards, the server 12 generates the multi-angle free-view images, making full use of the server's powerful computing capability to quickly generate multi-angle free-view video data, which can then be inserted into the to-be-played data stream of the playback control device in time, realizing multi-angle free-view playback at low cost and meeting users' demands for low-latency playback and real-time interaction of multi-angle free-view video.
In order to enable those skilled in the art to more clearly understand and implement the embodiments of this specification, the technical solutions in the embodiments of this specification will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this specification, rather than all of them. Based on the embodiments in this specification, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this specification.
Referring to the flowchart of the data processing method shown in Figure 2, in an embodiment of the present invention, the method may specifically include the following steps:
S21: Receive frame images of multiple synchronized video frames uploaded by the data processing device as an image combination.
The multiple synchronized video frames are obtained by the data processing device, based on a video frame interception instruction, by intercepting video frames at a specified frame moment from multiple video data streams synchronously collected in real time and uploaded from different positions in the on-site collection area; the multiple synchronized video frames have different shooting angles.
In a specific implementation, the video frame interception instruction may include information about the specified frame moment, and the data processing device intercepts the video frames at the corresponding frame moment from the multiple video data streams according to this information. The specified frame moment may be given in units of frames, taking the Nth to Mth frames as the specified frame moment, where N and M are both integers not less than 1 and N ≤ M; or it may be given in units of time, taking the Xth to Yth seconds as the specified frame moment, where X and Y are both positive numbers and X ≤ Y. Therefore, the multiple synchronized video frames may include all frame-level synchronized video frames corresponding to the specified frame moment, with the pixel data of each video frame forming a corresponding frame image.
For example, according to the received video frame interception instruction, the data processing device may determine that the specified frame moment is the 2nd frame of the multiple video data streams; the data processing device then intercepts the video frame of the 2nd frame from each video data stream, and the intercepted 2nd frames of the respective video data streams are frame-level synchronized with one another, serving as the obtained multiple synchronized video frames.
For another example, assuming that the capture frame rate is set to 25 fps, i.e., 25 frames are captured per second, the data processing device may determine from the received video frame interception instruction that the specified frame moment is the video frames within the 1st second of the multiple video data streams. The data processing device can then intercept the 25 video frames within the 1st second of each video data stream, where the 1st video frames within the 1st second of the intercepted streams are frame-level synchronized with one another, the 2nd video frames within the 1st second are frame-level synchronized with one another, and so on up to the 25th video frames within the 1st second, serving as the obtained multiple synchronized video frames.
For yet another example, according to the received video frame interception instruction, the data processing device may determine that the specified frame moment is the 2nd and 3rd frames of the multiple video data streams; the data processing device can then intercept the video frames of the 2nd frame and the 3rd frame from each video data stream, and the intercepted 2nd frames and 3rd frames of the respective video data streams are each frame-level synchronized with one another, serving as the multiple synchronized video frames.
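For illustration only (the file names and frame index below are assumptions), the following Python sketch shows how frames at a specified frame moment might be intercepted from multiple recorded video streams using OpenCV:

    import cv2

    # Paths to the synchronized streams from the collection array (assumed names).
    streams = ["cam_00.mp4", "cam_01.mp4", "cam_02.mp4"]
    frame_index = 1  # 0-based index of the specified frame moment (the "2nd frame")

    synchronized_frames = []
    for path in streams:
        cap = cv2.VideoCapture(path)
        # Seek to the specified frame moment; the streams are assumed to be
        # frame-level synchronized, so the same index refers to the same instant.
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
        ok, frame = cap.read()
        if ok:
            synchronized_frames.append(frame)
        cap.release()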
S22: Determine the parameter data corresponding to the image combination.
In a specific implementation, the parameter data corresponding to the image combination can be obtained through parameter matrices, which may include an intrinsic parameter matrix, an extrinsic parameter matrix, a rotation matrix, a translation matrix, and the like. In this way, the relationship between the three-dimensional geometric position of a specified point on the surface of a spatial object and its corresponding point in the image combination can be determined.
In an embodiment of the present invention, a Structure From Motion (SFM) algorithm may be used to perform feature extraction, feature matching, and global optimization on the obtained image combination based on the parameter matrices, and the obtained parameter estimates are used as the parameter data corresponding to the image combination. The algorithm used for feature extraction may include any of the following: the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, or the Features from Accelerated Segment Test (FAST) algorithm. The algorithms used for feature matching may include the Euclidean distance calculation method, the Random Sample Consensus (RANSAC) algorithm, and the like. The algorithms for global optimization may include Bundle Adjustment (BA) and the like.
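The following OpenCV-based Python sketch illustrates how the feature extraction, matching, and pose estimation steps of such a pipeline might be composed for one pair of views; the intrinsic matrix K is assumed known, and a full SFM system would additionally run bundle adjustment over all views:

    import cv2
    import numpy as np

    def relative_pose(img_a, img_b, K):
        """Estimate rotation and translation between two views:
        SIFT features -> matching -> RANSAC essential matrix -> pose recovery."""
        sift = cv2.SIFT_create()
        kp_a, des_a = sift.detectAndCompute(img_a, None)
        kp_b, des_b = sift.detectAndCompute(img_b, None)

        # Match descriptors by Euclidean distance with Lowe's ratio test.
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = matcher.knnMatch(des_a, des_b, k=2)
        good = [m for m, n in matches if m.distance < 0.75 * n.distance]

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
        pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

        # RANSAC rejects outlier matches while fitting the essential matrix.
        E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)
        return R, t  # extrinsic estimate; global BA would refine jointly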
S23: Determine the depth data of each frame image in the image combination.
In a specific implementation, the depth data of each frame image may be determined based on the multiple frame images in the image combination. The depth data may include depth values corresponding to the pixels of each frame image in the image combination. The distance from the collection point to each point in the scene can serve as the depth value, and the depth value can directly reflect the geometry of the visible surfaces in the area to be viewed. For example, taking the origin of the shooting coordinate system as the optical center, the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis. Those skilled in the art can understand that this distance may be a relative value, and multiple frame images may use the same reference.
In an embodiment of the present invention, a binocular stereo vision algorithm may be used to calculate the depth data of each frame image. In addition, the depth data may also be estimated indirectly by analyzing features of the frame images such as photometric features and light-and-shade features.
In another embodiment of the present invention, a Multi-View Stereo (MVS) algorithm may be used for frame image reconstruction. During reconstruction, all pixels may be used, or the pixels may be down-sampled so that only some of them are used. Specifically, the pixel points of every frame image may be matched, the three-dimensional coordinates of each pixel point reconstructed to obtain points with image consistency, and the depth data of each frame image then calculated. Alternatively, the pixel points of selected frame images may be matched, the three-dimensional coordinates of the pixel points of each selected frame image reconstructed to obtain points with image consistency, and the depth data of the corresponding frame images then calculated. The pixel data of a frame image corresponds to the calculated depth data, and the way frame images are selected can be set according to the specific scenario; for example, some frame images may be selected according to the distance between the frame image whose depth data is to be calculated and the other frame images.
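As a minimal illustration of binocular depth estimation (not the patent's specific algorithm), the following sketch computes a disparity map with OpenCV's semi-global matcher and converts it to depth; the file names, matcher parameters, and calibration values are assumptions:

    import cv2
    import numpy as np

    # Rectified left/right texture images from two neighboring collection
    # devices (file names are assumptions).
    left = cv2.imread("cam_left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("cam_right.png", cv2.IMREAD_GRAYSCALE)

    # Semi-global block matching produces a disparity map; parameters are
    # illustrative and would be tuned per camera rig.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point scale

    # With focal length f (pixels) and baseline B (meters): depth = f * B / disparity.
    f, B = 1200.0, 0.15  # assumed calibration values
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = f * B / disparity[valid]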
S24: Based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, perform frame image reconstruction for the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data.
The multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view temporal data of the frame images sorted by frame moment.
The pixel data of a frame image may be either YUV data or RGB data, or other data capable of expressing the frame image; the depth data may include depth values in one-to-one correspondence with the pixel data of the frame image, or may be a subset of values selected from the set of depth values in one-to-one correspondence with the pixel data of the frame image, with the specific selection method depending on the specific scenario. The virtual viewpoints are selected from a multi-angle free viewing angle range, which is a range that supports switching viewpoints to watch the area to be viewed.
In a specific implementation, the preset frame images may be all the frame images in the image combination, or a selected subset of them. The selection method can be set according to the specific scenario; for example, some frame images at corresponding positions in the image combination may be selected according to the positional relationship between collection points, or some frame images at the corresponding frame moments in the image combination may be selected according to the desired frame moment or frame period.
Since the preset frame images may correspond to different frame moments, each virtual viewpoint in the virtual viewpoint path can be associated with a frame moment. The corresponding frame images are obtained according to the frame moment associated with each virtual viewpoint, and then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame images corresponding to the frame moments of the virtual viewpoints, frame image reconstruction is performed for each virtual viewpoint to obtain the corresponding multi-angle free-view video data. In this case, the multi-angle free-view video data may include the multi-angle free-view spatial data and multi-angle free-view temporal data of the frame images sorted by frame moment. In other words, in a specific implementation, in addition to a multi-angle free-view image at a single moment, a temporally continuous or discontinuous multi-angle free-view video can also be realized.
In an embodiment of the present invention, the image combination includes A synchronized video frames, of which a1 synchronized video frames correspond to a first frame moment and a2 synchronized video frames correspond to a second frame moment, with a1 + a2 = A. A virtual viewpoint path composed of B virtual viewpoints is preset, of which b1 virtual viewpoints correspond to the first frame moment and b2 virtual viewpoints correspond to the second frame moment, with b1 + b2 ≤ 2B. Then, based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the a1 synchronized video frames at the first frame moment, first-frame image reconstruction is performed for the path composed of the b1 virtual viewpoints; based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the a2 synchronized video frames at the second frame moment, second-frame image reconstruction is performed for the path composed of the b2 virtual viewpoints; and the corresponding multi-angle free-view video data is finally obtained, which may include the multi-angle free-view spatial data and multi-angle free-view temporal data of the frame images sorted by frame moment.
It can be understood that the specified frame moments and virtual viewpoints can be divided more finely, thereby obtaining more synchronized video frames and virtual viewpoints corresponding to different frame moments; the above embodiment is only an example and does not limit the specific implementation.
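Purely as an illustration of pairing viewpoints on a path with frame moments, the following snippet (all values assumed) assigns the first b1 viewpoints of a 10-viewpoint path to the first frame moment and the remaining b2 viewpoints to the second:

    num_viewpoints = 10        # B viewpoints on the preset virtual path
    b1, b2 = 5, 5              # split between the two frame moments
    frame_moments = [0, 1]     # two captured frame moments

    assignment = [frame_moments[0]] * b1 + [frame_moments[1]] * b2
    print(assignment)          # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]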
In an embodiment of the present invention, a Depth Image Based Rendering (DIBR) algorithm may be used to perform combined rendering of the pixel data and depth data of the preset frame images according to the parameter data corresponding to the image combination and the preset virtual viewpoint path, thereby realizing frame image reconstruction based on the preset virtual viewpoint path and obtaining the corresponding multi-angle free-view video data.
S25: Insert the multi-angle free-view video data into the to-be-played data stream of the playback control device and play it through the playback terminal.
The playback control device can take multiple video data streams as input, where the video data streams may come from the collection devices in the collection array or from other collection devices. The playback control device can select one input video data stream as the to-be-played data stream as needed: the multi-angle free-view video data obtained in step S24 can be inserted into the to-be-played data stream, or the device can switch from the video data stream of another input interface to the input interface of the multi-angle free-view video data. The playback control device outputs the selected to-be-played data stream to the playback terminal for playback, so the user can watch multi-angle free-view video images through the playback terminal. The playback terminal may be a video playback device such as a television, mobile phone, tablet, or computer, or an electronic device that includes a display screen.
In a specific implementation, the multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting may be operations such as pausing, rewinding, or fast-forwarding to the current moment during viewing.
Using the above data processing method, the data processing device in the distributed system architecture handles the interception of specified video frames, and the server handles the reconstruction of multi-angle free-view video from the intercepted preset frame images. This avoids deploying a large number of servers on site for processing and also avoids directly uploading the video data streams collected by the collection devices of the collection array, thereby saving a large amount of transmission resources and server processing resources. Even with limited network transmission bandwidth, the multi-angle free-view video of the specified video frames can be reconstructed in real time, realizing low-latency playback of multi-angle free-view video and reducing the constraint of network transmission bandwidth. This lowers implementation costs, reduces restrictions, is easy to implement, and meets the demands for low-latency playback and real-time interaction of multi-angle free-view video.
In a specific implementation, the depth data of the preset frame images in the image combination may be mapped to the corresponding virtual viewpoints according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination; frame image reconstruction is then performed according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, to obtain the corresponding multi-angle free-view video data.
The virtual parameter data of a virtual viewpoint may include virtual viewing position data and virtual viewing angle data; the parameter data corresponding to the image combination may include collection position data, shooting angle data, and the like. Forward mapping may be used first, followed by backward mapping, to obtain the reconstructed image.
In a specific implementation, the collection position data and shooting angle data may be referred to as external parameter data, and the parameter data may also include internal parameter data, which may include attribute data of the collection devices, so that the mapping relationship can be determined more accurately. For example, the internal parameter data may include distortion data; by taking distortion factors into consideration, the mapping relationship can be determined even more accurately in space.
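For illustration only, the following NumPy sketch outlines the forward-mapping step under simplifying assumptions (pinhole cameras, no distortion, world-to-camera extrinsics R and t, 3x3 intrinsics K): each reference pixel is back-projected to 3D using its depth and the reference parameters, then projected into the virtual viewpoint. All variable names and the nearest-pixel splatting are assumptions made for this sketch; a practical implementation would add the backward-mapping and hole-filling passes mentioned above.

    import numpy as np

    def forward_map(color, depth, K_ref, R_ref, t_ref, K_vir, R_vir, t_vir):
        """Warp a reference view into a virtual viewpoint (forward mapping)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

        # Back-project reference pixels to world coordinates using their depth.
        cam_pts = np.linalg.inv(K_ref) @ pix * depth.reshape(1, -1)
        world = R_ref.T @ (cam_pts - t_ref.reshape(3, 1))

        # Project the world points into the virtual camera.
        vir = K_vir @ (R_vir @ world + t_vir.reshape(3, 1))
        z = vir[2]
        uu = np.round(vir[0] / z).astype(int)
        vv = np.round(vir[1] / z).astype(int)

        # Nearest-pixel splat with a z-buffer to keep the closest surface.
        out = np.zeros_like(color)
        zbuf = np.full((h, w), np.inf)
        ok = (z > 0) & (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
        src = color.reshape(-1, color.shape[-1])
        for i in np.flatnonzero(ok):
            if z[i] < zbuf[vv[i], uu[i]]:
                zbuf[vv[i], uu[i]] = z[i]
                out[vv[i], uu[i]] = src[i]
        return out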
In a specific embodiment, in order to facilitate subsequent data acquisition, a stitched image corresponding to the image combination may be generated based on the pixel data and depth data of the image combination. The stitched image may include a first field and a second field, where the first field includes the pixel data of the image combination and the second field includes the depth data of the image combination; the stitched image corresponding to the image combination and the corresponding parameter data are then stored.
In another specific embodiment, in order to save storage space, a stitched image corresponding to the preset frame images in the image combination may be generated based on the pixel data and depth data of the preset frame images. The stitched image corresponding to the preset frame images may include a first field and a second field, where the first field includes the pixel data of the preset frame images and the second field includes the depth data of the preset frame images; then only the stitched image corresponding to the preset frame images and the corresponding parameter data need to be stored.
The first field corresponds to the second field. The stitched image can be divided into an image area and a depth map area: the pixel fields of the image area store the pixel data of the multiple frame images, and the pixel fields of the depth map area store the depth data of the multiple frame images; a pixel field storing the pixel data of a frame image in the image area serves as the first field, and a pixel field storing the depth data of a frame image in the depth map area serves as the second field. The stitched image of the obtained image combination and the parameter data corresponding to the image combination can be stored in a data file; when the stitched image or the corresponding parameter data needs to be obtained, it can be read from the corresponding storage space according to the storage address contained in the header of the data file.
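As an illustration of such a layout only, the following NumPy sketch builds a stitched image with the textures (first fields) in the top half and the corresponding depth maps (second fields) in the bottom half; the image sizes, camera count, and 8-bit depth quantization are assumptions:

    import numpy as np

    # Assumed inputs: per-camera texture images (H x W x 3, uint8) and depth
    # maps (H x W), here quantized to 8 bits for storage.
    textures = [np.zeros((360, 640, 3), dtype=np.uint8) for _ in range(4)]
    depths = [np.zeros((360, 640), dtype=np.uint8) for _ in range(4)]

    # Image area (first fields): textures side by side on the top half.
    image_area = np.concatenate(textures, axis=1)

    # Depth map area (second fields): depth maps in the same order, replicated
    # to 3 channels so both areas share one pixel format.
    depth_area = np.concatenate(
        [np.repeat(d[:, :, None], 3, axis=2) for d in depths], axis=1)

    # Stitched image: image area on top, depth map area below; each frame's
    # pixel field in the top half corresponds to its depth field below it.
    stitched = np.concatenate([image_area, depth_area], axis=0)  # 720 x 2560 x 3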
In addition, the storage format of the image combination may be a video format, there may be multiple image combinations, and each image combination may correspond to a different frame moment after the video is decapsulated and decoded.
In a specific implementation, based on a received image reconstruction instruction from an interactive terminal, the interaction frame moment information of the interaction moment can be determined, and the stored stitched image of the preset frame images of the corresponding image combination at the interaction frame moment, together with the parameter data corresponding to that image combination, is sent to the interactive terminal. The interactive terminal then, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to preset rules, performs combined rendering of the selected pixel data and depth data with the parameter data, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted with.
The preset rules can be set according to the specific scenario. For example, based on the virtual viewpoint position information determined by the interactive operation, the position information of the W virtual viewpoints nearest (sorted by distance) to the virtual viewpoint at the interaction moment may be selected, and the pixel data and depth data satisfying the interaction frame moment information corresponding to these W+1 virtual viewpoints in total, including the virtual viewpoint at the interaction moment, are obtained from the stitched image.
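A minimal sketch of such a nearest-W selection rule is shown below; the camera positions and the example target viewpoint are assumptions:

    import numpy as np

    def nearest_viewpoints(target, positions, W):
        """Return the indices of the W viewpoints closest to the interactive
        virtual viewpoint position `target`."""
        positions = np.asarray(positions, dtype=float)   # N x 3 positions
        d = np.linalg.norm(positions - np.asarray(target, dtype=float), axis=1)
        return np.argsort(d)[:W]

    # Assumed example: 6 cameras on a line, user viewpoint near camera 2.
    cams = [(x, 0.0, 0.0) for x in range(6)]
    print(nearest_viewpoints((2.2, 0.0, 0.0), cams, W=2))  # -> [2 3]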
The interaction frame moment information is determined based on a trigger operation from the interactive terminal. The trigger operation may be a trigger operation input by the user or one generated automatically by the interactive terminal; for example, the interactive terminal may automatically initiate a trigger operation when it detects the identifier of a multi-angle free-viewpoint data frame. When triggered manually by the user, the moment information may be the moment at which the user chooses to trigger interaction after the interactive terminal displays interaction prompt information, or it may be historical moment information at which the interactive terminal receives a user operation triggering interaction, where the historical moment information may be moment information preceding the current playback moment.
In a specific implementation, the interactive terminal may, based on the obtained stitched image of the preset frame images in the image combination at the interaction frame moment and the corresponding parameter data, the interaction frame moment information, and the virtual viewpoint position information at the interaction frame moment, use the same method as in step S24 above to perform combined rendering of the pixel data and depth data of the stitched image of the preset frame images, obtain the multi-angle free-view video data corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at the interactive virtual viewpoint position.
With the above solution, the multi-angle free-view video data corresponding to the interactive virtual viewpoint position can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further improve the user's interactive experience.
Referring to the schematic structural diagram of the data processing system shown in FIG. 1, in an embodiment of the present invention, as shown in FIG. 1, the data processing system 10 may include: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14, wherein:
the data processing device 11 is adapted to, based on a video frame interception instruction, intercept the video frames at a designated frame time from multiple video data streams synchronously collected in real time from different positions in the field collection area to obtain multiple synchronized video frames, and to upload the obtained multiple synchronized video frames at the designated frame time to the server, where the multiple video data streams may be video data streams in a compressed format or in an uncompressed format;
the server 12 is adapted to take the received frame images of the multiple synchronized video frames at the designated frame time uploaded by the data processing device 11 as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, perform frame image reconstruction along a preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data, where the multi-angle free-view video data includes multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time;
the playback control device 13 is adapted to insert the multi-angle free-view video data into the data stream to be played; and
the playback terminal 14 is adapted to receive the data stream to be played from the playback control device 13 and play it in real time.
In a specific implementation, the playback control device 13 may output the data stream to be played based on control instructions. As an optional example, the playback control device 13 may select one of multiple data streams as the data stream to be played, or continuously switch among the multiple data streams to continuously output the data stream to be played. A broadcast directing control device may serve as a playback control device in embodiments of the present invention; it may be a manual or semi-manual directing control device that performs playback control based on externally input control instructions, or a virtual directing control device that performs directing control automatically based on artificial intelligence, big-data learning, or preset algorithms.
With the above data processing system, the data processing device in the distributed system architecture handles the interception of the designated video frames while the server handles the reconstruction of the multi-angle free-view video from the intercepted preset frame images. This avoids deploying a large number of servers on site for processing, and also avoids directly uploading the video data streams collected by the collection devices of the collection array, thereby saving a large amount of transmission resources and server processing resources. Even with limited network transmission bandwidth, the multi-angle free-view video of the designated video frames can be reconstructed in real time, enabling low-latency playback of multi-angle free-view video and reducing the constraint imposed by network transmission bandwidth. This lowers implementation cost, reduces restrictive conditions, is easy to implement, and meets the requirements of low-latency playback and real-time interaction for multi-angle free-view video.
In a specific implementation, the server 12 is further adapted to generate, based on the pixel data and depth data of the preset frame images in the image combination, a stitched image for the preset frame time of the image combination, the stitched image including a first field and a second field, where the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination, and to store the stitched image of the image combination and the parameter data corresponding to the image combination.
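One simple way to realize such a two-field stitched image is sketched below with numpy. The equal-resolution views, the 8-bit grayscale depth encoding, and the top/bottom field layout are assumptions of this sketch, not the patent's prescribed format.

```python
import numpy as np

def build_stitched_image(pixel_maps, depth_maps):
    """Stack the pixel data of all preset frame images (first field, top)
    above their depth data (second field, bottom) in one stitched image.

    pixel_maps: list of HxWx3 uint8 arrays, one per camera view
    depth_maps: list of HxW uint8 arrays, aligned with pixel_maps
    """
    pixel_field = np.hstack(pixel_maps)  # first field: pixel data
    depth_field = np.hstack([np.dstack([d] * 3) for d in depth_maps])  # second field
    return np.vstack([pixel_field, depth_field])

# Hypothetical usage with 4 views of 1080x1920 frames:
views = [np.zeros((1080, 1920, 3), np.uint8) for _ in range(4)]
depths = [np.zeros((1080, 1920), np.uint8) for _ in range(4)]
stitched = build_stitched_image(views, depths)
print(stitched.shape)  # (2160, 7680, 3): pixel field on top, depth field below
```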
In a specific implementation, the data processing system 10 may further include an interactive terminal 15, adapted to determine the interaction frame time information based on a trigger operation, send an image reconstruction instruction containing the interaction frame time information to the server, receive the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the server, determine the virtual viewpoint position information based on the interactive operation, select the corresponding pixel data and depth data in the stitched image according to preset rules, render the selected pixel data and depth data in combination with the parameter data, and thereby reconstruct and play the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
The number of playback terminals 14 may be one or more, the number of interactive terminals 15 may be one or more, and the playback terminal 14 and the interactive terminal 15 may be the same terminal device. Moreover, at least one of the server, the playback control device, or the interactive terminal may serve as the transmitting end of the video frame interception instruction, and other devices capable of transmitting a video frame interception instruction may also be used.
It should be noted that, in specific implementations, the positions of the data processing device and the server can be deployed flexibly according to user requirements. For example, the data processing device may be placed in a non-collection area of the site or in the cloud. As another example, the server may be placed in a non-collection area of the site, in the cloud, or on the terminal access side; on the terminal access side, edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as the server for obtaining multi-angle free-view data. Alternatively, the data processing device and the server may be deployed together and work cooperatively as a server cluster, enabling rapid generation of multi-angle free-view data so as to achieve low-latency playback of, and real-time interaction with, multi-angle free-view video.
With the above solution, the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted with can be generated at any time based on an image reconstruction instruction from the interactive terminal, which can further improve the user's interactive experience.
To enable those skilled in the art to better understand and implement the embodiments of the present invention, the data processing system is described in detail below through a specific application scenario.
FIG. 3 is a schematic structural diagram of a data processing system in one application scenario, showing the arrangement of a data processing system for a basketball game. The data processing system includes a collection array 31 composed of multiple collection devices, a data processing device 32, a cloud server cluster 33, a playback control device 34, a playback terminal 35, and an interactive terminal 36.
Referring to FIG. 3, the basketball hoop on the left is taken as the core point of interest, and a fan-shaped area centered on the core point of interest and lying in the same plane is taken as the preset multi-angle free viewing angle range. The collection devices in the collection array 31 can be placed in a fan shape at different positions in the field collection area according to the preset multi-angle free viewing angle range, and can each synchronously collect video data streams in real time from their respective angles.
In a specific implementation, the collection devices may also be installed in the ceiling area of the basketball venue, on the backboard stand, and so on. The collection devices may be arranged and distributed along a straight line, a fan, an arc, a circle, or an irregular shape. The specific arrangement can be set according to one or more factors such as the specific site environment, the number of collection devices, the characteristics of the collection devices, and the imaging effect requirements. A collection device may be any device with a camera function, for example, an ordinary camera, a mobile phone, or a professional camera.
In order not to interfere with the work of the collection devices, the data processing device 32 can be placed in a non-collection area of the site and can be regarded as a site server. The data processing device 32 may send a stream-pulling instruction to each collection device in the collection array 31 through a wireless local area network, and each collection device in the collection array 31, based on the stream-pulling instruction sent by the data processing device 32, transmits the obtained video data stream to the data processing device 32 in real time. In particular, each collection device in the collection array 31 may transmit its video data stream to the data processing device 32 in real time through a switch 37.
When the data processing device 32 receives a video frame interception instruction, it intercepts the video frames at the designated frame time from the received multiple video data streams to obtain the frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the designated frame time to the cloud server cluster 33.
Accordingly, the cloud server cluster 33 takes the received frame images of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction along the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data. The multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time.
The server can be placed in the cloud, and in order to process data in parallel more quickly, the cloud server cluster 33 can be composed of multiple different servers or server groups according to the kind of data to be processed.
For example, the cloud server cluster 33 may include a first cloud server 331, a second cloud server 332, a third cloud server 333, and a fourth cloud server 334. The first cloud server 331 may be used to determine the parameter data corresponding to the image combination; the second cloud server 332 may be used to determine the depth data of each frame image in the image combination; the third cloud server 333 may, based on the parameter data corresponding to the image combination and the pixel data and depth data of the image combination, perform frame image reconstruction along the preset virtual viewpoint path using a depth image based rendering (DIBR) algorithm; and the fourth cloud server 334 may be used to generate the multi-angle free-view video, where the multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time.
It can be understood that the first cloud server 331, the second cloud server 332, the third cloud server 333, and the fourth cloud server 334 may each also be a server group composed of a server array or server sub-clusters, which is not limited in the embodiments of the present invention.
In a specific implementation, the cloud server cluster 33 can store the pixel data and depth data of the image combination in the following manner:
based on the pixel data and depth data of the image combination, a stitched image corresponding to the frame time is generated, the stitched image including a first field and a second field, where the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination. The obtained stitched image and the corresponding parameter data can be stored in a data file; when the stitched image or the parameter data needs to be retrieved, it can be read from the corresponding storage space according to the corresponding storage address in the header of the data file.
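A minimal sketch of such a header-indexed data file follows; the JSON header, the 4-byte length prefix, and the field names "stitched" and "params" are illustrative assumptions rather than the patent's concrete layout.

```python
import json
import struct

def write_data_file(path, stitched_bytes, param_bytes):
    """Write a header recording the offset/length of each payload,
    followed by the stitched image bytes and the parameter data bytes."""
    header = {
        "stitched": {"offset": 0, "length": len(stitched_bytes)},
        "params": {"offset": len(stitched_bytes), "length": len(param_bytes)},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header_bytes)))  # 4-byte header size
        f.write(header_bytes)
        f.write(stitched_bytes)
        f.write(param_bytes)

def read_field(path, name):
    """Read one payload ('stitched' or 'params') via its header address."""
    with open(path, "rb") as f:
        (hdr_len,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(hdr_len))
        base = 4 + hdr_len
        f.seek(base + header[name]["offset"])
        return f.read(header[name]["length"])
```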
Then the playback control device 34 may insert the received multi-angle free-view video data into the data stream to be played, and the playback terminal 35 receives the data stream to be played from the playback control device 34 and plays it in real time. The playback control device 34 may be a manual playback control device or a virtual playback control device. In a specific implementation, a dedicated server capable of automatically switching video streams can be set up as the virtual playback control device to control the data source. A broadcast directing control device, such as a directing console, may serve as a playback control device in embodiments of the present invention.
It can be understood that the data processing device 32 can be placed in a non-collection area of the site or in the cloud according to the specific scenario, and the server (cluster) and the playback control device can be placed in a non-collection area of the site, in the cloud, or on the terminal access side according to the specific scenario; this embodiment is not intended to limit the specific implementation or the scope of protection of the present invention.
FIG. 4 is a schematic diagram of the interactive interface of the interactive terminal. The interactive interface of the interactive terminal 40 has a progress bar 41. With reference to FIGS. 3 and 4, the interactive terminal 40 can associate the designated frame times received from the data processing device 32 with the progress bar and generate several interaction identifiers on the progress bar 41, for example interaction identifiers 42 and 43. The black segment of the progress bar 41 is the played portion 41a, and the blank segment is the unplayed portion 41b.
When the system of the interactive terminal reads the corresponding interaction identifier 43 on the progress bar 41, the interface of the interactive terminal 40 can display interaction prompt information. For example, when the user's selection operation triggers the current interaction identifier 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction for the interaction frame time corresponding to the interaction identifier 43 and sends the image reconstruction instruction containing the interaction frame time information to the cloud server cluster 33. When the user does not trigger it, the interactive terminal 40 can continue to read subsequent video data, and the played portion 41a of the progress bar continues to advance. The user may also trigger a historical interaction identifier while watching, for example the interaction identifier 42 shown in the played portion 41a of the progress bar; after receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction for the interaction frame time corresponding to the interaction identifier 42.
When the cloud server cluster 33 receives the image reconstruction instruction from the interactive terminal 40, it can extract the stitched image of the preset frame images in the corresponding image combination and the parameter data corresponding to that image combination, and transmit them to the interactive terminal 40.
Based on the trigger operation, the interactive terminal 40 determines the interaction frame time information, sends an image reconstruction instruction containing the interaction frame time information to the server, receives the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the cloud server cluster 33, determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to the preset rules, renders the selected pixel data and depth data in combination, and thereby reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
It can be understood that each collection device in the collection array can be connected to the data processing device through a switch and/or a local area network; the number of playback terminals and of interactive terminals may each be one or more; the playback terminal and the interactive terminal may be the same terminal device; the data processing device can be placed in a non-collection area of the site or in the cloud according to the specific scenario; and the server can be placed in a non-collection area of the site, in the cloud, or on the terminal access side according to the specific scenario. This embodiment is not intended to limit the specific implementation or the scope of protection of the present invention.
An embodiment of the present invention further provides a server corresponding to the above data processing method. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the server shown in FIG. 5, in an embodiment of the present invention, as shown in FIG. 5, the server 50 may include:
a data receiving unit 51, adapted to receive the frame images of multiple synchronized video frames uploaded by the data processing device as an image combination;
a parameter data calculation unit 52, adapted to determine the parameter data corresponding to the image combination;
a depth data calculation unit 53, adapted to determine the depth data of each frame image in the image combination;
a video data acquisition unit 54, adapted to perform frame image reconstruction along a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, to obtain the corresponding multi-angle free-view video data, where the multi-angle free-view video data includes multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time; and
a first data transmission unit 55, adapted to insert the multi-angle free-view video data into the to-be-played data stream of the playback control device for playback through the playback terminal.
The multiple synchronized video frames may be obtained by the data processing device, based on a video frame interception instruction, intercepting the video frames at a designated frame time from multiple video data streams synchronously collected in real time from different positions in the field collection area and uploaded, and the shooting angles of the multiple synchronized video frames differ from one another.
The server can be placed in a non-collection area of the site, in the cloud, or on the terminal access side according to the specific scenario.
In a specific implementation, the multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to allow the user to perform time-shifted viewing, where time-shifting may include operations such as pausing, rewinding, and fast-forwarding to the current moment during viewing.
In a specific implementation, as shown in FIG. 5, the video data acquisition unit 54 may include:
a data mapping subunit 541, adapted to map the depth data of the preset frame images in the image combination to the corresponding virtual viewpoints according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination; and
a data reconstruction subunit 542, adapted to perform frame image reconstruction according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, to obtain the corresponding multi-angle free-view video data.
In a specific implementation, as shown in FIG. 5, the server 50 may further include:
a stitched image generation unit 56, adapted to generate, based on the pixel data and depth data of the preset frame images in the image combination, a stitched image corresponding to the image combination, where the stitched image may include a first field and a second field, the first field including the pixel data of the preset frame images in the image combination and the second field including the depth data of the preset frame images in the image combination; and
a data storage unit 57, adapted to store the stitched image of the image combination and the parameter data corresponding to the image combination.
In a specific implementation, as shown in FIG. 5, the server 50 may further include:
a data extraction unit 58, adapted to determine, based on a received image reconstruction instruction from the interactive terminal, the interaction frame time information of the interaction moment, and to extract the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the parameter data corresponding to that image combination; and
a second data transmission unit 59, adapted to transmit the stitched image and the corresponding parameter data extracted by the data extraction unit 58 to the interactive terminal, so that the interactive terminal, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to preset rules, renders the selected pixel data and depth data in combination, and thereby reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
With the above solution, the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted with can be generated at any time based on an image reconstruction instruction from the interactive terminal, which can further improve the user's interactive experience.
Embodiments of the present invention further provide a data interaction method and a data processing system, in which a data stream to be played can be obtained from the playback control device in real time and played and displayed in real time, with each interaction identifier in the data stream to be played associated with a designated frame time of the video data. Then, in response to a trigger operation on an interaction identifier, the interaction data corresponding to the designated frame time of that interaction identifier can be obtained. Since the interaction data may include multi-angle free-view data, a multi-angle free-view display of the designated frame time can be performed based on the interaction data.
With the data interaction scheme in the embodiments of the present invention, interaction data can be obtained during playback according to a trigger operation on an interaction identifier, and a multi-angle free-view display can then be performed to improve the user's interactive experience. The data interaction method and the data processing system are described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to the flowchart of the data interaction method shown in FIG. 6, the data interaction method used in the embodiments of the present invention is described below through specific steps.
S61: obtain the data stream to be played from the playback control device in real time and play and display it in real time, where the data stream to be played includes video data and interaction identifiers, and each interaction identifier is associated with a designated frame time of the data stream to be played.
The designated frame time may be expressed in units of frames, taking the N-th to M-th frames as the designated frame time, where N and M are integers not less than 1 and N ≤ M; or the designated frame time may be expressed in units of time, taking the X-th to Y-th seconds as the designated frame time, where X and Y are positive numbers and X ≤ Y.
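As a small illustration, a time-based designation can be normalized to a frame range as sketched below; the constant frame rate (25 fps) and the helper name are assumptions of this sketch, not values given by the patent.

```python
def seconds_to_frame_range(x, y, fps=25):
    """Convert a designated time span [X, Y] in seconds into an inclusive
    frame-index range [N, M], assuming a constant frame rate."""
    assert 0 < x <= y, "designated span must satisfy 0 < X <= Y"
    return int(x * fps), int(y * fps)

print(seconds_to_frame_range(2.0, 3.5))  # (50, 87): frames covering seconds 2..3.5
```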
In a specific implementation, the data stream to be played can be associated with several designated frame times, and the playback control device can generate, based on the information of each designated frame time, the interaction identifier corresponding to that designated frame time, so that when the data stream to be played is played and displayed in real time, the corresponding interaction identifier can be displayed at the designated frame time. Each interaction identifier can be associated with the video data in different ways according to the actual situation.
In one embodiment of the present invention, the data stream to be played may include several frame times corresponding to the video data. Since each interaction identifier also has a corresponding designated frame time, the information of the designated frame time corresponding to each interaction identifier can be matched against the information of each frame time in the data stream to be played, and frame times with matching information can be associated with the interaction identifiers, so that when the real-time playback and display of the data stream to be played reaches the corresponding frame time, the corresponding interaction identifier can be displayed.
For example, suppose the data stream to be played includes N frame times, and the playback control device generates M corresponding interaction identifiers based on the information of M designated frame times. If the information of the i-th frame time is the same as the information of the j-th designated frame time, the i-th frame time can be associated with the j-th interaction identifier, and when the real-time playback and display reaches the i-th frame time, the j-th interaction identifier can be displayed, where i is a natural number not greater than N and j is a natural number not greater than M.
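This association step can be sketched as follows; representing frame time information as millisecond timestamps is an assumption made here purely for illustration.

```python
def associate_markers(stream_frame_times, designated_frame_times):
    """Map each playback frame index i to the index j of the matching
    interaction identifier (marker), where the time info is identical.

    stream_frame_times:     frame-time info of the N frames in the stream
    designated_frame_times: frame-time info behind the M interaction markers
    """
    marker_by_time = {t: j for j, t in enumerate(designated_frame_times)}
    return {i: marker_by_time[t]
            for i, t in enumerate(stream_frame_times)
            if t in marker_by_time}

# Hypothetical usage: frame times given as millisecond timestamps.
stream = [0, 40, 80, 120, 160]
designated = [80, 160]
print(associate_markers(stream, designated))  # {2: 0, 4: 1}
```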
S62: in response to a trigger operation on an interaction identifier, obtain the interaction data corresponding to the designated frame time of that interaction identifier, where the interaction data includes multi-angle free-view data.
In a specific implementation, the pieces of interaction data corresponding to the respective designated frame times can be stored in a preset storage device. Since there is a correspondence between interaction identifiers and designated frame times, a trigger operation performed on the interactive terminal can trigger an interaction identifier displayed by the interactive terminal, and from the trigger operation on the interaction identifier, the designated frame time corresponding to the triggered interaction identifier can be obtained. The interaction data for the designated frame time corresponding to the triggered interaction identifier can thus be obtained.
For example, a preset storage device may store M pieces of interaction data, where the M pieces of interaction data correspond to M designated frame times, and the M designated frame times correspond to M interaction identifiers. Assuming the triggered interaction identifier is Pi, the designated frame time Ti corresponding to the interaction identifier Pi can be obtained from the triggered interaction identifier Pi, and the interaction data for the designated frame time Ti corresponding to the interaction identifier Pi can thus be obtained, where i is a natural number.
The trigger operation may be a trigger operation input by the user, or a trigger operation automatically generated by the interactive terminal.
Moreover, the preset storage device can be placed in a non-collection area of the site, in the cloud, or on the terminal access side. Specifically, the preset storage device may be the data processing device, the server, or the interactive terminal in the embodiments of the present invention, or an edge node device located on the interactive terminal side, such as a base station, a set-top box, a router, a home data center server, or a hotspot device.
S63: based on the interaction data, perform the multi-angle free-view image display of the designated frame time.
In a specific implementation, an image reconstruction algorithm can be used to perform image reconstruction on the multi-angle free-view data of the interaction data, and the multi-angle free-view image display of the designated frame time can then be performed.
Moreover, if the designated frame time is a single frame time, a static image with a multi-angle free view can be displayed; if the designated frame time corresponds to multiple frame times, a dynamic image with a multi-angle free view can be displayed.
With the above solution, during video playback, interaction data can be obtained according to a trigger operation on an interaction identifier, and a multi-angle free-view display can then be performed to improve the user's interactive experience.
In a specific implementation, the multi-angle free-view data may be generated based on multiple received frame images corresponding to the designated frame time, where the multiple frame images are obtained by the data processing device intercepting, at the designated frame time, the multiple video data streams synchronously collected by the multiple collection devices in the collection array. The multi-angle free-view data may include the pixel data, depth data, and parameter data of the multiple frame images, where the pixel data and depth data of each frame image are associated with each other.
The pixel data of a frame image may be either YUV data or RGB data, or other data capable of representing the frame image. The depth data may include depth values in one-to-one correspondence with the pixel data of the frame image, or may be a subset of values selected from the set of depth values in one-to-one correspondence with the pixel data of the frame image. The specific way of selecting the depth data depends on the specific scenario.
In a specific implementation, the parameter data corresponding to the multiple frame images can be obtained through parameter matrices, which may include an intrinsic parameter matrix, an extrinsic parameter matrix, a rotation matrix, a translation matrix, and the like. From these, the relationship between the three-dimensional geometric position of a specified point on the surface of an object in the scene and its corresponding points in the multiple frame images can be determined.
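For reference, the relationship that such parameter matrices encode is the standard pinhole projection (a textbook formula, not quoted from this patent): a 3-D point (X, Y, Z) maps to an image point (u, v) through the intrinsic matrix K and the extrinsics [R | t], up to a scale factor s:

```latex
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= K \, [\, R \mid t \,]
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
```

Here the rotation matrix R and translation vector t form the extrinsic parameters, while K collects the focal lengths and the principal point; determining these matrices for each view fixes the correspondence between scene points and their image points described above.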
In an embodiment of the present invention, an SFM (structure from motion) algorithm can be used to perform feature extraction, feature matching, and global optimization on the acquired multiple frame images based on the parameter matrices, and the obtained parameter estimates serve as the parameter data corresponding to the multiple frame images. The specific algorithms used in the feature extraction, feature matching, and global optimization processes can be found in the foregoing description.
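A two-view sketch of the front end of such a pipeline is given below using OpenCV (SIFT feature extraction, ratio-test matching, RANSAC essential-matrix estimation, pose recovery). It assumes the intrinsic matrix K is already known; the full SfM described here would extend this across all views and add the global, bundle-adjustment-style optimization step.

```python
import cv2
import numpy as np

def estimate_relative_pose(img1, img2, K):
    """Estimate rotation R and translation t (up to scale) between two
    camera views: feature extraction, feature matching, then robust
    essential-matrix estimation and pose recovery."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Ratio-test matching (Lowe's criterion) to keep reliable pairs.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # extrinsics of view 2 relative to view 1
```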
In a specific implementation, the depth data of each frame image can be determined based on the multiple frame images. The depth data may include depth values corresponding to the pixels of each frame image. The distance from the collection point to each point in the scene can serve as the depth value, and the depth value directly reflects the geometry of the visible surface in the area to be viewed. For example, taking the origin of the shooting coordinate system as the optical center, the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis. Those skilled in the art will understand that these distances may be relative values, and the multiple frame images may use the same reference.
In one embodiment of the present invention, a binocular stereo vision algorithm can be used to compute the depth data from the frame images. In addition, the depth data can also be estimated indirectly by analyzing features of a frame image such as its photometric and light-and-shade features.
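The binocular route can be sketched as follows, assuming a rectified image pair with known focal length (in pixels) and baseline (in meters); these inputs, and the use of OpenCV's semi-global matcher, are assumptions of this sketch rather than the patent's prescribed algorithm.

```python
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px, baseline_m):
    """Binocular-stereo sketch: semi-global matching yields a disparity
    map, which converts to metric depth via depth = f * B / disparity."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0                       # ignore unmatched pixels
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth  # per-pixel depth values, aligned with the left image
```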
In another embodiment of the present invention, an MVS (multi-view stereo) algorithm can be used for frame image reconstruction: the pixels of every frame image can be matched, the three-dimensional coordinates of each pixel reconstructed to obtain points with image consistency, and the depth data of each frame image then computed. Alternatively, the pixels of selected frame images can be matched, the three-dimensional coordinates of the pixels of each selected frame image reconstructed to obtain points with image consistency, and the depth data of the corresponding frame images then computed. The pixel data of a frame image corresponds to the computed depth data, and the way of selecting frame images can be set according to the specific scenario; for example, some frame images can be selected according to the distance between the frame image whose depth data is to be computed and the other frame images.
In a specific implementation, the data processing device may, based on a received video frame interception instruction, intercept frame-level synchronized video frames at the designated frame time from the multiple video data streams.
In a specific implementation, the video frame interception instruction may include frame time information for intercepting video frames, and the data processing device intercepts the video frames at the corresponding frame time from the multiple video data streams according to the frame time information in the video frame interception instruction. In addition, the data processing device sends the frame time information in the video frame interception instruction to the playback control device; the playback control device can obtain the corresponding designated frame time from the received frame time information and generate the corresponding interaction identifier from it.
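A toy sketch of the interception step follows; modeling each decoded stream as a mapping from frame time information to frame images is a stand-in for real decoder logic, used here only to show the frame-level synchronization contract.

```python
def intercept_synchronized_frames(streams, frame_time):
    """Pick the frame with the designated frame time from every stream.

    streams: list of dicts mapping frame-time info -> frame image,
             one per collection device (illustrative stand-in for decoders).
    Returns the frame-level synchronized frame images, one per stream.
    """
    frames = []
    for stream in streams:
        if frame_time not in stream:
            raise KeyError(f"stream lacks a frame at time {frame_time}")
        frames.append(stream[frame_time])
    return frames
```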
In a specific implementation, the multiple collection devices in the collection array are placed at different positions in the field collection area according to the preset multi-angle free viewing angle range, and the data processing device can be placed in a non-collection area of the site or in the cloud.
In a specific implementation, the multi-angle free viewing angle may refer to the spatial position and viewing angle of a virtual viewpoint that allows the scene to be switched freely. For example, the multi-angle free viewing angle may be a 6-degree-of-freedom (6DoF) viewing angle, where the spatial position of the virtual viewpoint can be expressed as (x, y, z) and the viewing angle can be expressed as three rotation directions, giving six degree-of-freedom directions in total as the 6DoF viewing angle.
Moreover, the multi-angle free viewing angle range can be determined according to the needs of the application scenario.
In a specific implementation, the playback control device may generate, based on the frame time information of the intercepted video frames from the data processing device, the interaction identifier associated with the video frame at the corresponding time in the data stream to be played. For example, after receiving a video frame interception instruction, the data processing device sends the frame time information in the video frame interception instruction to the playback control device, and the playback control device can then generate the corresponding interaction identifier based on each piece of frame time information.
In a specific implementation, corresponding interaction data can be generated according to the objects displayed on site, the information associated with the displayed objects, and so on. For example, the interaction data may further include at least one of the following: on-site analysis data, information data of a collection object, information data of equipment associated with a collection object, information data of items deployed on site, and information data of logos displayed on site. Then, by performing a multi-angle free-view display based on the interaction data, richer interactive information can be presented to the user through the multi-angle free view, which can further enhance the user's interactive experience.
For example, when a basketball game is being played, the interaction data may include, in addition to the multi-angle free-view data, one or more of: analysis data of the game, information data of a particular player, information data of the shoes worn by a player, information data of the basketball, and information data of an on-site sponsor's logo.
In a specific implementation, in order to return conveniently to the data stream to be played once the image display has ended, and continuing to refer to FIG. 6, after step S63 the method may further include:
S64: upon detecting an interaction end signal, switch back to obtaining the data stream to be played from the playback control device in real time and playing and displaying it in real time.
For example, upon receiving an interaction end operation instruction, switch to the data stream to be played obtained in real time from the playback control device and play and display it in real time.
As another example, upon detecting that the multi-angle free-view image display of the designated frame time has reached the last image, switch to the data stream to be played obtained in real time from the playback control device and play and display it in real time.
In a specific embodiment, performing the multi-angle free-view image display based on the interaction data as described in step S63 may specifically include the following steps:
determining a virtual viewpoint according to the interactive operation, the virtual viewpoint being selected from a multi-angle free viewing angle range, which is a range supporting switched viewing of the area to be viewed from different virtual viewpoints; and then displaying an image of the area to be viewed as seen from the virtual viewpoint, the image being generated based on the interaction data and the virtual viewpoint.
In a specific implementation, a virtual viewpoint path can be preset, and the virtual viewpoint path may include several virtual viewpoints. Since the virtual viewpoint is selected from the multi-angle free viewing angle range, the corresponding first virtual viewpoint can be determined from the viewing angle of the image being played and displayed at the time of the interactive operation, and the images corresponding to the virtual viewpoints can then be displayed in sequence, starting from the first virtual viewpoint and following the preset order of the virtual viewpoints.
In an embodiment of the present invention, the DIBR algorithm can be used to render in combination the pixel data and depth data corresponding to the designated frame time of the triggered interaction identifier, according to the parameter data in the multi-angle free-view data and the preset virtual viewpoint path, thereby performing image reconstruction based on the preset virtual viewpoint path and obtaining the corresponding multi-angle free-view video data; the corresponding images can then be displayed in sequence starting from the first virtual viewpoint, in the preset order of the virtual viewpoints.
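A minimal sketch of the forward-warping core of DIBR is given below: each pixel of a reference view is unprojected with its depth value, transformed into a virtual camera with extrinsics (R, t), and re-projected. A single reference view, metric depth, and known intrinsics/extrinsics are assumptions of this sketch; a production pipeline would add z-buffering, multi-view blending, and hole filling.

```python
import numpy as np

def dibr_forward_warp(color, depth, K, R, t):
    """Synthesize a virtual view from one reference view by depth-based
    forward warping (nearest-neighbor splat, no occlusion handling).

    color: HxWx3 reference image, depth: HxW metric depth, K: 3x3 intrinsics,
    (R, t): pose of the virtual camera relative to the reference camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN

    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # 3-D points, ref frame
    pts_virt = R @ pts + t.reshape(3, 1)                 # into virtual frame
    proj = K @ pts_virt
    uv = (proj[:2] / proj[2]).round().astype(int)        # re-projected coords

    out = np.zeros_like(color)
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h) & (proj[2] > 0)
    out[uv[1, ok], uv[0, ok]] = color.reshape(-1, 3)[ok]
    return out
```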
Moreover, if the designated frame time corresponds to a single frame time, the obtained multi-angle free-view video data may include multi-angle free-view spatial data of images sorted by frame time, and a static image with a multi-angle free view can be displayed. If the designated frame time corresponds to different frame times, the obtained multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time, and a dynamic image with a multi-angle free view can be displayed, that is, the frame images of the video frames are shown from multi-angle free views.
An embodiment of the present invention further provides a system corresponding to the above data interaction method. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data processing system shown in FIG. 7, the data processing system 70 may include a collection array 71, a data processing device 72, a server 73, a playback control device 74, and an interactive terminal 75, wherein:
the collection array 71 may include multiple collection devices placed at different positions in the field collection area according to the preset multi-angle free viewing angle range, adapted to synchronously collect multiple video data streams in real time and upload the video data streams to the data processing device 72 in real time;
the data processing device 72 is adapted to, for the uploaded multiple video data streams and according to a received video frame interception instruction, intercept the multiple video data streams at the designated frame time to obtain the multiple frame images corresponding to the designated frame time and the frame time information corresponding to the designated frame time, upload the multiple frame images at the designated frame time and the corresponding frame time information to the server 73, and send the frame time information of the designated frame time to the playback control device 74;
the server 73 is adapted to receive the multiple frame images and the frame time information uploaded by the data processing device 72, and to generate, based on the multiple frame images, interaction data for interaction, the interaction data including multi-angle free-view data and being associated with the frame time information;
the playback control device 74 is adapted to determine the designated frame time in the data stream to be played corresponding to the frame time information uploaded by the data processing device 72, generate the interaction identifier associated with the designated frame time, and transmit the data stream to be played containing the interaction identifier to the interactive terminal 75; and
the interactive terminal 75 is adapted to play and display in real time, based on the received data stream to be played, the video containing the interaction identifier, and, based on a trigger operation on the interaction identifier, to obtain the interaction data stored in the server 73 and corresponding to the designated frame time, so as to perform the multi-angle free-view image display.
It should be noted that, in specific implementations, the data processing device and the server can be deployed flexibly according to user requirements. For example, the data processing device can be placed in a non-collection area on site or in the cloud. As another example, the server can be placed in a non-collection area on site, in the cloud, or on the terminal access side; on the terminal access side, edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as the server for obtaining multi-angle free-view data. Alternatively, the data processing device and the server can be deployed together and cooperate as a server cluster, enabling rapid generation of multi-angle free-view data and thus low-latency playback of, and real-time interaction with, multi-angle free-view video.
With the above solution, during playback, interaction data can be obtained according to a trigger operation on the interaction identifier, and a multi-angle free-view display can then be performed, improving the user's interactive experience.
In a specific implementation, the multi-angle free view may refer to the spatial positions and view angles of virtual viewpoints that allow the scene to be switched freely. The multi-angle free-view range can be determined according to the needs of the application scene. The multi-angle free view may be a 6-degrees-of-freedom (6DoF) view.
In a specific implementation, the collection device itself may have encoding and encapsulation functions, so that the original video data synchronously collected in real time from the corresponding angle can be encoded and encapsulated. The collection device may also have a compression function.
In a specific implementation, the server 73 is adapted to generate the multi-angle free-view data based on the received multiple frame images corresponding to the specified frame time. The multi-angle free-view data includes the pixel data, depth data, and parameter data of the multiple frame images, where the pixel data and the depth data of each frame image are associated with each other.
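Purely as an illustration, the association between pixel data, depth data, and parameter data described above might be organized as in the following minimal sketch. All names here (FrameImage, MultiAngleFreeViewData, and so on) are hypothetical, not a structure mandated by this embodiment:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical container types; names are illustrative only.

@dataclass
class FrameImage:
    camera_id: int          # which collection device captured this frame
    pixel_data: bytes       # texture (e.g. YUV/RGB) of the frame image
    depth_data: bytes       # per-pixel depth map associated with pixel_data

@dataclass
class MultiAngleFreeViewData:
    frame_time: float               # the specified frame time the images correspond to
    images: List[FrameImage]        # one entry per collection device / angle
    parameter_data: List[dict]      # camera parameters (e.g. intrinsics/extrinsics) per image

    def image_for_camera(self, camera_id: int) -> FrameImage:
        # The pixel data and depth data of a frame image stay associated
        # because they travel together in the same FrameImage record.
        return next(img for img in self.images if img.camera_id == camera_id)
```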
In a specific implementation, the multiple collection devices in the collection array 71 can be placed at different positions in the on-site collection area according to the preset multi-angle free-view range, the data processing device 72 can be placed in a non-collection area on site or in the cloud, and the server 73 can be placed in a non-collection area on site, in the cloud, or on the terminal access side.
In a specific implementation, the playback control device 74 is adapted to generate, based on the frame time information of the video frames intercepted by the data processing device 72, an interaction identifier associated with the corresponding video frame in the data stream to be played.
In a specific implementation, the interactive terminal 75 is further adapted to, upon detecting an interaction end signal, switch to the data stream to be played obtained in real time from the playback control device 74 and play and display it in real time.
To enable those skilled in the art to better understand and implement the embodiments of the present invention, the data processing system is described in detail below through a specific application scenario. FIG. 8 is a schematic structural diagram of the data processing system in another application scenario of an embodiment of the present invention, showing a basketball game broadcast scenario in which the scene is the basketball court area on the left. The data processing system 80 may include: a collection array 81 composed of collection devices, a data processing device 82, a cloud server cluster 83, a playback control device 84, and an interactive terminal 85.
With the basketball hoop as the core point of interest and as the center of a circle, a fan-shaped area in the same plane as the core point of interest can serve as the preset multi-angle free-view range. Accordingly, the collection devices in the collection array 81 can be placed in a fan shape at different positions in the on-site collection area according to the preset multi-angle free-view range, and can each synchronously collect video data streams in real time from their respective angles.
In a specific implementation, collection devices can also be installed in the ceiling area of the basketball venue, on the basketball stand, and so on. The collection devices can be arranged along a straight line, a fan, an arc, a circle, or an irregular shape. The specific arrangement can be set according to one or more factors such as the site environment, the number of collection devices, the characteristics of the collection devices, and imaging quality requirements. A collection device can be any device with a camera function, for example, an ordinary camera, a mobile phone, or a professional camera.
To avoid interfering with the work of the collection devices, the data processing device 82 can be placed in a non-collection area on site. The data processing device 82 can send a stream-pulling instruction to each collection device in the collection array 81 via a wireless local area network. Based on the stream-pulling instruction sent by the data processing device 82, each collection device in the collection array 81 transmits its video data stream to the data processing device 82 in real time; the streams may be relayed to the data processing device 82 through a switch 87. Each collection device can compress the collected original video data in real time and transmit it in real time to the data processing device, further saving local area network transmission resources.
When the data processing device 82 receives a video frame interception instruction, it intercepts, from the received multiple video data streams, the video frames at the specified frame time to obtain the frame images corresponding to those video frames and the frame time information corresponding to the specified frame time; it then uploads the multiple frame images of the specified frame time and the corresponding frame time information to the cloud server cluster 83, and sends the frame time information of the specified frame time to the playback control device 84. The video frame interception instruction may be issued manually by a user or generated automatically by the data processing device.
The server can be placed in the cloud and, to process data in parallel more quickly, the cloud server cluster 83 can be composed of multiple different servers or server groups according to the kind of data to be processed.
For example, the cloud server cluster 83 may include a first cloud server 831, a second cloud server 832, a third cloud server 833, and a fourth cloud server 834. The first cloud server 831 can be used to determine the parameter data corresponding to the multiple frame images; the second cloud server 832 can be used to determine the depth data of each of the multiple frame images; the third cloud server 833 can reconstruct frame images along a preset virtual viewpoint path using the DIBR algorithm, based on the parameter data corresponding to the multiple frame images and the depth data and pixel data of preset frame images among them; and the fourth cloud server 834 can be used to generate the multi-angle free-view video.
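The division of work across the four servers amounts to a staged pipeline. The sketch below illustrates one possible orchestration; the stage functions are hypothetical stubs standing in for the processing each server performs (camera calibration, depth estimation, DIBR, and video assembly), not APIs defined by this embodiment:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage functions standing in for the work of servers 831-834.

def estimate_camera_parameters(images):      # server 831: parameter data
    return [{"camera_id": i} for i, _ in enumerate(images)]

def estimate_depth_map(image):               # server 832: depth data, per image
    return b"depth-for-" + image

def dibr_reconstruct(images, depths, params, viewpoint_path):  # server 833: DIBR
    # Render one synthetic frame per virtual viewpoint on the preset path.
    return [(vp, images, depths, params) for vp in viewpoint_path]

def assemble_video(virtual_frames):          # server 834: free-view video
    return {"frames": len(virtual_frames)}

def generate_free_view_video(frame_images, viewpoint_path):
    params = estimate_camera_parameters(frame_images)
    # Depth maps for different images are independent, so fan the work out in parallel.
    with ThreadPoolExecutor() as pool:
        depths = list(pool.map(estimate_depth_map, frame_images))
    virtual_frames = dibr_reconstruct(frame_images, depths, params, viewpoint_path)
    return assemble_video(virtual_frames)

print(generate_free_view_video([b"cam0", b"cam1"], viewpoint_path=[0.0, 0.5, 1.0]))
```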
It is understandable that the first cloud server 831, the second cloud server 832, the third cloud server 833, and the fourth cloud server 834 may each also be a server group composed of a server array or server sub-clusters, which is not limited in the embodiments of the present invention.
In a specific implementation, the multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time. The interaction data may include multi-angle free-view data, which in turn may include the pixel data, depth data, and parameter data of multiple frame images, with the pixel data and depth data of each frame image associated with each other.
The cloud server cluster 83 can store the interaction data indexed by the specified frame time information.
The playback control device 84 can generate, according to the frame time information uploaded by the data processing device, an interaction identifier associated with the specified frame time, and transmit the data stream to be played containing the interaction identifier to the interactive terminal 85.
The interactive terminal 85 can, based on the received data stream to be played, play and display the video in real time and show the interaction identifier at the corresponding video frame time. When an interaction identifier is triggered, the interactive terminal 85 can obtain the interaction data stored in the cloud server cluster 83 and corresponding to the specified frame time, so as to perform multi-angle free-view image display. Upon detecting an interaction end signal, the interactive terminal 85 can switch back to obtaining the data stream to be played in real time from the playback control device 84 and play and display it in real time.
Referring to the schematic structural diagram of another data processing system shown in FIG. 38, the data processing system 380 may include: a collection array 381, a data processing device 382, a playback control device 383, and an interactive terminal 384, where:
the collection array 381 includes multiple collection devices placed at different positions in the on-site collection area according to a preset multi-angle free-view range, adapted to synchronously collect multiple video data streams in real time and upload the video data streams in real time to the data processing device;
the data processing device 382 is adapted, for the uploaded multiple video data streams, to intercept the multiple video data streams at a specified frame time according to a received video frame interception instruction, obtaining multiple frame images corresponding to the specified frame time and frame time information corresponding to the specified frame time, and to send the frame time information of the specified frame time to the playback control device 383;
the playback control device 383 is adapted to determine, in the data stream to be played, the specified frame time corresponding to the frame time information uploaded by the data processing device 382, generate an interaction identifier associated with the specified frame time, and transmit the data stream to be played containing the interaction identifier to the interactive terminal 384;
the interactive terminal 384 is adapted to play and display, in real time based on the received data stream to be played, the video containing the interaction identifier, and, based on a trigger operation on the interaction identifier, obtain from the data processing device 382 the multiple frame images at the specified frame time corresponding to the interaction identifier, generate interaction data for interaction based on those frame images, and then perform multi-angle free-view image display, where the interaction data includes multi-angle free-view data.
In a specific implementation, the data processing device can be deployed flexibly according to user requirements; for example, it can be placed in a non-collection area on site or in the cloud.
With the above data processing system, during playback, interaction data can be obtained according to a trigger operation on the interaction identifier, and a multi-angle free-view display can then be performed, improving the user's interactive experience.
An embodiment of the present invention further provides a terminal corresponding to the above data interaction method. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description follows through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the interactive terminal shown in FIG. 9, the interactive terminal 90 may include the following units (an illustrative sketch follows the list):
a data stream acquisition unit 91, adapted to obtain the data stream to be played in real time from the playback control device, where the data stream to be played includes video data and an interaction identifier associated with a specified frame time of the data stream to be played;
a play and display unit 92, adapted to play and display the video and the interaction identifier of the data stream to be played in real time;
an interaction data acquisition unit 93, adapted to obtain, in response to a trigger operation on the interaction identifier, the interaction data corresponding to the specified frame time, where the interaction data includes multi-angle free-view data;
an interaction display unit 94, adapted to perform, based on the interaction data, multi-angle free-view image display for the specified frame time;
a switching unit 95, adapted to, upon detecting an interaction end signal, trigger a switch back to the data stream to be played obtained in real time by the data stream acquisition unit 91 from the playback control device and played and displayed in real time by the play and display unit 92.
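Purely as an illustrative sketch, the five units above could be wired together as follows. The class and method names (InteractiveTerminal, next_chunk, and so on) are assumptions introduced for illustration, not part of this embodiment:

```python
class InteractiveTerminal:
    """Skeleton mirroring the unit structure of FIG. 9; all names are illustrative."""

    def __init__(self, playback_ctrl, server):
        self.playback_ctrl = playback_ctrl   # provides the data stream to be played
        self.server = server                 # provides interaction data by frame time

    def fetch_stream_chunk(self):            # data stream acquisition unit 91
        return self.playback_ctrl.next_chunk()

    def play(self, chunk):                   # play and display unit 92
        print("playing video, identifiers:", chunk.get("identifiers", []))

    def on_identifier_triggered(self, frame_time):
        # Interaction data acquisition unit 93 ...
        data = self.server.interaction_data(frame_time)
        # ... followed by the interaction display unit 94.
        print("multi-angle free-view display at frame time", data["frame_time"])

    def on_interaction_end(self):
        # Switching unit 95: resume real-time playback of the live stream.
        self.play(self.fetch_stream_chunk())
```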
The interaction data may be generated by the server and transmitted to the interactive terminal, or generated by the interactive terminal itself.
While playing video, the interactive terminal can obtain the data stream to be played in real time from the playback control device, and can display the corresponding interaction identifier at the corresponding frame time. In a specific implementation, FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention.
The interactive terminal 40 obtains the data stream to be played in real time from the playback control device. When real-time playback reaches the first frame time T1, a first interaction identifier 42 can be shown on the progress bar 41; when playback reaches the second frame time T2, a second interaction identifier 43 can be shown on the progress bar. The black part of the progress bar is the played portion and the white part is the unplayed portion.
The trigger operation may be a trigger operation input by the user, or one generated automatically by the interactive terminal; for example, the interactive terminal may automatically initiate a trigger operation upon detecting the identifier of a multi-angle free-viewpoint data frame. For a manual trigger, the time information may be the moment the user chooses to trigger the interaction after the interactive terminal displays the interaction prompt information, or historical time information from a user operation received by the interactive terminal, where the historical time information is time information earlier than the current playback time.
With reference to FIG. 4, FIG. 7, and FIG. 9: when the system of the interactive terminal reads the corresponding interaction identifier 43 on the progress bar 41, it can display interaction prompt information. If the user does not choose to trigger, the interactive terminal 40 continues reading subsequent video data and the played portion of the progress bar 41 advances. If the user chooses to trigger, the interactive terminal 40, upon receiving the feedback, generates an image reconstruction instruction for the specified frame time of the corresponding interaction identifier and sends it to the server 73.
For example, when the user chooses to trigger the current interaction identifier 43, the interactive terminal 40, upon receiving the feedback, generates an image reconstruction instruction for the frame time T2 specified by the interaction identifier 43 and sends it to the server 73. According to the image reconstruction instruction, the server can send the interaction data corresponding to the specified frame time T2.
The user can also choose, while watching, to trigger a historical interaction identifier, for example the interaction identifier 42 shown in the played portion 41a of the progress bar. Upon receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction for the frame time T1 specified by the interaction identifier 42 and sends it to the server 73, which can then send the interaction data corresponding to the specified frame time T1. The interactive terminal 40 can apply an image reconstruction algorithm to the multi-angle free-view data in the interaction data and then perform multi-angle free-view image display for the specified frame time. If the specified frame time is a single frame time, a static multi-angle free-view image is displayed; if the specified frame time corresponds to multiple frame times, a dynamic multi-angle free-view image is displayed.
With reference to FIG. 4, FIG. 38, and FIG. 9: when the system of the interactive terminal reads the corresponding interaction identifier 43 on the progress bar 41, it can display interaction prompt information. If the user does not choose to trigger, the interactive terminal 40 continues reading subsequent video data and the played portion of the progress bar 41 advances. If the user chooses to trigger, the interactive terminal 40, upon receiving the feedback, generates an image reconstruction instruction for the specified frame time of the corresponding interaction identifier and sends it to the data processing device 382.
For example, when the user chooses to trigger the current interaction identifier 43, the interactive terminal 40, upon receiving the feedback, generates an image reconstruction instruction for the frame time T2 specified by the interaction identifier 43 and sends it to the data processing device. According to the image reconstruction instruction, the data processing device 382 can send the multiple frame images corresponding to the specified frame time T2.
The user can also choose, while watching, to trigger a historical interaction identifier, for example the interaction identifier 42 shown in the played portion 41a of the progress bar. Upon receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction for the frame time T1 specified by the interaction identifier 42 and sends it to the data processing device, which can then send the multiple frame images corresponding to the specified frame time T1.
The interactive terminal 40 can generate, based on those multiple frame images, the interaction data for interaction, apply an image reconstruction algorithm to the multi-angle free-view data in the interaction data, and then perform multi-angle free-view image display for the specified frame time. If the specified frame time is a single frame time, a static multi-angle free-view image is displayed; if the specified frame time corresponds to multiple frame times, a dynamic multi-angle free-view image is displayed.
In a specific implementation, the interactive terminal of an embodiment of the present invention may be an electronic device with a touch screen, a head-mounted virtual reality (VR) terminal, an edge node device connected to a display, or an IoT (Internet of Things) device with a display function.
As shown in FIG. 40, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal is an electronic device 400 with a touch screen. When the corresponding interaction identifier 402 on the progress bar 401 is read, the interface of the electronic device 400 can display an interaction prompt box 403. The user can make a choice according to the content of the interaction prompt box 403: when the user performs the trigger operation of selecting "Yes", the electronic device 400, upon receiving the feedback, can generate an image reconstruction instruction for the interaction frame time corresponding to the interaction identifier 402; when the user performs the non-trigger operation of selecting "No", the electronic device 400 can continue reading subsequent video data.
As shown in FIG. 41, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal is a head-mounted VR terminal 410. When the corresponding interaction identifier 412 on the progress bar 411 is read, the interface of the head-mounted VR terminal 410 can display an interaction prompt box 413. The user can make a choice according to the content of the interaction prompt box 413: when the user performs the trigger operation of selecting "Yes" (for example, nodding), the head-mounted VR terminal 410, upon receiving the feedback, can generate an image reconstruction instruction for the interaction frame time corresponding to the interaction identifier 412; when the user performs the non-trigger operation of selecting "No" (for example, shaking the head), the head-mounted VR terminal 410 can continue reading subsequent video data.
As shown in FIG. 42, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal is an edge node device 421 connected to a display 420. When the edge node device 421 reads the corresponding interaction identifier 423 on the progress bar 422, the display 420 can show an interaction prompt box 424. The user can make a choice according to the content of the interaction prompt box 424: when the user performs the trigger operation of selecting "Yes", the edge node device 421, upon receiving the feedback, can generate an image reconstruction instruction for the interaction frame time corresponding to the interaction identifier 423; when the user performs the non-trigger operation of selecting "No", the edge node device 421 can continue reading subsequent video data.
In a specific implementation, the interactive terminal can establish a communication connection, wired or wireless, with at least one of the above data processing device and server.
As shown in FIG. 43, a schematic diagram of the connections of an interactive terminal in an embodiment of the present invention, the edge node device 430 establishes wireless connections with the interactive devices 431, 432, and 433 through the Internet of Things.
In a specific implementation, after an interaction identifier is triggered, the interactive terminal can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier and determine virtual viewpoint position information based on the interactive operation. As shown in FIG. 44, a schematic diagram of the interactive operation of an interactive terminal in an embodiment of the present invention, the user can operate horizontally or vertically on the interactive operation interface, and the operation track can be a straight line or a curve.
In a specific implementation, as shown in FIG. 45, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, after the user taps an interaction identifier, the interactive terminal obtains the interaction data for the specified frame time of that identifier.
If the user takes no new operation, the trigger operation itself serves as the interactive operation, and the corresponding first virtual viewpoint can be determined from the view angle of the image being displayed at the time of the interactive operation. If the user takes a new operation, that new operation is the interactive operation, and the corresponding first virtual viewpoint can likewise be determined from the view angle of the image being displayed at the time of the interactive operation.
Then, starting from the first virtual viewpoint, the images corresponding to the virtual viewpoints can be displayed in sequence according to a preset order of virtual viewpoints. If the specified frame times correspond to the same frame time, the obtained multi-angle free-view video data may include multi-angle free-view spatial data of images sorted by frame time, and a static multi-angle free-view image can be displayed; if the specified frame times correspond to different frame times, the obtained multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time, and a dynamic multi-angle free-view image can be displayed, that is, the frame images of multi-angle free-view video frames.
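The viewpoint traversal just described can be sketched as follows. This is a minimal illustration under assumed names (render_view stands in for a hypothetical renderer of one virtual viewpoint), not the playback logic prescribed by this embodiment:

```python
def render_view(viewpoint, frame_time):
    # Hypothetical renderer: reconstruct the image seen from `viewpoint`
    # at `frame_time` (e.g. via DIBR over the free-view data).
    return f"image(viewpoint={viewpoint}, t={frame_time})"

def play_free_view(first_viewpoint, viewpoint_order, frame_times):
    """Walk the preset viewpoint order starting from the first virtual viewpoint."""
    start = viewpoint_order.index(first_viewpoint)
    path = viewpoint_order[start:]
    if len(frame_times) == 1:
        # Same frame time for every viewpoint: a static multi-angle image
        # (space varies, time is frozen).
        return [render_view(vp, frame_times[0]) for vp in path]
    # Different frame times: a dynamic multi-angle image
    # (space and time both advance).
    return [render_view(vp, t) for vp, t in zip(path, frame_times)]

print(play_free_view("left", ["left", "front", "right"], frame_times=["T2"]))
print(play_free_view("left", ["left", "front", "right"], frame_times=["T1", "T2", "T3"]))
```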
In an embodiment of the present invention, referring to FIG. 45 and FIG. 46, the multi-angle free-view video data obtained by the interactive terminal may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time. The user swipes horizontally to the right, producing an interactive operation, and the corresponding first virtual viewpoint is determined. Because different virtual viewpoints can correspond to different multi-angle free-view spatial data and temporal data, as shown in FIG. 46, the frame images shown in the interactive interface change in both time and space as the interactive operation proceeds: the displayed content changes from the athlete running toward the finish line in FIG. 45 to the athlete about to cross the finish line in FIG. 46, and, taking the athlete as the target object, the view angle of the frame images changes from the left view to the front view.
Similarly, for FIG. 45 and FIG. 47, the displayed content changes from the athlete running toward the finish line in FIG. 45 to the athlete having crossed the finish line in FIG. 47, and, taking the athlete as the target object, the view angle of the frame images changes from the left view to the right view.
Similarly, for FIG. 45 and FIG. 48, the user swipes vertically upward to produce an interactive operation; the displayed content changes from the athlete running toward the finish line in FIG. 45 to the athlete having crossed the finish line in FIG. 48, and, taking the athlete as the target object, the view angle of the frame images changes from the left view to the top view.
It is understandable that different interactive operations can be obtained from the user's operations, and the corresponding first virtual viewpoint can be determined from the view angle of the image displayed at the time of the interactive operation; according to the obtained multi-angle free-view video data, either a static or a dynamic multi-angle free-view image can be displayed, which is not limited in the embodiments of the present invention.
In a specific implementation, the interaction data may further include at least one of the following: on-site analysis data, information data about collected objects, information data about equipment associated with collected objects, information data about items deployed on site, and information data about logos displayed on site.
In an embodiment of the present invention, as shown in FIG. 10, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal 100, after an interaction identifier is triggered, can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier, and can superimpose on-site analysis data on the image (not shown), as shown by the on-site analysis data 101 in FIG. 10.
In an embodiment of the present invention, as shown in FIG. 11, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal 110, after the user triggers an interaction identifier, can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier, and can superimpose information data about the collected object on the image (not shown), as shown by the collected object's information data 111 in FIG. 11.
In an embodiment of the present invention, as shown in FIG. 12, a schematic diagram of the interactive interface of another interactive terminal in an embodiment of the present invention, the interactive terminal 120, after the user triggers an interaction identifier, can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier, and can superimpose information data about the collected objects on the image (not shown), as shown by the collected objects' information data 121-123 in FIG. 12.
In an embodiment of the present invention, as shown in FIG. 13, a schematic diagram of the interactive interface of another terminal in an embodiment of the present invention, the interactive terminal 130, after the user triggers an interaction identifier, can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier, and can superimpose information data about items deployed on site on the image (not shown), as shown by the information data 131 of the file package in FIG. 13.
In an embodiment of the present invention, as shown in FIG. 14, a schematic diagram of the interactive interface of another terminal in an embodiment of the present invention, the interactive terminal 140, after an interaction identifier is triggered, can perform multi-angle free-view image display for the specified frame time corresponding to the triggered identifier, and can superimpose information data of a logo displayed on site on the image (not shown), as shown by the logo information data 141 in FIG. 14.
In this way, the user can obtain more associated interaction information through the interaction data and gain a deeper, more comprehensive, and more professional understanding of the content being watched, further enhancing the user's interactive experience.
Referring to the schematic structural diagram of another interactive terminal shown in FIG. 39, the interactive terminal 390 may include: a processor 391, a network component 392, a memory 393, and a display component 394, where:
the processor 391 is adapted to obtain the data stream to be played in real time through the network component 392 and, in response to a trigger operation on an interaction identifier, obtain the interaction data corresponding to the specified frame time of that identifier, where the data stream to be played includes video data and an interaction identifier associated with a specified frame time of the data stream to be played, and the interaction data includes multi-angle free-view data;
the memory 393 is adapted to store the data stream to be played obtained in real time;
the display component 394 is adapted to play and display, in real time based on the data stream to be played obtained in real time, the video and the interaction identifier of the data stream to be played, and, based on the interaction data, perform multi-angle free-view image display for the specified frame time.
The interactive terminal 390 may obtain the interaction data for the specified frame time from a server that stores interaction data, or obtain the multiple frame images corresponding to the specified frame time from a data processing device that stores frame images and then generate the corresponding interaction data itself.
To enable those skilled in the art to better understand and implement the embodiments of the present invention, the on-site processing scheme for multi-angle free-view video images is described in further detail below.
Referring to the flowchart of the data processing method shown in FIG. 15, in an embodiment of the present invention the method may specifically include the following steps:
S151: upon determining that the sum of the bitrates of the compressed video data streams to be transmitted by the collection devices in the collection array is not greater than a preset bandwidth threshold, send a stream-pulling instruction to each collection device in the collection array, where the collection devices in the collection array are placed at different positions in the on-site collection area according to a preset multi-angle free-view range.
The multi-angle free view may refer to the spatial positions and view angles of virtual viewpoints that allow the scene to be switched freely, and the multi-angle free-view range can be determined according to the needs of the application scene.
In a specific implementation, the preset bandwidth threshold can be determined according to the transmission capacity of the transmission network in which the collection devices of the collection array are located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold can be 1000 Mbps.
S152: receive the compressed video data streams transmitted in real time by the collection devices in the collection array based on the stream-pulling instruction, where each compressed video data stream is obtained by a collection device through real-time synchronous collection and data compression from its corresponding angle.
In a specific implementation, the collection device itself may have encoding and encapsulation functions, so that the original video data synchronously collected in real time from the corresponding angle can be encoded and encapsulated. The encapsulation format used by the collection device may be any of AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, or another encapsulation format; the encoding format may be H.261, H.263, H.264, H.265, MPEG, AVS, or another encoding format. In addition, the collection device may have a compression function: the higher the compression rate, the smaller the compressed data for the same amount of raw data, which relieves the bandwidth pressure of real-time synchronous transmission. The collection device can therefore use techniques such as predictive coding, transform coding, and entropy coding to improve the video compression rate.
With the above data processing method, whether the transmission bandwidth matches is determined before stream pulling, which avoids data transmission congestion during stream pulling, allows the data collected and compressed by each collection device to be transmitted synchronously in real time, speeds up the processing of multi-angle free-view video data, enables low-latency playback of multi-angle free-view video under limited bandwidth and data processing resources, and reduces implementation cost.
In a specific implementation, the values of the parameters of the collection devices can be obtained and used to calculate whether the sum of the bitrates of the compressed video data streams to be transmitted by the collection devices in the collection array is not greater than the preset bandwidth threshold. For example, the collection array may contain 40 collection devices, each producing a compressed video data stream at 15 Mbps, giving an overall bitrate of 15 × 40 = 600 Mbps for the collection array; if the preset bandwidth threshold is 1000 Mbps, it is determined that the sum of the bitrates of the compressed video data streams to be transmitted is not greater than the preset bandwidth threshold. A stream-pulling instruction can then be sent to each collection device according to the IP addresses of the 40 collection devices in the collection array.
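The gate described above is simple arithmetic; the sketch below illustrates it with the numbers from the example. The function and field names (and the sample IP addresses) are assumptions for illustration, and send_pull_instruction stands in for whatever transport actually delivers the stream-pulling instruction:

```python
def send_pull_instruction(ip):
    # Placeholder for the actual delivery of a stream-pulling instruction.
    print(f"stream-pulling instruction sent to {ip}")

def start_pulling(devices, bandwidth_threshold_mbps):
    """Send pull instructions only if the summed pre-transmission bitrates fit."""
    total_mbps = sum(dev["bitrate_mbps"] for dev in devices)
    if total_mbps > bandwidth_threshold_mbps:
        # Parameter values must be adjusted first; pulling now would congest the link.
        return False
    for dev in devices:
        send_pull_instruction(dev["ip"])
    return True

# Example from the text: 40 devices at 15 Mbps each against a 1000 Mbps threshold.
array = [{"ip": f"192.168.0.{i}", "bitrate_mbps": 15} for i in range(1, 41)]
assert start_pulling(array, bandwidth_threshold_mbps=1000)   # 600 <= 1000
```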
In a specific implementation, to ensure that the parameter values of the collection devices in the collection array are uniform so that the devices can collect and compress data synchronously in real time, the parameter values of the collection devices in the collection array can be set before the stream-pulling instructions are sent. The parameters of a collection device may include collection parameters and compression parameters, and the collection devices in the collection array, following the set parameter values, collect and compress data synchronously in real time from their respective angles such that the sum of the bitrates of the resulting compressed video data streams is not greater than the preset bandwidth threshold.
Collection parameters and compression parameters complement each other. With the compression parameter values unchanged, the amount of original video data can be reduced by setting the collection parameter values, shortening the data compression time; with the collection parameter values unchanged, setting the compression parameter values can correspondingly reduce the amount of compressed data, shortening the data transmission time. Likewise, setting a higher compression rate saves transmission bandwidth, and setting a lower sampling rate also saves transmission bandwidth. The collection parameters and/or compression parameters can therefore be set according to the actual situation.
Thus, before stream pulling begins, the parameter values of the collection devices in the collection array can be set to ensure that they are uniform, that each device collects and compresses data synchronously in real time from its corresponding angle, and that the sum of the bitrates of the resulting compressed video data streams is not greater than the preset bandwidth threshold, thereby avoiding network congestion and enabling low-latency playback of multi-angle free-view video even with limited bandwidth resources.
In specific embodiments, the collection parameters may include focal length, exposure, resolution, encoding bitrate, and encoding format parameters, among others, and the compression parameters may include compression rate and compression format parameters, among others; by setting the values of the different parameters, the values best suited to the transmission network in which each collection device is located can be obtained.
To simplify the setup process and save setup time, before setting the parameter values of the collection devices in the collection array, it can first be determined whether the sum of the bitrates of the compressed video data streams obtained by the devices collecting and compressing with the already-set parameter values is greater than the preset bandwidth threshold; only when that sum is greater than the preset bandwidth threshold are the parameter values of the collection devices set before the stream-pulling instructions are sent. It is understandable that, in a specific implementation, the values of the collection parameters and compression parameters can also be set according to imaging quality requirements such as the resolution of the multi-angle free-view images to be displayed.
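One simple policy consistent with the above, shown purely as an illustrative sketch, is to re-derive a per-device encoding bitrate from the threshold only when the current settings would exceed it. The names and the uniform split are assumptions, not requirements of this embodiment:

```python
def fit_bitrates(devices, bandwidth_threshold_mbps):
    """Lower per-device encoding bitrates only if the current total exceeds the threshold."""
    total = sum(dev["bitrate_mbps"] for dev in devices)
    if total <= bandwidth_threshold_mbps:
        return devices                      # already fits; skip re-setting parameters
    # Assumed policy: split the budget uniformly so parameter values stay identical
    # across devices, keeping collection and compression uniform as well.
    per_device = bandwidth_threshold_mbps / len(devices)
    for dev in devices:
        dev["bitrate_mbps"] = per_device
    return devices

devices = [{"id": i, "bitrate_mbps": 30} for i in range(40)]   # 1200 Mbps total
fit_bitrates(devices, bandwidth_threshold_mbps=1000)           # -> 25 Mbps each
```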
In a specific implementation, the process from transmission to writing of the compressed video data streams obtained by the collection devices is continuous. Therefore, before sending the stream-pulling instructions to the collection devices in the collection array, it can also be determined whether the sum of the bitrates of the compressed video data streams to be transmitted by the devices is greater than a preset write speed threshold; when it is, the parameter values of the collection devices in the collection array can be set so that, following the set values, the devices collect and compress data synchronously in real time from their respective angles such that the sum of the bitrates of the resulting compressed video data streams is not greater than the preset write speed threshold.
In a specific implementation, the preset write speed threshold can be determined according to the data storage write speed of the storage medium. For example, if the upper limit of the data storage write speed of the solid-state drive (Solid State Disk or Solid State Drive, SSD) of the data processing device is 100 Mbps, the preset write speed threshold can be 100 Mbps.
With the above solution, before stream pulling begins, it can be ensured that the sum of the bitrates of the compressed video data streams obtained by the devices through real-time synchronous collection and data compression from their respective angles is not greater than the preset write speed threshold, thereby avoiding congestion in data writing and keeping the link clear throughout collection, transmission, and writing, so that the compressed video streams uploaded by the collection devices can be processed in real time and multi-angle free-view video playback can be achieved.
In a specific implementation, the compressed video data streams obtained by the collection devices can be stored. When a video frame interception instruction is received, the frame-level synchronized video frames in the compressed video data streams can be intercepted according to the instruction, and the intercepted video frames uploaded synchronously to the designated target end.
The designated target end may be preset or specified by the video frame interception instruction. The intercepted video frames may first be encapsulated and uploaded to the designated target end through a network transmission protocol, and then parsed to obtain the frame-level synchronized video frames of the corresponding compressed video data streams.
Handing the subsequent processing of the video frames intercepted from the compressed video data streams over to the designated target end saves network transmission resources, reduces the pressure and difficulty of deploying large server resources on site, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free-view video frames.
In a specific implementation, to ensure that the video frames intercepted from the compressed video data streams are frame-level synchronized, as shown in FIG. 16, the following steps may be included (an illustrative sketch of these steps appears after the list):
S161: determine one of the compressed video data streams, among those received in real time from the collection devices in the collection array, as the reference data stream;
S162: based on the received video frame interception instruction, determine the video frame to be intercepted in the reference data stream, and select, in each of the remaining compressed video data streams, the video frame synchronized with the video frame to be intercepted in the reference data stream as that stream's video frame to be intercepted;
S163: intercept the video frames to be intercepted in each compressed video data stream.
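As an illustrative sketch of S161 through S163, the selection of synchronized frames can be expressed with a pluggable matcher. Everything named here (the function, the matcher signature, the frame representation) is an assumption introduced for illustration; concrete matchers for the two embodiments below follow later:

```python
def intercept_synced_frames(streams, pick_reference_frame, matcher):
    """S161-S163: pick a reference stream/frame, then match a frame in every other stream.

    streams: list of frame lists, one per compressed video data stream.
    pick_reference_frame: selects the frame to intercept in the reference stream
        (driven by the video frame interception instruction).
    matcher: given the reference frame and a candidate stream, returns the
        candidate frame judged to be synchronized with the reference frame.
    """
    reference_stream = streams[0]                        # S161: choose a reference stream
    ref_frame = pick_reference_frame(reference_stream)   # S162: frame to intercept in it
    intercepted = [ref_frame]
    for stream in streams[1:]:                           # S162: match in remaining streams
        intercepted.append(matcher(ref_frame, stream))
    return intercepted                                   # S163: the intercepted frames
```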
To enable those skilled in the art to better understand and implement the embodiments of the present invention, how the video frames to be intercepted are determined in each compressed video data stream is described in detail below through a specific application scenario.
In an embodiment of the present invention, the acquisition array may include 40 acquisition devices, so 40 compressed video data streams can be received in real time. Suppose that, among the compressed video data streams received in real time from the acquisition devices in the acquisition array, the compressed video data stream A1 corresponding to acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of the object in the video frame indicated for interception in the received video frame interception instruction, the video frame a1 in the reference data stream whose object feature information is consistent with X is determined as the frame to be intercepted. Next, according to the feature information x1 of the object in the frame a1 to be intercepted in the reference data stream, the video frames a2 to a40 in the remaining compressed video data streams A2 to A40 whose object feature information is consistent with x1 are selected as the video frames to be intercepted in the remaining compressed video data streams.

The feature information of the object may include at least one of shape feature information, color feature information, position feature information, and the like. The feature information X of the object indicated in the video frame interception instruction and the feature information x1 of the object in the frame a1 to be intercepted in the reference data stream may be the same representation of the feature information of the same object; for example, both X and x1 are two-dimensional feature information. They may also be different representations of the feature information of the same object; for example, X may be two-dimensional feature information while x1 is three-dimensional feature information. In addition, a similarity threshold may be preset: when the threshold is met, the object feature information X is considered consistent with x1, or the object feature information x1 is considered consistent with the object feature information x2 to x40 in the remaining compressed video data streams A2 to A40.

The specific representation of the object feature information and the similarity threshold may be determined according to the preset multi-angle free viewing angle range and the on-site scene, which is not limited in the embodiments of the present invention.
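Since both the feature representation and the similarity threshold are left open, the sketch below assumes one concrete choice purely for illustration: a normalized intensity histogram as the feature, compared by cosine similarity against an assumed threshold of 0.9.

```python
import numpy as np

def object_feature(frame):
    """One possible object feature: a normalized 32-bin intensity
    histogram (the text allows shape, color, or position features)."""
    hist, _ = np.histogram(frame, bins=32, range=(0, 255))
    return hist / max(hist.sum(), 1)

def is_consistent(feat_x, feat_x1, sim_threshold=0.9):
    """Treat two feature vectors as 'consistent' when their cosine
    similarity meets the preset similarity threshold (0.9 is assumed)."""
    denom = np.linalg.norm(feat_x) * np.linalg.norm(feat_x1) + 1e-9
    return float(np.dot(feat_x, feat_x1)) / denom >= sim_threshold

def select_by_feature(frames, ref_feature, sim_threshold=0.9):
    """Return the first frame whose object feature is consistent with
    ref_feature (e.g. x1 from frame a1), or None if no frame matches."""
    for frame in frames:
        if is_consistent(object_feature(frame), ref_feature, sim_threshold):
            return frame
    return None
```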
In another embodiment of the present invention, the acquisition array may include 40 acquisition devices, so 40 compressed video data streams can be received in real time. Suppose that, among the compressed video data streams received in real time from the acquisition devices in the acquisition array, the compressed video data stream B1 corresponding to acquisition device B1' is determined as the reference data stream. Then, based on the timestamp information Y of the video frame indicated for interception in the received video frame interception instruction, the video frame b1 corresponding to timestamp information Y in the reference data stream is determined as the frame to be intercepted. Next, according to the timestamp information y1 of the frame b1 to be intercepted in the reference data stream, the video frames b2 to b40 in the remaining compressed video data streams B2 to B40 whose timestamp information is consistent with y1 are selected as the video frames to be intercepted in the remaining compressed video data streams.

The timestamp information Y of the video frame indicated in the video frame interception instruction may deviate somewhat from the timestamp information y1 of the frame b1 to be intercepted in the reference data stream. For example, if none of the timestamps of the frames in the reference data stream matches Y exactly and the closest one is off by 0.1 ms, an error range may be preset; with an error range of ±1 ms, the 0.1 ms deviation falls within the range, so the video frame b1 whose timestamp y1 differs from Y by 0.1 ms can be selected as the frame to be intercepted in the reference data stream. The specific error range and the rule for selecting the timestamp information y1 in the reference data stream may be determined according to the on-site acquisition devices and the transmission network, which is not limited in this embodiment.
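A minimal sketch of this timestamp-based selection, assuming millisecond timestamps and the ±1 ms error range from the example above:

```python
def select_by_timestamp(frames, target_ts_ms, err_range_ms=1.0):
    """Pick the frame whose timestamp is closest to target_ts_ms and
    within the preset error range. frames is a list of
    (timestamp_ms, frame) pairs; returns None when nothing qualifies."""
    ts, frame = min(frames, key=lambda p: abs(p[0] - target_ts_ms))
    return frame if abs(ts - target_ts_ms) <= err_range_ms else None

# e.g. a frame stamped 100.1 ms is accepted for a target of 100 ms,
# since the 0.1 ms deviation lies within the +/-1 ms error range.
```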
It can be understood that the methods for determining the video frames to be intercepted in each compressed video data stream in the above embodiments may be used alone or in combination, which is not limited in the embodiments of the present invention.

With the above data processing method, the data processing device can smoothly pull the data obtained by each acquisition device through acquisition and data compression.
In the following, the technical solution for data processing by the acquisition array in the embodiments of this specification will be described clearly and completely with reference to the drawings in the embodiments of this specification.

Referring to the flowchart of the data processing method shown in FIG. 17, in an embodiment of the present invention, the method may specifically include the following steps:

S171: the acquisition devices of the acquisition array, placed at different positions of the on-site acquisition area according to a preset multi-angle free viewing angle range, synchronously acquire raw video data in real time from their respective angles, and each performs real-time data compression on the acquired raw video data to obtain a corresponding compressed video data stream.
S172: when the data processing device connected to the acquisition array by a link determines that the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is not greater than a preset bandwidth threshold, it sends a stream pulling instruction to each acquisition device in the acquisition array.

In a specific implementation, the preset bandwidth threshold may be determined according to the transmission capacity of the transmission network where the acquisition devices of the acquisition array are located; for example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
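A minimal sketch of the S172 check, assuming bitrates reported in Mbps and the 1000 Mbps uplink of the example:

```python
def may_send_pull_instruction(bitrates_mbps, bandwidth_threshold_mbps=1000.0):
    """Allow stream pulling only when the summed bitrate of the streams
    to be transmitted does not exceed the preset bandwidth threshold."""
    return sum(bitrates_mbps) <= bandwidth_threshold_mbps

# 40 devices at 20 Mbps each: 800 Mbps <= 1000 Mbps, so pulling may start.
assert may_send_pull_instruction([20.0] * 40)
```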
S173: based on the stream pulling instruction, each acquisition device in the acquisition array transmits its compressed video data stream to the data processing device in real time.

In a specific implementation, the data processing device can be deployed according to the actual scenario. For example, when suitable space is available on site, the data processing device may be placed in a non-acquisition area of the site as an on-site server; when no suitable space is available on site, the data processing device may be placed in the cloud as a cloud server.

With the above solution, when the data processing device connected to the acquisition array by a link determines that the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold, it sends a stream pulling instruction to each acquisition device, so that the data obtained by each acquisition device through acquisition and data compression can be transmitted synchronously in real time, real-time stream pulling can be performed over the local transmission network, and data transmission congestion during stream pulling can be avoided. Then, based on the stream pulling instruction, each acquisition device transmits its compressed video data stream to the data processing device in real time; since the data transmitted by each acquisition device is compressed, the bandwidth pressure of real-time synchronous transmission is relieved and the processing of multi-angle free-view video data is accelerated.

In this way, it is unnecessary to arrange a large number of servers on site for data processing, to aggregate the acquired raw data through SDI capture cards, or to process the raw data with computing servers in an on-site machine room. Expensive SDI video transmission cables and SDI interfaces can be dispensed with, and data is transmitted over an ordinary transmission network instead, so that low-latency playback of multi-angle free-view video can be achieved with limited bandwidth and data processing resources, reducing implementation costs.
In a specific implementation, in order to simplify the setup process and save setup time, before setting the parameter values of the acquisition devices in the acquisition array, the data processing device may first determine whether the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through acquisition and data compression with the already-set parameter values is greater than the preset bandwidth threshold. Only when that sum is greater than the preset bandwidth threshold does the data processing device set the parameter values of the acquisition devices in the acquisition array, after which it sends the stream pulling instruction to each acquisition device.

In a specific implementation, the compressed video data streams obtained by the acquisition devices are transmitted and written continuously, so it is also necessary to ensure that the data processing device can write these streams without blocking. Therefore, before sending the stream pulling instruction to the acquisition devices in the acquisition array, the data processing device may also determine whether the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices is greater than a preset write speed threshold. When it is, the data processing device may set the parameter values of the acquisition devices in the acquisition array so that, with the set parameter values, the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset write speed threshold.

In a specific implementation, the preset write speed threshold may be determined according to the data storage write speed of the data processing device.
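Combining the bandwidth check with the write speed check, the pre-pull gating might be sketched as follows; `reset_parameters` stands in for the unspecified way the device re-sets acquisition and compression parameters and is purely hypothetical.

```python
def gate_before_pulling(bitrates_mbps, bandwidth_thr_mbps,
                        write_thr_mbps, reset_parameters):
    """Before sending pull instructions, ensure the summed bitrate fits
    both the bandwidth threshold and the write speed threshold; if not,
    re-set the device parameters via the hypothetical callback (assumed
    to lower the bitrates, so the loop terminates) and check again."""
    while sum(bitrates_mbps) > min(bandwidth_thr_mbps, write_thr_mbps):
        bitrates_mbps = reset_parameters(bitrates_mbps)
    return bitrates_mbps  # safe to send the stream pulling instruction
```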
In a specific implementation, data may be transmitted between the acquisition devices of the acquisition array and the data processing device in at least one of the following ways:

1. Data transmission through a switch.

The acquisition devices of the acquisition array are connected to the data processing device through a switch, which can aggregate the compressed video data streams of many acquisition devices and transmit them to the data processing device in a unified manner, reducing the number of ports the data processing device must support. For example, if the switch supports 40 inputs, the data processing device can simultaneously receive, through the switch, the video streams of an acquisition array composed of up to 40 acquisition devices, which in turn reduces the number of data processing devices required.

2. Data transmission through a local area network.

The acquisition devices of the acquisition array are connected to the data processing device through a local area network, which can transmit the compressed video data streams of the acquisition devices to the data processing device in real time, reducing the number of ports the data processing device must support and, in turn, the number of data processing devices required.
In a specific implementation, the data processing device may store (possibly cache) the compressed video data streams obtained by the acquisition devices and, upon receiving a video frame interception instruction, intercept the frame-level synchronized video frames in each compressed video data stream according to the instruction and synchronously upload the intercepted video frames to the designated target terminal.

The data processing device may establish a connection with a target terminal in advance through a port or IP address, or it may synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction. In addition, the data processing device may first encapsulate the intercepted video frames and upload them to the designated target terminal through a network transmission protocol, where they are then parsed to obtain the frame-level synchronized video frames of the corresponding compressed video data streams.

With the above solution, the compressed video data streams obtained by the acquisition devices of the acquisition array through real-time synchronous acquisition and data compression can be transmitted to the data processing device in a unified manner. After receiving a video frame interception instruction, the data processing device performs the preliminary marking and frame interception processing, synchronously uploads the intercepted frame-level synchronized video frames of the compressed video data streams to the designated target terminal, and hands the subsequent processing of the intercepted video frames over to the designated target terminal, thereby saving network transmission resources, reducing the pressure and difficulty of on-site deployment, greatly reducing the data processing load, and shortening the transmission delay of multi-angle free-view video frames.

In a specific implementation, in order to intercept the frame-level synchronized video frames in each compressed video data stream, the data processing device may first determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream; then, based on the received video frame interception instruction, determine the video frame to be intercepted in the reference data stream and select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams; and finally intercept the video frames to be intercepted in each compressed video data stream. For the specific frame interception method, reference may be made to the examples of the foregoing embodiments, which are not repeated here.
The embodiments of the present invention also provide a data processing device corresponding to the data processing method in the above embodiments. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed introduction is given below through specific embodiments with reference to the drawings.

Referring to the schematic structural diagram of the data processing device shown in FIG. 18, in an embodiment of the present invention, the data processing device 180 may include:
a first transmission matching unit 181, adapted to determine whether the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is not greater than a preset bandwidth threshold, where the acquisition devices in the acquisition array are placed at different positions of the on-site acquisition area according to a preset multi-angle free viewing angle range;

an instruction sending unit 182, adapted to send a stream pulling instruction to each acquisition device in the acquisition array when it is determined that the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices is not greater than the preset bandwidth threshold;

a data stream receiving unit 183, adapted to receive the compressed video data streams transmitted in real time by the acquisition devices in the acquisition array based on the stream pulling instruction, where the compressed video data streams are obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles.

With the above data processing device, whether the transmission bandwidth matches is determined before the stream pulling instruction is sent to the acquisition devices, so data transmission congestion during stream pulling can be avoided, the data obtained by each acquisition device through acquisition and data compression can be transmitted synchronously in real time, the processing of multi-angle free-view video data is accelerated, multi-angle free-view video is achieved with limited bandwidth and data processing resources, and implementation costs are reduced.
In an embodiment of the present invention, as shown in FIG. 18, the data processing device 180 may further include:

a first parameter setting unit 184, adapted to set the parameter values of the acquisition devices in the acquisition array before the stream pulling instruction is sent to the acquisition devices;

where the parameters of the acquisition devices may include acquisition parameters and compression parameters, and with the set parameter values, the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset bandwidth threshold.
In an embodiment of the present invention, in order to simplify the setup process and save setup time, as shown in FIG. 18, the data processing device 180 may further include:

a second transmission matching unit 185, adapted to determine, before the parameter values of the acquisition devices in the acquisition array are set, whether the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through acquisition and data compression with the already-set parameter values is not greater than the preset bandwidth threshold.
In an embodiment of the present invention, as shown in FIG. 18, the data processing device 180 may further include:

a write matching unit 186, adapted to determine whether the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is greater than a preset write speed threshold;

a second parameter setting unit 187, adapted to set the parameter values of the acquisition devices in the acquisition array when that sum is greater than the preset write speed threshold, so that, with the set parameter values, the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset write speed threshold.

Therefore, before stream pulling starts, it can be ensured that the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset write speed threshold, so that data writing congestion is avoided and the link remains unblocked throughout the acquisition, transmission, and writing of the compressed video data streams. The compressed video streams uploaded by the acquisition devices can thus be processed in real time, enabling the playback of multi-angle free-view video.
In an embodiment of the present invention, as shown in FIG. 18, the data processing device 180 may further include:

a frame interception processing unit 188, adapted to intercept the frame-level synchronized video frames in each compressed video data stream according to a received video frame interception instruction;

an uploading unit 189, adapted to synchronously upload the intercepted video frames to the designated target terminal.

The designated target terminal may be a preset target terminal, or a target terminal designated by the video frame interception instruction.

In this way, the subsequent processing of the video frames intercepted from the compressed video data streams is handed over to the designated target terminal, which saves network transmission resources, reduces the pressure and difficulty of on-site deployment, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free-view video frames.
In an embodiment of the present invention, as shown in FIG. 18, the frame interception processing unit 188 may include:

a reference data stream selection subunit 1881, adapted to determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream;

a video frame selection subunit 1882, adapted to determine, based on the received video frame interception instruction, the video frame to be intercepted in the reference data stream, and to select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams;

a video frame interception subunit 1883, adapted to intercept the video frames to be intercepted in each compressed video data stream.

In an embodiment of the present invention, as shown in FIG. 18, the video frame selection subunit 1882 may include at least one of the following:

a first video frame selection module 18821, adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames in the remaining compressed video data streams whose object feature information is consistent with it, as the video frames to be intercepted in those streams;

a second video frame selection module 18822, adapted to select, according to the timestamp information of the video frame to be intercepted in the reference data stream, the video frames in the remaining compressed video data streams whose timestamp information is consistent with it, as the video frames to be intercepted in those streams.
The embodiments of the present invention also provide a data processing system corresponding to the above data processing method, which uses the above data processing device to receive multiple compressed video data streams in real time. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed introduction is given below through specific embodiments with reference to the drawings.

Referring to the schematic structural diagram of the data processing system shown in FIG. 19, in an embodiment of the present invention, the data processing system 190 may include an acquisition array 191 and a data processing device 192, the acquisition array 191 including multiple acquisition devices placed at different positions of the on-site acquisition area according to a preset multi-angle free viewing angle range, where:

each acquisition device in the acquisition array 191 is adapted to synchronously acquire raw video data in real time from its respective angle, perform real-time data compression on the acquired raw video data to obtain a compressed video data stream synchronously acquired in real time from that angle, and, based on the stream pulling instruction sent by the data processing device 192, transmit the obtained compressed video data stream to the data processing device 192 in real time;

the data processing device 192 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array 191 when it is determined that the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array is not greater than a preset bandwidth threshold, and to receive the compressed video data streams transmitted in real time by the acquisition devices in the acquisition array 191.

With the above solution, it is unnecessary to arrange a large number of servers on site for data processing, to aggregate the acquired raw data through SDI capture cards, or to process the raw data with computing servers in an on-site machine room. Expensive SDI video transmission cables and SDI interfaces can be dispensed with, and data transmission and stream pulling are performed over an ordinary transmission network instead, so that low-latency playback of multi-angle free-view video can be achieved with limited bandwidth and data processing resources, reducing implementation costs.
In an embodiment of the present invention, the data processing device 192 is further adapted to set the parameter values of the acquisition devices in the acquisition array before sending the stream pulling instruction to the acquisition devices in the acquisition array 191;

where the parameters of the acquisition devices include acquisition parameters and compression parameters, and with the set parameter values, the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset bandwidth threshold.

Thus, before stream pulling starts, the data processing device can set the parameter values of the acquisition devices in the acquisition array, ensuring that these values are unified across the array, that the acquisition devices can synchronously acquire and compress data in real time from their respective angles, and that the sum of the bitrates of the obtained compressed video data streams is not greater than the preset bandwidth threshold, so that network congestion is avoided and low-latency playback of multi-angle free-view video can be achieved even with limited bandwidth resources.
In an embodiment of the present invention, before sending the stream pulling instruction to the acquisition devices in the acquisition array 191, the data processing device 192 determines whether the sum of the bitrates of the compressed video data streams to be transmitted by the acquisition devices in the acquisition array 191 is greater than a preset write speed threshold and, when it is, sets the parameter values of the acquisition devices in the acquisition array 191 so that, with the set parameter values, the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset write speed threshold.

Therefore, before stream pulling starts, it can be ensured that the sum of the bitrates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their respective angles is not greater than the preset write speed threshold, so that data writing congestion at the data processing device is avoided and the link remains unblocked throughout the acquisition, transmission, and writing of the compressed video data streams. The compressed video streams uploaded by the acquisition devices can thus be processed in real time, enabling the playback of multi-angle free-view video.

In a specific implementation, the acquisition devices in the acquisition array and the data processing device are adapted to be connected through a switch and/or a local area network.
In an embodiment of the present invention, the data processing system 190 may further include a designated target terminal 193.

The data processing device 192 is adapted to intercept the frame-level synchronized video frames in each compressed video stream according to a received video frame interception instruction, and to synchronously upload the intercepted video frames to the designated target terminal 193;

the designated target terminal 193 is adapted to receive the video frames intercepted by the data processing device 192 based on the video frame interception instruction.

The data processing device may establish a connection with a target terminal in advance through a port or IP address, or it may synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction.

With the above solution, the compressed video data streams obtained by the acquisition devices of the acquisition array through real-time synchronous acquisition and data compression can be transmitted to the data processing device in a unified manner. After receiving a video frame interception instruction, the data processing device performs the preliminary marking and frame interception processing, synchronously uploads the intercepted frame-level synchronized video frames of the compressed video data streams to the designated target terminal, and hands the subsequent processing of the intercepted video frames over to the designated target terminal, thereby saving network transmission resources, reducing the pressure and difficulty of on-site deployment, greatly reducing the data processing load, and shortening the transmission delay of multi-angle free-view video frames.

In an embodiment of the present invention, the data processing device 192 is adapted to determine one of the compressed video data streams received in real time from the acquisition devices in the acquisition array 191 as a reference data stream; based on the received video frame interception instruction, to determine the video frame to be intercepted in the reference data stream and to select the video frames in the remaining compressed video data streams that are synchronized with it as the video frames to be intercepted in those streams; and finally to intercept the video frames to be intercepted in each compressed video data stream.
To enable those skilled in the art to better understand and implement the embodiments of the present invention, the frame synchronization solution between the data processing device and the acquisition devices is described in detail below through specific embodiments.

Referring to the flowchart of the data synchronization method shown in FIG. 20, in an embodiment of the present invention, the method may specifically include the following steps:

S201: send a stream pulling instruction to each acquisition device in the acquisition array, where the acquisition devices in the acquisition array are placed at different positions of the on-site acquisition area according to a preset multi-angle free viewing angle range and synchronously acquire video data streams in real time from their respective angles.

In a specific implementation, stream pulling synchronization can be achieved in multiple ways. For example, the stream pulling instruction may be sent to all acquisition devices in the acquisition array at the same time; alternatively, the stream pulling instruction may be sent only to the master acquisition device in the acquisition array to trigger its stream pulling, after which the master acquisition device synchronizes the instruction to all slave acquisition devices, triggering them to pull streams as well.
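A sketch of the second variant, with a minimal stand-in device class whose interface is invented here for illustration and is not defined by the patent:

```python
class AcquisitionDevice:
    """Minimal stand-in for an acquisition device (interface assumed)."""
    def __init__(self, name):
        self.name = name
        self.pulling = False

    def receive_pull_instruction(self):
        # On receipt, the device starts real-time transmission of its
        # compressed video data stream.
        self.pulling = True

def synchronize_pull(master, slaves):
    """Variant two of stream pulling synchronization: only the master
    device is triggered directly; the master then synchronizes the pull
    instruction to every slave acquisition device."""
    master.receive_pull_instruction()
    for slave in slaves:
        slave.receive_pull_instruction()  # forwarded by the master

devices = [AcquisitionDevice(f"A{i}") for i in range(1, 41)]
synchronize_pull(devices[0], devices[1:])
assert all(d.pulling for d in devices)
```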
S202: receive, in real time, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction, and determine whether the video data streams transmitted by the acquisition devices are frame-level synchronized with one another.

In a specific implementation, the acquisition device itself may have encoding and encapsulation functions, so that the raw video data synchronously acquired in real time from its angle can be encoded and encapsulated. Each acquisition device may also have a compression function: the higher the compression rate, the smaller the compressed data for the same amount of pre-compression data, which relieves the bandwidth pressure of real-time synchronous transmission. Therefore, the acquisition device may use techniques such as predictive coding, transform coding, and entropy coding to improve the video compression rate.

S203: when the video data streams transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, re-send the stream pulling instruction to each acquisition device in the acquisition array until the video data streams transmitted by the acquisition devices are frame-level synchronized.

With the above data synchronization method, by determining whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized, synchronous transmission of multiple streams can be ensured, so that missing-frame and extra-frame transmission problems are avoided, the data processing speed is improved, and the demand for low-latency playback of multi-angle free-view video is met.
In a specific implementation, when the acquisition devices in the acquisition array are started manually, there is a start-up time error, and they may not start acquiring the video data streams at the same moment. Therefore, at least one of the following methods may be used to ensure that the acquisition devices in the acquisition array synchronously acquire the video data streams in real time from their respective angles:

1. When at least one acquisition device receives an acquisition start instruction, the acquisition device that received the instruction synchronizes it to the other acquisition devices, so that all acquisition devices in the acquisition array begin to synchronously acquire video data streams in real time from their respective angles based on the acquisition start instruction.

For example, the acquisition array may include 40 acquisition devices. When acquisition device A1 receives the acquisition start instruction, it synchronously sends the received instruction to the other acquisition devices A2 to A40; after all acquisition devices have received the acquisition start instruction, each begins to synchronously acquire the video data stream in real time from its respective angle based on the instruction. Since data transmission between the acquisition devices is far faster than manual start-up, the start-up time error caused by manual start-up can be reduced.

2. The acquisition devices in the acquisition array synchronously acquire the video data streams in real time from their respective angles based on a preset clock synchronization signal.

For example, a clock signal synchronization apparatus may be provided, to which each acquisition device is connected. Upon receiving a trigger signal (such as a synchronous acquisition start instruction), the clock signal synchronization apparatus transmits a clock synchronization signal to the acquisition devices, and each acquisition device begins to synchronously acquire the video data stream in real time from its respective angle based on the clock synchronization signal. Since the clock signal synchronization apparatus transmits the clock synchronization signal to the acquisition devices based on a preset trigger signal, the devices can acquire synchronously and are less susceptible to interference from external conditions and manual operation, so the synchronization accuracy and efficiency of the acquisition devices can be improved.
In a specific implementation, due to the transmission network environment, the acquisition devices in the acquisition array may not receive the stream pulling instruction at the same moment; there may be a time difference of a few milliseconds or less between devices, causing the video data streams transmitted in real time by the devices to be out of sync. As shown in FIG. 21, the acquisition array includes acquisition devices 1 and 2 with identical acquisition parameter settings, an acquisition frame rate of X fps for both, and frame-level synchronized acquisition of video frames.

The acquisition interval T of each frame on acquisition devices 1 and 2 is T = 1/X. Suppose the data processing device sends the stream pulling instruction r at time t0, acquisition device 1 receives it at time t1, and acquisition device 2 receives it at time t2. If devices 1 and 2 both receive the instruction within the same acquisition interval T, they can be considered to have received it at the same moment, and they can each transmit frame-level synchronized video data streams. If they do not receive it within the same acquisition interval, they are considered not to have received it at the same moment, and frame-level synchronous transmission of the video data streams cannot be achieved. Frame-level synchronization of video data stream transmission may also be called stream pulling synchronization. Once stream pulling synchronization is achieved, it automatically persists until stream pulling stops.
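A sketch of this same-interval test, assuming millisecond timestamps counted from t0 (the text does not fix how the intervals are aligned):

```python
def received_in_same_interval(t0_ms, t1_ms, t2_ms, fps):
    """Devices are treated as receiving the pull instruction 'at the
    same moment' when t1 and t2 fall in the same acquisition interval
    T = 1/X (in milliseconds: T = 1000 / fps), counted from t0."""
    T = 1000.0 / fps
    return int((t1_ms - t0_ms) // T) == int((t2_ms - t0_ms) // T)

# e.g. at 25 fps (T = 40 ms), receipt 12 ms and 31 ms after t0 counts as
# synchronized, while 31 ms and 45 ms fall in different intervals.
```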
The reasons why frame-level synchronized video data streams cannot be transmitted may be:

1) the stream pulling instruction must be sent to each acquisition device separately;

2) the local area network introduces a delay when transmitting the stream pulling instruction.
Therefore, at least one of the following methods may be used to determine whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized (a code sketch follows the two methods):

1. When the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is obtained, the feature information of the object in the Nth frame of each video data stream may be matched. When the object feature information of the Nth frames of all video data streams meets a preset similarity threshold, it is determined that the object feature information of the Nth frames is consistent across the streams and, accordingly, that the video data streams transmitted by the acquisition devices are frame-level synchronized.

Here, N is an integer not less than 1, and the feature information of the object in the Nth frame of each video data stream may include at least one of shape feature information, color feature information, position feature information, and the like.

2. When the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is obtained, the timestamp information of the Nth frame of each video data stream may be matched, where N is an integer not less than 1. When the timestamp information of the Nth frames of all video data streams is consistent, it is determined that the video data streams transmitted by the acquisition devices are frame-level synchronized.
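Both checks over the Nth frames might be sketched as follows, reusing the `is_consistent` comparison from the feature-matching sketch above; the data layout is again an assumption.

```python
def streams_frame_level_synced(nth_frames, by_timestamp=True,
                               sim_threshold=0.9):
    """nth_frames: one (timestamp_ms, feature_vector) pair per stream,
    each taken from that stream's Nth frame (N >= 1). Either all
    timestamps must agree, or every stream's object feature must be
    consistent with the first stream's feature under the preset
    similarity threshold."""
    if by_timestamp:
        return len({ts for ts, _ in nth_frames}) == 1
    ref_feature = nth_frames[0][1]
    return all(is_consistent(feat, ref_feature, sim_threshold)
               for _, feat in nth_frames[1:])
```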
When the video data streams transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, the stream pulling instruction is re-sent to each acquisition device in the acquisition array, and at least one of the above methods may be used to determine whether frame-level synchronization has been achieved, until the video data streams transmitted by the acquisition devices are frame-level synchronized.
In a specific implementation, the video frames in the video data stream of each acquisition device may also be intercepted and transmitted to a designated target terminal. In order to ensure frame-level synchronization of the intercepted video frames, as shown in FIG. 22, the following steps may be included:

S221: determine one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream.

S222: based on a received video frame interception instruction, determine the video frame to be intercepted in the reference data stream, and select the video frames in the remaining video data streams that are synchronized with it as the video frames to be intercepted in the remaining video data streams.

S223: intercept the video frames to be intercepted in each video data stream.

S224: synchronously upload the intercepted video frames to the designated target terminal.

The designated target terminal may be a preset target terminal, or a target terminal designated by the video frame interception instruction.
With the above solution, frame interception synchronization can be achieved, the frame interception efficiency improved, the display effect of the generated multi-angle free-view video further improved, and the user experience enhanced. In addition, the coupling between the process of selecting and intercepting video frames and the process of generating multi-angle free-view video can be reduced, enhancing the independence of the processes and facilitating later maintenance; and synchronously uploading the intercepted video frames to the designated target terminal saves network transmission resources, reduces the data processing load, and increases the speed at which data processing generates multi-angle free-view video.

To enable those skilled in the art to better understand and implement the embodiments of the present invention, the following describes in detail, through specific application examples, how to determine the video frames to be intercepted in each video data stream.
One way is to select, according to the feature information of an object in the frame to be intercepted in the reference data stream, the frames in each of the remaining video data streams whose object feature information is consistent with it, as the frames to be intercepted in those streams.

For example, the acquisition array contains 40 acquisition devices, so 40 video data streams can be received in real time. Suppose that, among the video data streams received in real time from the acquisition devices in the array, the video data stream A1 corresponding to acquisition device A1' is determined as the reference data stream. Then, based on the object feature information X indicated in the received frame interception instruction, the frame a1 in the reference data stream whose object feature information is consistent with X is determined as the frame to be intercepted. Next, according to the object feature information x1 in frame a1, the frames a2-a40 in the remaining video data streams A2-A40 whose object feature information is consistent with x1 are selected as the frames to be intercepted in those streams.

The object feature information may include shape features, color features, position features, and the like. The feature information X indicated in the frame interception instruction and the feature information x1 of the object in frame a1 may be the same representation of the same object's features; for example, X and x1 may both be two-dimensional feature information. They may also be different representations of the same object's features; for example, X may be two-dimensional feature information while x1 is three-dimensional feature information. Furthermore, a similarity threshold may be preset: when the threshold is met, the feature information X is considered consistent with x1, or x1 is considered consistent with the feature information x2-x40 of the object in the remaining video data streams A2-A40.

The specific representation of the object feature information and the similarity threshold may be determined according to the preset multi-angle free-view range and the on-site scene, which is not limited in this embodiment.
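As a minimal sketch of this feature-based selection, assuming object features are available as fixed-length vectors and using cosine similarity as one possible similarity measure (the patent does not prescribe either), the matching frame in another stream can be picked as follows:

```python
import numpy as np
from typing import List, Optional


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def select_matching_frame(ref_feature: np.ndarray,
                          candidate_features: List[np.ndarray],
                          threshold: float = 0.95) -> Optional[int]:
    """Return the index of the candidate frame whose object features best
    match the reference frame's features, or None if no candidate meets
    the preset similarity threshold."""
    best_idx, best_sim = None, threshold
    for idx, feat in enumerate(candidate_features):
        sim = cosine_similarity(ref_feature, feat)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```

Running this once per remaining stream yields the frames a2-a40 of the example above; the threshold plays the role of the preset similarity threshold.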
Another way is to select, according to the time stamp information of the frame in the reference data stream, the frames in each of the remaining video data streams whose time stamp information is consistent with it, as the frames to be intercepted in those streams.

For example, the acquisition array may contain 40 acquisition devices, so 40 video data streams can be received in real time. Suppose that, among the video data streams received in real time from the acquisition devices in the array, the video data stream B1 corresponding to acquisition device B1 is determined as the reference data stream. Then, based on the time stamp information Y of the frame indicated in the received frame interception instruction, the frame b1 corresponding to Y in the reference data stream is determined as the frame to be intercepted. Next, according to the time stamp information y1 of frame b1, the frames b2-b40 in the remaining video data streams B2-B40 whose time stamp information is consistent with y1 are selected as the frames to be intercepted in those streams.

The time stamp information Y indicated in the frame interception instruction may deviate somewhat from the time stamp information y1 of the frame b1 to be intercepted in the reference data stream. For example, if none of the time stamps of the frames in the reference data stream coincides exactly with Y and there is a deviation of 0.1 ms, an error range may be preset; with an error range of ±1 ms, the 0.1 ms deviation falls within it, so the frame b1 whose time stamp y1 differs from Y by 0.1 ms may be selected as the frame to be intercepted in the reference data stream. The specific error range and the rule for selecting the time stamp information y1 in the reference data stream may be determined according to the on-site acquisition equipment and transmission network, which is not limited in this embodiment.
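A minimal sketch of this timestamp-based selection with an error tolerance (the ±1 ms value below mirrors the example above and is otherwise an assumption):

```python
from typing import Optional, Sequence


def select_frame_by_timestamp(timestamps_ms: Sequence[float],
                              target_ms: float,
                              tolerance_ms: float = 1.0) -> Optional[int]:
    """Return the index of the frame whose timestamp is closest to the
    target, provided the deviation falls within the preset error range;
    otherwise return None."""
    best_idx, best_err = None, tolerance_ms
    for idx, ts in enumerate(timestamps_ms):
        err = abs(ts - target_ms)
        if err <= best_err:
            best_idx, best_err = idx, err
    return best_idx
```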
It can be understood that the methods described above for determining the frames to be intercepted in each video data stream may be used alone or in combination, which is not limited by the embodiments of the present invention.

With the above solution, the efficiency and accuracy of synchronous frame selection and synchronous frame interception can be improved, thereby improving the integrity and synchronization of the transmitted data.
The embodiments of the present invention further provide a data processing device corresponding to the above data processing method. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.

Referring to the schematic structural diagram of the data processing device shown in FIG. 23, in an embodiment of the present invention, the data processing device 230 may include:

an instruction sending unit 231, adapted to send a stream-pulling instruction to each acquisition device in the acquisition array, where the acquisition devices in the array are placed at different positions in the on-site acquisition area according to a preset multi-angle free-view range, and each acquisition device synchronously acquires a video data stream in real time from its corresponding angle;

a data stream receiving unit 232, adapted to receive in real time the video data streams respectively transmitted by the acquisition devices in the array based on the stream-pulling instruction;

a first synchronization judging unit 233, adapted to determine whether the video data streams respectively transmitted by the acquisition devices in the array are frame-level synchronized, and, when they are not, to re-trigger the instruction sending unit 231 until frame-level synchronization is achieved among the video data streams transmitted by the acquisition devices.

The data processing device may be deployed according to the actual scenario. For example, when there is spare space on site, the data processing device may be placed in a non-acquisition area on site and serve as an on-site server; when there is no spare space on site, it may be placed in the cloud and serve as a cloud server.

With the above data processing device, by determining whether the video data streams respectively transmitted by the acquisition devices in the array are frame-level synchronized, synchronous transmission of the multiple data channels can be ensured, thereby avoiding missing-frame and extra-frame transmission problems, increasing the data processing speed, and meeting the demand for low-latency playback of multi-angle free-view video.
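The check-and-retry behavior of units 231 and 233 can be illustrated with a minimal sketch. The helper names (`send_pull_instruction`, `latest_timestamps`) are hypothetical placeholders for the device's transport layer, and frame-level synchronization is judged here by a simple timestamp-spread test; the patent does not fix the test itself.

```python
import time
from typing import Callable, Dict, List


def ensure_frame_level_sync(device_ids: List[str],
                            send_pull_instruction: Callable[[str], None],
                            latest_timestamps: Callable[[], Dict[str, float]],
                            tolerance_ms: float = 1.0,
                            max_retries: int = 10) -> bool:
    """Re-issue the stream-pulling instruction until the latest frames of
    all streams fall within one tolerance window (frame-level sync)."""
    for _ in range(max_retries):
        for dev in device_ids:
            send_pull_instruction(dev)       # role of instruction sending unit 231
        time.sleep(0.1)                      # allow streams to arrive
        ts = latest_timestamps()             # device id -> latest frame timestamp (ms)
        if ts and max(ts.values()) - min(ts.values()) <= tolerance_ms:
            return True                      # role of synchronization judging unit 233
    return False
```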
In an embodiment of the present invention, as shown in FIG. 23, the data processing device 230 may further include:

a reference video stream determining unit 234, adapted to determine one of the video data streams of the acquisition devices in the array, received in real time, as the reference data stream;

a video frame selection unit 235, adapted to determine, based on a received frame interception instruction, the frame to be intercepted in the reference data stream, and to select the frames in the remaining video data streams that are synchronized with that frame as the frames to be intercepted in the remaining streams;

a video frame interception unit 236, adapted to intercept the frames to be intercepted in each video data stream;

an uploading unit 237, adapted to synchronously upload the intercepted frames to the designated target.

The data processing device 230 may establish a connection with a target in advance through a port or an IP address, or may synchronously upload the intercepted frames to the port or IP address specified by the frame interception instruction.

With the above solution, frame interception can be synchronized and its efficiency improved, further improving the display effect of the generated multi-angle free-view video and enhancing the user experience. In addition, the coupling between the frame selection and interception process and the process of generating the multi-angle free-view video is reduced, the independence of the processes is enhanced, and later maintenance is facilitated. Synchronously uploading the intercepted frames to the designated target can also save network transmission resources, reduce the data processing load, and increase the speed at which multi-angle free-view video is generated.
In an embodiment of the present invention, as shown in FIG. 23, the video frame selection unit 235 includes at least one of the following:

a first video frame selection module 2351, adapted to select, according to the feature information of an object in the frame to be intercepted in the reference data stream, the frames in the remaining video data streams whose object feature information is consistent with it, as the frames to be intercepted in the remaining streams;

a second video frame selection module 2352, adapted to select, according to the time stamp information of the frame in the reference data stream, the frames in the remaining video data streams whose time stamp information is consistent with it, as the frames to be intercepted in the remaining streams.

With the above solution, the efficiency and accuracy of synchronous frame selection and synchronous frame interception can be improved, thereby improving the integrity and synchronization of the transmitted data.
The embodiments of the present invention further provide a data synchronization system corresponding to the above data processing method, which uses the above data processing device to receive multiple video data streams in real time. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.

Referring to the schematic structural diagram of the data synchronization system shown in FIG. 24, in an embodiment of the present invention, the data synchronization system 240 may include an acquisition array 241 placed in the on-site acquisition area and a data processing device 242 connected to the acquisition array via a link. The acquisition array 241 includes multiple acquisition devices, each placed at a different position in the on-site acquisition area according to a preset multi-angle free-view range, where:

each acquisition device in the acquisition array 241 is adapted to synchronously acquire a video data stream in real time from its corresponding angle and, based on the stream-pulling instruction sent by the data processing device 242, to transmit the obtained video data stream to the data processing device 242 in real time;

the data processing device 242 is adapted to send a stream-pulling instruction to each acquisition device in the acquisition array 241, to receive in real time the video data streams respectively transmitted by the acquisition devices based on the instruction, and, when the transmitted streams are not frame-level synchronized, to re-send the stream-pulling instruction to each acquisition device until frame-level synchronization is achieved among the video data streams transmitted by the acquisition devices in the array 241.

With the data synchronization system of this embodiment of the present invention, by determining whether the video data streams respectively transmitted by the acquisition devices in the array are frame-level synchronized, synchronous transmission of the multiple data channels can be ensured, thereby avoiding missing-frame and extra-frame transmission problems, increasing the data processing speed, and meeting the demand for low-latency playback of multi-angle free-view video.
In a specific implementation, the data processing device 242 is further adapted to: determine one of the video data streams of the acquisition devices in the array 241, received in real time, as the reference data stream; determine, based on a received frame interception instruction, the frame to be intercepted in the reference data stream, and select the frames in the remaining video data streams that are synchronized with that frame as the frames to be intercepted in the remaining streams; and intercept the frames to be intercepted in each video data stream and synchronously upload the intercepted frames to the designated target 243.

The data processing device 242 may establish a connection with a target in advance through a port or an IP address, or may synchronously upload the intercepted frames to the port or IP address specified by the frame interception instruction.

In an embodiment of the present invention, the data synchronization system 240 may further include a cloud server adapted to serve as the designated target 243.

In another embodiment of the present invention, as shown in FIG. 34, the data synchronization system 240 may further include a playback control device 341 adapted to serve as the designated target 243.

In yet another embodiment of the present invention, as shown in FIG. 35, the data synchronization system 240 may further include an interactive terminal 351 adapted to serve as the designated target 243.
In an embodiment of the present invention, at least one of the following methods may be adopted to ensure that the acquisition devices in the acquisition array 241 synchronously acquire video data streams in real time from their respective angles:

1. The acquisition devices in the array are connected to one another by a synchronization line. When at least one acquisition device receives an acquisition start instruction, the device that received it synchronizes the instruction to the other acquisition devices through the synchronization line, so that each device in the array starts, based on the instruction, to synchronously acquire a video data stream in real time from its corresponding angle.

2. Each acquisition device in the array synchronously acquires a video data stream in real time from its corresponding angle based on a preset clock synchronization signal.
To enable those skilled in the art to better understand and implement the embodiments of the present invention, the data synchronization system is described in detail below through a specific application scenario. FIG. 25 is a schematic structural diagram of the data synchronization system in this application scenario, where the system includes an acquisition array 251 composed of acquisition devices, a data processing device 252, and a server cluster 253 in the cloud.

At least one acquisition device in the acquisition array 251 receives an acquisition start instruction and synchronizes it to the other acquisition devices through a synchronization line 254, so that each acquisition device in the array starts, based on the instruction, to synchronously acquire a video data stream in real time from its corresponding angle.

The data processing device 252 may send a stream-pulling instruction to each acquisition device in the acquisition array 251 through a wireless local area network. Based on the instruction, each acquisition device transmits the obtained video data stream to the data processing device 252 in real time through a switch 255.

The data processing device 252 determines whether the video data streams respectively transmitted by the acquisition devices in the array 251 are frame-level synchronized and, when they are not, re-sends the stream-pulling instruction to each acquisition device until frame-level synchronization is achieved among the transmitted streams.

After determining that the video data streams transmitted by the acquisition devices in the array 251 are frame-level synchronized, the data processing device 252 determines one of the streams, received in real time, as the reference data stream. Upon receiving a frame interception instruction, it determines, according to the instruction, the frame to be intercepted in the reference data stream; it then selects the frames in the remaining video data streams that are synchronized with that frame as the frames to be intercepted in the remaining streams, intercepts the frames to be intercepted in each stream, and synchronously uploads the intercepted frames to the cloud.

The server cluster 253 in the cloud performs subsequent processing on the intercepted frames to obtain a multi-angle free-view video for playback.

In a specific implementation, the cloud server cluster 253 may include a first cloud server 2531, a second cloud server 2532, a third cloud server 2533, and a fourth cloud server 2534. The first cloud server 2531 may be used for parameter calculation; the second cloud server 2532 may be used for depth calculation to generate depth maps; the third cloud server 2533 may be used for DIBR frame-image reconstruction along a preset virtual viewpoint path; and the fourth cloud server 2534 may be used to generate the multi-angle free-view video.

It can be understood that the data processing device may be placed in a non-acquisition area on site or in the cloud according to the actual scenario. In practical applications, the data synchronization system may use at least one of a cloud server, a playback control device, or an interactive terminal as the transmitter of the frame interception instruction, or other devices capable of transmitting the frame interception instruction, which is not limited by the embodiments of the present invention.

It should be noted that the data processing systems in the foregoing embodiments can all be applied to the data synchronization system in the embodiments of the present invention.
The embodiments of the present invention further provide an acquisition device corresponding to the above data processing method. The acquisition device is adapted to, upon receiving an acquisition start instruction, synchronize the instruction to the other acquisition devices and begin synchronously acquiring a video data stream in real time from its corresponding angle, and, upon receiving a stream-pulling instruction sent by a data processing device, to transmit the obtained video data stream to the data processing device in real time. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the accompanying drawings.

Referring to the schematic structural diagram of the acquisition device shown in FIG. 36, in an embodiment of the present invention, the acquisition device 360 includes a photoelectric-conversion camera component 361, a processor 362, an encoder 363, and a transmission component 365, where:

the photoelectric-conversion camera component 361 is adapted to capture images;

the processor 362 is adapted to, upon receiving an acquisition start instruction, synchronize the instruction to the other acquisition devices through the transmission component 365 and begin processing the images captured by the photoelectric-conversion camera component 361 in real time to obtain an image data sequence, and, upon receiving a stream-pulling instruction, to transmit the obtained video data stream to the data processing device in real time through the transmission component 365;

the encoder 363 is adapted to encode the image data sequence to obtain the corresponding video data stream.

As an option, as shown in FIG. 36, the acquisition device 360 may further include a recording component 364 adapted to capture sound signals and obtain audio data.

The captured image data sequence and audio data may be processed by the processor 362 and then encoded by the encoder 363 to obtain the corresponding video data stream. Upon receiving an acquisition start instruction, the processor 362 may synchronize the instruction to the other acquisition devices through the transmission component 365; upon receiving a stream-pulling instruction, it transmits the obtained video data stream to the data processing device in real time through the transmission component 365.

In a specific implementation, the acquisition device may be placed at different positions in the on-site acquisition area according to a preset multi-angle free-view range; it may be fixed at a certain point in the acquisition area, or it may move within the area, so as to form the acquisition array. The acquisition device may therefore be a fixed device or a mobile device, allowing video data streams to be acquired flexibly from multiple angles.

FIG. 37 is a schematic diagram of an acquisition array in an application scenario according to an embodiment of the present invention. The center of the stage serves as the core point of interest and as the center of a circle, and a fan-shaped area in the same plane as the core point of interest serves as the preset multi-angle free-view range. Acquisition devices 371-375 in the array are placed fan-wise at different positions in the on-site acquisition area according to this range. Acquisition device 376 is a mobile device that can be moved to a designated position according to instructions for flexible acquisition. Moreover, an acquisition device may be a handheld device used to supplement the acquired data when another acquisition device fails or in a cramped area; for example, the handheld device 377 located in the stage audience area in FIG. 37 can join the acquisition array to provide a video data stream of the stage audience area.
As mentioned above, generating multi-angle free-view data requires depth map calculation, but this calculation currently takes a long time. How to reduce the time of depth map generation and increase the depth map generation rate has become an urgent problem to be solved.

To address this problem, the embodiments of the present invention provide a computing node cluster in which multiple computing nodes can generate depth maps in parallel and in batches from texture data synchronously acquired by the same acquisition array. Specifically, the depth map calculation process can be divided into multiple steps: obtaining rough depth maps through a first depth calculation, determining the unstable regions in the rough depth maps, and then performing a second depth calculation. In each step, the multiple computing nodes in the cluster can, in parallel, perform the first depth calculation on the texture data acquired by multiple acquisition devices to obtain rough depth maps, and then, in parallel, verify the obtained rough depth maps and perform the second depth calculation, thereby saving depth-map calculation time and increasing the depth map generation rate. A further detailed description is given below through specific embodiments with reference to the accompanying drawings.
Referring to the flowchart of a depth map generation method shown in FIG. 26, in an embodiment of the present invention, multiple computing nodes in a computing node cluster each perform depth map generation. For convenience of description, any computing node in the cluster is referred to as the first computing node. The depth map generation method of the computing node cluster is described in detail below through the following steps:

S261: Receive texture data, where the texture data is synchronously acquired by multiple acquisition devices in the same acquisition array.

In a specific implementation, the multiple acquisition devices may be placed at different positions in the on-site acquisition area according to a preset multi-angle free-view range; an acquisition device may be fixed at a certain point in the acquisition area, or it may move within the area, so as to form the acquisition array. The multi-angle free view may refer to the spatial position and viewing angle of a virtual viewpoint that enables the scene to be switched freely. For example, the multi-angle free view may be a six-degrees-of-freedom (6DoF) view, and the acquisition devices in the array may be general-purpose cameras, webcams, video recorders, handheld devices such as mobile phones, and so on. For specific implementations, reference may be made to the other embodiments of the present invention, which are not repeated here.

The texture data is the pixel data of the two-dimensional image frames acquired by the aforementioned acquisition devices; it may be an image at a single frame time, or the pixel data of the frame images corresponding to a video stream formed by continuous or non-continuous frame images.
S262: The first computing node performs a first depth calculation according to first texture data and second texture data to obtain a first rough depth map.

Here, for clarity and conciseness, the texture data that satisfies a preset first mapping relationship with the first computing node is referred to as the first texture data, and the texture data acquired by acquisition devices that satisfy a preset first spatial position relationship with the acquisition device of the first texture data is referred to as the second texture data.

In a specific implementation, the first mapping relationship may be obtained based on a preset first mapping table or through random mapping. For example, the texture data processed by each computing node may be pre-allocated according to the number of computing nodes in the cluster and the number of acquisition devices in the acquisition array corresponding to the texture data. A dedicated allocation node may be set up to distribute the computing tasks among the computing nodes in the cluster; the allocation node may obtain the first mapping relationship based on a preset first mapping table or through random mapping. For example, if there are 40 acquisition devices in the array, 40 computing nodes may be configured to achieve the highest concurrency, with each acquisition device corresponding to one computing node. If there are only 20 computing nodes with identical or roughly equivalent processing capabilities, then, to achieve the highest concurrency and balance the load, each computing node may be set to correspond to the texture data acquired by two acquisition devices. Specifically, a mapping between the identifier of the acquisition device corresponding to the texture data and the identifier of each computing node may be set as the first mapping relationship, and the texture data acquired by the corresponding acquisition devices in the array may be distributed directly to the corresponding computing nodes based on this relationship. In a specific implementation, the computing tasks may also be assigned randomly, with the texture data acquired by each acquisition device in the array distributed randomly to the computing nodes in the cluster; in that case, to improve processing efficiency, all the texture data acquired by the array may be replicated in advance on every computing node in the cluster.
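As a minimal sketch of such a preset first mapping relationship, the following round-robin assignment distributes N acquisition-device identifiers over K computing nodes; the identifier formats are assumptions for illustration only.

```python
from typing import Dict, List


def build_first_mapping(device_ids: List[str], node_ids: List[str]) -> Dict[str, str]:
    """Assign each acquisition device to a computing node round-robin,
    so the load is balanced when node capabilities are roughly equal."""
    return {dev: node_ids[i % len(node_ids)] for i, dev in enumerate(device_ids)}


# Example: 40 cameras mapped onto 20 nodes -> two cameras per node.
mapping = build_first_mapping([f"cam{i}" for i in range(1, 41)],
                              [f"node{j}" for j in range(1, 21)])
```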
As an example, any server in a server cluster may perform the first depth calculation according to the first texture data and the second texture data.

Regarding the preset first spatial position relationship between the first texture data and the second texture data: for example, the second texture data may be texture data acquired by acquisition devices that satisfy a preset first distance relationship with the acquisition device of the first texture data, or texture data acquired by acquisition devices that satisfy a preset first quantity relationship with it, or texture data acquired by acquisition devices that satisfy both the preset first distance relationship and the preset first quantity relationship.

The first preset quantity may take any integer value from 1 to N-1, where N is the total number of acquisition devices in the array. In an embodiment of the present invention, the first preset quantity is 2, so that the highest possible image quality can be obtained with as little computation as possible. For example, if computing node 9 corresponds to camera 9 in the preset first mapping relationship, the texture data of camera 9 and the texture data of cameras 5, 6, 7, 10, 11, and 12 adjacent to camera 9 may be used to calculate the rough depth map of camera 9.
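One simple realization of the distance-based variant of the first spatial position relationship is picking the k nearest cameras by Euclidean distance between camera centers; the camera-position input is an assumption for illustration, since the patent leaves the concrete relation open.

```python
from typing import List

import numpy as np


def nearest_neighbor_cameras(positions: np.ndarray, target: int, k: int = 2) -> List[int]:
    """Return the indices of the k cameras closest to the target camera,
    excluding the target itself (a preset first distance/quantity relation).

    positions: (N, 3) array of camera centers; target: camera index."""
    dists = np.linalg.norm(positions - positions[target], axis=1)
    order = np.argsort(dists)
    return [int(i) for i in order if i != target][:k]
```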
It can be understood that, in a specific implementation, the second texture data may also be data acquired by acquisition devices that satisfy other types of first spatial position relationships with the acquisition device of the first texture data; for example, the first spatial position relationship may also be satisfying a preset angle, a preset relative position, and so on.

S263: The first computing node synchronizes the first rough depth map to the remaining computing nodes in the cluster to obtain a rough depth atlas.

The rough depth maps obtained after the rough depth calculation need to be cross-validated to determine the unstable regions in each rough depth map, so that a refined solution can be computed in the next step. For any rough depth map in the atlas, cross-validation is performed using the rough depth maps corresponding to multiple acquisition devices around the acquisition device of that rough depth map (typically, the rough depth map to be verified is cross-validated together with the rough depth maps corresponding to all the other acquisition devices). The rough depth maps calculated by the individual computing nodes therefore need to be synchronized to the remaining computing nodes in the cluster. After the synchronization in step S263, each computing node in the cluster holds the rough depth maps calculated by all the other nodes, and every server holds exactly the same rough depth atlas.
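The synchronization in S263 is effectively an all-gather: every node contributes its local rough depth map and receives everyone else's. A minimal sketch using mpi4py, which is only one possible transport (the patent does not prescribe one):

```python
# Sketch: each MPI rank plays the role of one computing node.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# local_rough stands for the first rough depth map computed in S262
# (placeholder data here, one map per node/camera).
local_rough = np.zeros((1080, 1920), dtype=np.float32) + rank

# After allgather, every node holds the identical rough depth atlas.
rough_atlas = comm.allgather(local_rough)   # list indexed by node/camera rank
```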
S264: For a second rough depth map in the rough depth atlas, the first computing node performs verification using a third rough depth map to obtain the unstable regions in the second rough depth map.

The second rough depth map may satisfy a preset second mapping relationship with the first computing node; the third rough depth map may be the rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map.

The second mapping relationship may be obtained based on a preset second mapping table or through random mapping. For example, the texture data processed by each computing node may be pre-allocated according to the number of computing nodes in the cluster and the number of acquisition devices in the acquisition array corresponding to the texture data. In a specific implementation, a dedicated allocation node may be set up to distribute the computing tasks among the computing nodes in the cluster; the allocation node may obtain the second mapping relationship based on a preset second mapping table or through random mapping. For specific examples of setting the second mapping relationship, reference may be made to the foregoing implementation examples of the first mapping relationship.

It can be understood that, in a specific implementation, the second mapping relationship may correspond exactly to the first mapping relationship or may not correspond to it. For example, when the number of cameras equals the number of computing nodes, a one-to-one second mapping relationship may be established, according to hardware identifiers, between the acquisition device corresponding to the data (including texture data and rough depth maps) and the computing node that processes the data.

It can be understood that the descriptions of the first, second, and third rough depth maps here are only for clarity and conciseness. In a specific implementation, the first rough depth map may be the same as or different from the second rough depth map; it suffices that the acquisition device corresponding to the third rough depth map satisfies the preset second spatial position relationship with the acquisition device corresponding to the second rough depth map.

Regarding the second spatial position relationship, as specific examples, the texture data corresponding to the third rough depth map may be texture data acquired by acquisition devices that satisfy a preset second distance relationship with the acquisition device corresponding to the second rough depth map, or texture data acquired by acquisition devices that satisfy a preset second quantity relationship with it, or texture data acquired by acquisition devices that satisfy both the preset second distance relationship and the preset second quantity relationship.

The second preset quantity may take any integer value from 1 to N-1, where N is the total number of acquisition devices in the array. In a specific implementation, the second preset quantity may or may not equal the first preset quantity. In an embodiment of the present invention, the second preset quantity is 2, so that the highest possible image quality can be obtained with as little computation as possible.

In a specific implementation, the second spatial position relationship may also be another type of spatial position relationship, such as satisfying a preset angle, a preset relative position, and so on.
S265: The first computing node performs a second depth calculation according to the unstable regions in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain the corresponding fine depth map.

It should be noted that the second depth calculation differs from the first in that the depth-map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable regions. The unstable regions in the generated depth map can thus be excluded, making the generated depth map more accurate and thereby improving the quality of the generated multi-angle free-view images.
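A minimal sketch of one way to realize the cross-validation of S264 and the candidate exclusion of S265: here the unstable mask comes from a simplified consistency test (per-pixel spread across neighboring rough depth maps, which assumes the maps have already been warped into the same view), and the second pass re-estimates only the masked pixels from a candidate set that excludes their rough values. The actual verification and matching-cost computations are not fixed by the patent.

```python
from typing import List

import numpy as np


def unstable_mask(rough: np.ndarray, neighbors: List[np.ndarray],
                  rel_tol: float = 0.05) -> np.ndarray:
    """Mark a pixel unstable when neighbor rough maps (assumed pre-warped
    into the same view) disagree with it by more than rel_tol."""
    stack = np.stack(neighbors)
    spread = np.abs(stack - rough).max(axis=0)
    return spread > rel_tol * np.maximum(rough, 1e-6)


def refine(rough: np.ndarray, mask: np.ndarray,
           candidates: np.ndarray) -> np.ndarray:
    """Second depth calculation: keep stable pixels, re-solve unstable
    ones from candidates (shape (C, H, W)) that exclude the unstable
    rough values."""
    fine = rough.copy()
    # Placeholder for the real matching-cost solve over `candidates`;
    # the key point is that rough[mask] is not among the candidates.
    fine[mask] = np.median(candidates, axis=0)[mask]
    return fine
```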
An application scenario is used as an example:

A server S may perform the first round of depth calculation (the first depth calculation) based on the texture data of its assigned camera M and the texture data of the cameras that satisfy the preset first spatial position relationship with camera M, to obtain a rough depth map.

After the cross-validation in step S264, the refined solution of the depth map can continue on the same server without interruption. Specifically, server S may cross-validate the rough depth map corresponding to its assigned camera M against the results of all the other rough depth maps to obtain the unstable regions in camera M's rough depth map. Server S may then perform another round of depth calculation (the second depth calculation) on the unstable regions of camera M's rough depth map, using the texture data acquired by camera M and the texture information of the N cameras around camera M, to obtain the refined depth map corresponding to the first texture data (the texture data acquired by camera M).

Here, the rough depth map corresponding to camera M is the one calculated based on the texture data acquired by camera M and the texture data acquired by the acquisition devices that satisfy the preset first spatial position relationship with camera M.
S266: The fine depth atlas composed of the fine depth maps obtained by the computing nodes is taken as the finally generated depth map.

With the above embodiment, multiple computing nodes can simultaneously generate depth maps in parallel and in batches from texture data synchronously acquired by the same acquisition array, which greatly improves the efficiency of depth map generation.

In addition, with the above solution, the unstable regions in the generated depth map are excluded through the two-pass depth calculation, so the resulting fine depth map is more accurate, which in turn improves the quality of the generated multi-angle free-view images.

In a specific implementation, the configuration and number of computing nodes in the cluster may be chosen according to the volume of texture data to be processed and the required depth map generation speed. For example, the computing node cluster may be a server cluster composed of multiple servers, which may be deployed centrally or in a distributed manner. In some embodiments of the present invention, some or all of the computing node devices in the cluster may serve as local servers, as edge node devices, or as cloud computing devices.
As another example, the computing node cluster may also be a computing device formed by multiple CPUs or GPUs. The embodiments of the present invention further provide a computing node adapted to form, with at least one other computing node, a computing node cluster for generating depth maps. Referring to the schematic structural diagram of the computing node shown in FIG. 27, the computing node 270 may include:

an input unit 271, adapted to receive texture data, where the texture data originates from synchronous acquisition by multiple acquisition devices in the same acquisition array;

a first depth calculation unit 272, adapted to perform a first depth calculation according to first texture data and second texture data to obtain a first rough depth map, where the first texture data satisfies a preset first mapping relationship with the computing node, and the second texture data is texture data acquired by acquisition devices that satisfy a preset first spatial position relationship with the acquisition device of the first texture data;

a synchronization unit 273, adapted to synchronize the first rough depth map to the remaining computing nodes in the cluster to obtain a rough depth atlas;

a verification unit 274, adapted to verify a second rough depth map in the rough depth atlas using a third rough depth map to obtain the unstable regions in the second rough depth map, where the second rough depth map satisfies a preset second mapping relationship with the computing node, and the third rough depth map is the rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map;

a second depth calculation unit 275, adapted to perform a second depth calculation according to the unstable regions in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain the corresponding fine depth map, where the depth-map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable regions;

an output unit 276, adapted to output the fine depth map, so that the computing node cluster obtains the fine depth atlas as the finally generated depth map.

With the above computing node, the depth map calculation process may include multiple steps: obtaining a rough depth map through the first depth calculation, determining the unstable regions in the rough depth map, and the subsequent second depth calculation. Performing the depth map calculation through these steps facilitates separate calculation by multiple computing nodes, thereby improving the efficiency of depth map generation.
The embodiments of the present invention further provide a computing node cluster, which may include multiple computing nodes; the computing nodes in the cluster can simultaneously generate depth maps in parallel and in batches from texture data synchronously acquired by the same acquisition array. For convenience of description, any computing node in the cluster is referred to as the first computing node.

In some embodiments of the present invention, the first computing node is adapted to: perform a first depth calculation according to first texture data and second texture data in the received texture data to obtain a first rough depth map; synchronize the first rough depth map to the remaining computing nodes in the cluster to obtain a rough depth atlas; verify a second rough depth map in the rough depth atlas using a third rough depth map to obtain the unstable regions in the second rough depth map; perform a second depth calculation according to the unstable regions in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain the corresponding fine depth map; and output the obtained fine depth map so that the computing node cluster takes the resulting fine depth atlas as the finally generated depth map.

The first texture data satisfies a preset first mapping relationship with the first computing node; the second texture data is texture data acquired by acquisition devices that satisfy a preset first spatial position relationship with the acquisition device of the first texture data; the second rough depth map satisfies a preset second mapping relationship with the first computing node; the third rough depth map is the rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map; and the depth-map candidate values selected from the second rough depth map in the second depth calculation do not include the depth values of the unstable regions.
Referring to the schematic diagram of depth map processing by a server cluster shown in FIG. 28: the texture data acquired by the N cameras in a camera array is input to the N servers in the server cluster, which first perform the first depth calculation separately to obtain rough depth maps 1 to N. Each server then copies its own rough depth map to the other servers in the cluster and achieves time synchronization. After that, each server verifies the rough depth map assigned to it and performs the second depth calculation to obtain the finely calculated depth map as the depth map generated by the server cluster. As can be seen from this calculation process, the servers in the cluster can, in parallel, perform the first depth calculation on the texture data acquired by multiple cameras, verify the rough depth maps in the rough depth atlas, and perform the second depth calculation. The entire depth map generation process is carried out by multiple servers in parallel, which greatly saves depth-map calculation time and improves depth map generation efficiency.
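Tying the steps together, the per-node flow of FIG. 28 can be sketched as below. The callables `first_depth`, `allgather`, `unstable_mask`, and `refine` stand in for the computations of S262-S265; they are abstract placeholders, since the patent does not fix their internals.

```python
from typing import Callable, List

import numpy as np


def node_pipeline(my_texture: np.ndarray,
                  neighbor_textures: List[np.ndarray],
                  first_depth: Callable[..., np.ndarray],
                  allgather: Callable[[np.ndarray], List[np.ndarray]],
                  unstable_mask: Callable[..., np.ndarray],
                  refine: Callable[..., np.ndarray]) -> np.ndarray:
    """One node's share of the parallel depth map generation of FIG. 28."""
    rough = first_depth(my_texture, neighbor_textures)          # S262
    atlas = allgather(rough)                                    # S263: identical atlas on every node
    mask = unstable_mask(rough, atlas)                          # S264: cross-validation
    return refine(rough, mask, my_texture, neighbor_textures)   # S265: fine depth map
```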
For the specific implementations and beneficial effects of the computing nodes and computing node clusters in the embodiments of the present invention, reference may be made to the depth map generation methods in the foregoing embodiments, which are not repeated here.
The server cluster may then store the generated depth maps, or output them to terminal devices on request for further generation and display of virtual viewpoint images, which is likewise not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions which, when run, can execute the steps of the depth map generation method of any of the foregoing embodiments; for details, refer to the steps of the foregoing depth map generation methods, which are not repeated here.
In addition, currently known virtual viewpoint image generation methods based on depth-image-based rendering (DIBR) struggle to meet the demands of multi-angle free-view applications during playback.
The inventor found through research that current DIBR virtual viewpoint image generation methods offer a low degree of parallelism and are usually processed by a CPU; moreover, because generating each virtual viewpoint image involves many steps, each of which is itself fairly complex, the process is difficult to parallelize.
To solve the above problems, embodiments of the present invention provide a method for generating virtual viewpoint images through parallel processing, which can greatly accelerate the generation of multi-angle free-view virtual viewpoint images, thereby meeting the demand for low-latency playback and real-time interaction of multi-angle free-view video and improving the user experience.
To make the purpose, features, and advantages of the embodiments of the present invention clearer to those skilled in the art, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Referring to the flowchart of the virtual viewpoint image generation method shown in Figure 29, in a specific implementation, a virtual viewpoint image can be generated through the following steps:
S291: Acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, where the image combination includes multiple angle-synchronized groups of corresponding texture maps and depth maps.
Here, the multi-angle free view may refer to the spatial positions and viewing angles of virtual viewpoints that allow the scene to be switched freely. The multi-angle free-view range can be determined according to the needs of the application scenario.
In a specific implementation, a collection array composed of multiple collection devices can be arranged on site. The collection devices in the array can be placed at different positions in the on-site collection area according to the preset multi-angle free-view range, and can capture live images synchronously to obtain texture maps synchronized across multiple angles. For example, multiple cameras, video cameras, and the like can be used to capture synchronized images of a scene from multiple angles.
The images in the multi-angle free-view image combination may be fully free-view images. In a specific implementation, the view may have 6 degrees of freedom (Degree of Freedom, DoF); that is, both the spatial position of the viewpoint and the viewing angle can be switched freely. As mentioned above, the spatial position of a viewpoint can be expressed as coordinates (x, y, z), and the viewing angle as three rotation directions (e.g., pitch, yaw, and roll), hence the term 6DoF.
In the virtual viewpoint image generation process, the multi-angle free-view image combination and its parameter data can be acquired first.
In a specific implementation, the texture maps and depth maps in an image combination correspond one-to-one. A texture map may use any two-dimensional image format, for example BMP, PNG, JPEG, or WebP. A depth map represents the distance of each point in the scene relative to the shooting device; that is, each pixel value in the depth map represents the distance between a certain point in the scene and the shooting device.
The texture maps in the image combination are the synchronized two-dimensional images, and the depth data of each two-dimensional image can be determined based on those images.
The depth data may include depth values corresponding to the pixels of a two-dimensional image. The distance from the collection device to each point in the area to be viewed can serve as the depth value, which directly reflects the geometry of the visible surfaces in that area. For example, the depth value may be the distance from each point in the area to be viewed to the optical center along the camera's optical axis, with the origin of the camera coordinate system taken as the optical center. Those skilled in the art will understand that this distance may be a relative value, as long as the same reference is used across the images.
The depth data may include depth values in one-to-one correspondence with the pixels of the two-dimensional image, or a subset selected from that set of depth values. Those skilled in the art will understand that the depth value set can be stored in the form of a depth map. In a specific implementation, the depth data may be obtained by down-sampling an original depth map, where the original depth map is the image formed by storing the full set of per-pixel depth values arranged according to the pixel layout of the two-dimensional image (texture map).
In the embodiments of the present invention, the multi-angle free-view image combination and its parameter data can be obtained through the following steps, described below with reference to specific application scenarios.
As a specific embodiment of the present invention, the first stage is capture and depth map computation, comprising three main steps: multi-camera video capturing, camera parameter estimation (computing the cameras' intrinsic and extrinsic parameters), and depth map calculation. For multi-camera capture, the video captured by the cameras must be frame-level aligned.
Multi-camera video capture yields the texture images, i.e., the synchronized images; camera parameter estimation yields the camera parameters, i.e., the parameter data of the image combination, including internal and external parameter data; and depth map calculation yields the depth maps.
The synchronized groups of corresponding texture maps and depth maps in an image combination can be stitched together to form a frame of stitched image, and various stitching structures are possible. Each frame of stitched image can serve as one image combination. The groups of texture maps and depth maps in an image combination can be stitched and arranged according to a preset relationship. Specifically, the stitched image can be divided by position into a texture map region and a depth map region: the texture map region stores the pixel values of each texture map, and the depth map region stores the depth values corresponding to each texture map according to a preset positional relationship. The texture map region and depth map region may be contiguous or interleaved; the embodiments of the present invention place no restriction on the positional relationship between texture maps and depth maps in an image combination.
In a specific implementation, the parameter data of each image in the image combination can be obtained from the images' attribute information. The parameter data may include external parameter data and may also include internal parameter data. External parameter data describes the spatial coordinates, attitude, and so on of the shooting device; internal parameter data describes attributes of the shooting device such as its optical center and focal length. The internal parameter data may further include distortion parameter data, comprising radial and tangential distortion parameters. Radial distortion arises in the conversion from the shooting device's coordinate system to the image's physical coordinate system, whereas tangential distortion arises during manufacture because the plane of the photosensitive element is not parallel to the lens. The shooting position, shooting angle, and other information of an image can be determined from the external parameter data. In virtual viewpoint image generation, incorporating internal parameter data, including the distortion parameters, makes the determined spatial mapping relationship more accurate.
In a specific implementation, the virtual viewpoint path can be preset. For example, for a sports event such as a basketball or football game, an arc-shaped path can be planned in advance, and whenever a highlight occurs, the corresponding virtual viewpoint images are generated along that arc.
In specific applications, the virtual viewpoint path can be set based on particular positions or perspectives in the scene (such as under the basket, courtside, the referee's view, or the coach's view), or based on particular subjects (such as players on the court, the on-site host, spectators, or actors in film and television footage).
The path data corresponding to a virtual viewpoint path may include the position data of a series of virtual viewpoints along the path.
S292: According to the preset virtual viewpoint path data and the parameter data of the image combination, select from the image combination the corresponding groups of texture maps and depth maps for each virtual viewpoint in the virtual viewpoint path.
In a specific implementation, based on the position data of each virtual viewpoint in the path data and the parameter data of the image combination, the groups of texture maps and depth maps that satisfy a preset positional relationship and/or quantitative relationship with each virtual viewpoint position can be selected from the image combination. For example, in a region of virtual viewpoint positions where the camera density is high, only the texture maps and corresponding depth maps captured by the two cameras closest to the virtual viewpoint may be selected, whereas in a region where the camera density is low, the texture maps and corresponding depth maps captured by the three or four closest cameras may be selected.
In one embodiment of the present invention, the texture maps and depth maps corresponding to the 2 to N collection devices closest to each virtual viewpoint position in the path can be selected, where N is the total number of collection devices in the collection array. For example, the texture maps and depth maps corresponding to the two closest collection devices can be selected by default. In a specific implementation, the user may set the number of closest collection devices to select, up to the number of collection devices corresponding to the image combination.
This approach places no particular requirement on the spatial distribution of the collection devices in the array (they may be arranged in a line, in an arc, or in any irregular form). The actual distribution of the collection devices is determined from the acquired virtual viewpoint position data and the parameter data of the image combination, and an adaptive strategy is then used to select the corresponding groups of texture maps and depth maps. This reduces the amount of data computation while ensuring the quality of the generated virtual viewpoint images, provides a high degree of freedom and flexibility in selection, and also lowers the installation requirements for the collection devices in the array, making the system easy to adapt to different venues and easy to install.
In one embodiment of the present invention, based on the virtual viewpoint position data and the parameter data of the image combination, a preset number of corresponding groups of texture maps and depth maps closest to the virtual viewpoint position are selected from the image combination.
It should be understood that, in a specific implementation, other preset rules may also be used to select the corresponding groups of texture maps and depth maps from the image combination, for example according to the processing capacity of the virtual viewpoint image generation device, or according to the user's requirements on generation speed and on the definition of the generated images (standard definition, high definition, ultra-high definition, and so on).
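As an illustration of the kind of selection logic described above, the following host-side C++ sketch picks the k capture devices closest to a virtual viewpoint from their extrinsic positions. It is not the patent's implementation: the Camera struct, the Euclidean distance criterion, and all names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Illustrative camera record: world-space position taken from the external
// parameter data, plus the device index in the capture array.
struct Camera { float x, y, z; int id; };

// Return the ids of the k capture devices nearest to viewpoint (vx, vy, vz),
// never more than the number of devices in the array.
std::vector<int> pickNearestCameras(const std::vector<Camera>& cams,
                                    float vx, float vy, float vz, std::size_t k)
{
    std::vector<std::pair<float, int>> byDist;
    byDist.reserve(cams.size());
    for (const Camera& c : cams) {
        float dx = c.x - vx, dy = c.y - vy, dz = c.z - vz;
        byDist.emplace_back(std::sqrt(dx * dx + dy * dy + dz * dz), c.id);
    }
    std::sort(byDist.begin(), byDist.end());   // ascending distance
    k = std::min(k, byDist.size());
    std::vector<int> chosen;
    for (std::size_t i = 0; i < k; ++i) chosen.push_back(byDist[i].second);
    return chosen;
}
```

The texture map and depth map groups of the returned device ids would then be the ones fed to the renderer; the density-adaptive variant described above would simply vary k by region.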
S293: Input each virtual viewpoint's corresponding groups of texture maps and depth maps into a graphics processor. For each virtual viewpoint in the path, with the pixel as the unit of processing, multiple threads perform combined rendering on the pixels of the selected texture map and depth map groups, yielding the image corresponding to that virtual viewpoint.
A graphics processing unit (GPU), also known as a display core, visual processor, or display chip, is a microprocessor specialized in image- and graphics-related computation. It can be found in personal computers, workstations, game consoles, and electronic devices with image-related computing needs such as mobile terminals (tablets, smartphones, and the like).
To help those skilled in the art better understand and implement the embodiments of the present invention, a GPU architecture used in some embodiments is briefly introduced below. It should be noted that this architecture is only a specific example and does not limit the GPUs to which the embodiments are applicable.
In some embodiments of the present invention, the GPU may use the Compute Unified Device Architecture (CUDA) parallel programming architecture to perform combined rendering on the pixels of the selected texture map and depth map groups. CUDA is a hardware and software architecture for allocating and managing computations on the GPU as a data-parallel computing device, without mapping them to a graphics application programming interface (API).
When programming with CUDA, the GPU can be regarded as a computing device capable of executing a large number of threads in parallel. It operates as a coprocessor to the main CPU, or host: in other words, the data-parallel, compute-intensive portions of applications running on the host are offloaded to the GPU.
More precisely, a portion of an application that is executed many times, independently on different data, can be isolated into a function that runs on the GPU device as many different threads. To this end, such a function is compiled to the instruction set of the GPU device, and the resulting program, called a kernel, is downloaded to the GPU. The batch of threads that executes a kernel is organized as thread blocks.
A thread block is a batch of threads that can cooperate by efficiently sharing data through fast shared memory and by synchronizing their execution to coordinate memory accesses. In a specific implementation, synchronization points can be specified in the kernel; the threads of a block are suspended until all of them reach the synchronization point.
In a specific implementation, the maximum number of threads a single thread block may contain is limited. However, blocks of the same dimensions and size executing the same kernel can be batched into a grid of thread blocks, so that the total number of threads launched in a single kernel call can be much larger.
As the above shows, with the CUDA architecture a large number of threads on the GPU can process data in parallel at the same time, which can greatly increase the speed of virtual viewpoint image generation.
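For concreteness, the following fragment shows what launching a per-pixel kernel over such a grid of thread blocks can look like with the CUDA runtime. The 16×16 block size and all names are assumptions for illustration, not values from the patent.

```cuda
__global__ void somePerPixelKernel(/* image buffers and parameters */)
{
    // Each thread handles the pixel at
    // (blockIdx.x * blockDim.x + threadIdx.x, blockIdx.y * blockDim.y + threadIdx.y).
}

void launchOverImage(int width, int height)
{
    // One thread per pixel: 16x16 thread blocks, and enough blocks to cover
    // the image (the ceiling division handles sizes not divisible by 16).
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    somePerPixelKernel<<<grid, block>>>();
    cudaDeviceSynchronize();   // wait for completion; error checking omitted
}
```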
To help those skilled in the art better understand and implement, the per-pixel processing of each step of the combined rendering is described in detail below.
In a specific implementation, referring to the flowchart of the GPU combined-rendering method shown in Figure 30, step S293 can be implemented through the following steps:
S2931: Forward-map the corresponding group of depth maps, in parallel, onto the virtual viewpoint.
Forward mapping of a depth map maps the depth map of the original camera (collection device) to the position of the virtual camera through a coordinate-space transformation, yielding the depth map at the virtual camera position. Specifically, forward mapping is the operation of mapping every pixel of the original camera's depth map to the virtual viewpoint according to a preset coordinate mapping relationship.
In a specific implementation, a first kernel function can be run on the GPU to forward-map the pixels of the corresponding group of depth maps in parallel onto the corresponding virtual viewpoint position.
In research and practice, the inventor found that forward mapping can suffer from foreground-background occlusion and from a mapping-gap effect, both of which degrade the generated image quality. First, to handle foreground-background occlusion, in the embodiments of the present invention, when multiple depth values map to the same pixel of the virtual viewpoint, an atomic operation can be used to keep the largest pixel value, producing the first depth map at the corresponding virtual viewpoint position. Then, to mitigate the mapping-gap effect, a second depth map of the virtual viewpoint position can be created from the first: for each pixel of the second depth map, processed in parallel, the maximum over the pixels in a preset region around the corresponding position in the first depth map is taken.
In forward mapping, since every pixel can be processed in parallel, the processing speed of forward mapping can be greatly increased and its timeliness improved.
S2932: Post-process the forward-mapped depth map in parallel.
After forward mapping, the virtual viewpoint depth map can be post-processed. Specifically, a preset second kernel function can be run on the GPU to apply median filtering, over a preset region around each pixel position, to every pixel of the second depth map obtained from forward mapping. Since the median filtering can be applied to all pixels of the second depth map in parallel, the post-processing speed can be greatly increased and its timeliness improved.
S2933: Backward-map the corresponding group of texture maps in parallel.
In this step, the coordinates of each virtual viewpoint pixel in the original camera's texture map are computed from the depth map values, and the corresponding value is computed by sub-pixel interpolation. On a GPU, sub-pixel values can be interpolated bilinearly, so this step only requires sampling the original camera texture at the coordinates computed for each pixel. In a specific implementation, a preset third kernel function can be run on the GPU to interpolate the pixels of the selected corresponding group of texture maps in parallel, generating the corresponding virtual texture maps.
By running the third kernel function on the GPU and interpolating the pixels of the selected texture maps in parallel to generate the corresponding virtual texture maps, the processing speed of backward mapping can be greatly increased and its timeliness improved.
S2934: Fuse, in parallel, the pixels of the virtual texture maps generated by backward mapping.
In a specific implementation, a fourth kernel function can be run on the GPU to perform weighted fusion, in parallel, of the pixels at the same position in the virtual texture maps generated by backward mapping.
Running the fourth kernel function on the GPU to fuse same-position pixels of the backward-mapped virtual texture maps in parallel can greatly speed up the fusion of the virtual texture maps and improve the timeliness of image fusion.
A specific example is described in detail below.
In step S2931, for the forward mapping of the depth maps, the projection mapping relationship of each pixel can first be computed by the GPU's first kernel function.
Consider a pixel (u, v) in the image of a real camera. First, the image coordinates (u, v) are converted to coordinates [X, Y, Z]^T in the camera coordinate system through the corresponding camera's perspective projection model. It should be understood that different cameras' perspective projection models call for different conversion methods.
For example, for the perspective projection model:
$$Z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (1)$$
where [u, v, 1]^T are the homogeneous coordinates of pixel (u, v), [X, Y, Z]^T are the coordinates in the camera coordinate system of the real-world point corresponding to (u, v), f_x and f_y are the focal lengths in the x and y directions, and c_x and c_y are the optical-center coordinates in the x and y directions, respectively.
Therefore, for a pixel (u, v) in the image, given the pixel's depth value Z and the physical parameters of the corresponding camera lens (f_x, f_y, c_x, and c_y, obtainable from the parameter data of the aforementioned image combination), the coordinates [X, Y, Z]^T of the corresponding point in the camera coordinate system can be obtained from formula (1).
After the conversion from the image coordinate system to the camera coordinate system, the object's coordinates in the current camera coordinate system can be transformed into the coordinate system of the camera at the virtual viewpoint by a coordinate transformation in three-dimensional space. Specifically, the following transformation formula can be used:
$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_{12} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T_{12} \qquad (2)$$
where R_12 is a 3×3 rotation matrix and T_12 is a translation vector.
Let the transformed three-dimensional coordinates be [X_1, Y_1, Z_1]^T. Applying the inverse of the image-to-camera conversion described above yields the correspondence between the transformed three-dimensional coordinates in the virtual camera and the virtual camera's image coordinates. This establishes the point-wise projection relationship from the real viewpoint image to the virtual viewpoint image. By transforming every pixel of the real viewpoint and rounding the resulting coordinates, the projected depth map in the virtual viewpoint image is obtained.
Once the point-to-point mapping between the original camera's depth map and the virtual camera's depth map is established, multiple positions in the original depth map may project to the same position in the virtual depth map, giving rise to the foreground-background occlusion problem in forward mapping. To address this, in the embodiments of the present invention an atomic operation can be used to keep the smallest depth value as the final result at that mapped position, as shown in formula (3):
Depth(u, v) = min[Depth_{1..N}(u, v)]        (3)
It should be noted that the smallest depth value is also the largest depth map pixel value; therefore, by keeping the largest pixel value in the mapped depth map, the first depth map at the corresponding virtual viewpoint position is obtained.
In a specific implementation, the max/min operation over multiple points mapping to one location can be provided in the CUDA parallel environment, specifically by calling CUDA's atomic operation functions atomicMin or atomicMax.
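Putting formulas (1) to (3) together, a forward-mapping kernel in this style might look as follows. This is a minimal sketch, not the patent's code: it assumes depth is quantized to an unsigned int code in which larger values mean closer (so the native atomicMax implements formula (3)), assumes row-major 3×3 intrinsics with K[0] = f_x, K[2] = c_x, K[4] = f_y, K[5] = c_y, and invents all names.

```cuda
// Assumed inverse-depth code: larger pixel value == smaller depth (closer).
__device__ __forceinline__ unsigned int pixelFromDepth(float Z) {
    return (unsigned int)(65535.0f / Z);
}
__device__ __forceinline__ float depthFromPixel(unsigned int p) {
    return 65535.0f / (float)p;
}

// First kernel (S2931): forward-map one source depth map onto the virtual view.
__global__ void forwardMapDepth(const unsigned int* srcDepth,  // source-view depth
                                unsigned int* dstDepth,        // virtual view, zero-initialized
                                int width, int height,
                                const float* Ksrc, const float* Kvirt, // 3x3 row-major intrinsics
                                const float* R, const float* T)        // source->virtual extrinsics
{
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= width || v >= height) return;

    unsigned int d = srcDepth[v * width + u];
    if (d == 0) return;                        // no depth sample here
    float Z = depthFromPixel(d);

    // Formula (1): back-project the pixel into the source camera frame.
    float X = (u - Ksrc[2]) * Z / Ksrc[0];
    float Y = (v - Ksrc[5]) * Z / Ksrc[4];

    // Formula (2): rigid transform into the virtual camera frame.
    float X1 = R[0]*X + R[1]*Y + R[2]*Z + T[0];
    float Y1 = R[3]*X + R[4]*Y + R[5]*Z + T[1];
    float Z1 = R[6]*X + R[7]*Y + R[8]*Z + T[2];
    if (Z1 <= 0.0f) return;                    // behind the virtual camera

    // Project into the virtual image and round to the nearest pixel.
    int u1 = __float2int_rn(Kvirt[0] * X1 / Z1 + Kvirt[2]);
    int v1 = __float2int_rn(Kvirt[4] * Y1 / Z1 + Kvirt[5]);
    if (u1 < 0 || u1 >= width || v1 < 0 || v1 >= height) return;

    // Formula (3): on collisions keep the largest pixel value (smallest depth).
    atomicMax(&dstDepth[v1 * width + u1], pixelFromDepth(Z1));
}
```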
In the above process of obtaining the first depth map, a gap effect may arise; that is, some pixels may remain uncovered due to mapping precision. To address this, the embodiments of the present invention may apply gap masking to the obtained first depth map. In one embodiment, a 3×3 gap-masking pass is applied to the first depth map, as follows:
First, a second depth map of the virtual viewpoint position is created. Then, for each pixel D(x, y) of the second depth map, the existing pixels D_old within the surrounding 3×3 neighborhood of the first depth map of the virtual viewpoint position are examined, and the maximum over that neighborhood is taken. This can be implemented by a kernel function operating as follows:
D(x, y) = Max[D_old(x', y')],  (x', y') in the 3×3 neighborhood of (x, y)        (4)
It should be understood that other neighborhood sizes, such as 5×5, may also be used in the gap-masking process; for better results, the size can be set empirically.
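A minimal CUDA sketch of this gap-masking pass (formula (4)); the names and the clamped image borders are assumptions for illustration.

```cuda
// Gap masking: each pixel of the second depth map takes the maximum of the
// first depth map over the surrounding 3x3 window, closing unmapped seams.
__global__ void maskGaps(const unsigned int* dOld,  // first depth map
                         unsigned int* dNew,        // second depth map
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int m = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), width - 1);   // clamp to the border
            int yy = min(max(y + dy, 0), height - 1);
            m = max(m, dOld[yy * width + xx]);
        }
    dNew[y * width + x] = m;   // D(x,y) = Max[D_old(x',y')] over the window
}
```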
For step S2932, in a specific implementation, 3×3 or 5×5 median filtering can be applied to the second depth map of the virtual viewpoint position. For example, for a 3×3 median filter, the GPU's second kernel function can operate according to the following formula:
$$D(x, y) = \operatorname{Median}\{\, D(x+i,\ y+j) \mid i, j \in \{-1, 0, 1\} \,\} \qquad (5)$$
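A minimal sketch of such a median-filter kernel, sorting the nine window samples with an insertion sort; names and the clamped borders are assumptions for illustration.

```cuda
// Second kernel (S2932): 3x3 median filter over the virtual-view depth map.
__global__ void medianFilter3x3(const unsigned int* src, unsigned int* dst,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int w[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), width - 1);
            int yy = min(max(y + dy, 0), height - 1);
            w[n++] = src[yy * width + xx];
        }
    // Insertion sort of the 9 samples; the median is the middle element.
    for (int i = 1; i < 9; ++i) {
        unsigned int key = w[i];
        int j = i - 1;
        while (j >= 0 && w[j] > key) { w[j + 1] = w[j]; --j; }
        w[j + 1] = key;
    }
    dst[y * width + x] = w[4];
}
```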
In step S2933, the third kernel function running on the GPU computes, from the depth map values, the coordinates of each virtual viewpoint pixel in the original camera's texture map; it can be implemented as the inverse of the process in step S2931.
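A minimal sketch of such a backward-mapping kernel with manual bilinear sampling. It reuses the inverse-depth code assumed earlier; the RGBA8 (uchar4) texture layout, the row-major intrinsics, the virtual-to-source extrinsics (Rinv, Tinv), and all names are assumptions for illustration.

```cuda
// Third kernel (S2933): for each virtual-view pixel, unproject with its depth,
// transform into the source camera, reproject, and sample the source texture
// bilinearly at the resulting sub-pixel coordinates.
__global__ void backwardMapTexture(const unsigned int* virtDepth,
                                   const uchar4* srcTex, uchar4* virtTex,
                                   int width, int height,
                                   const float* Kvirt, const float* Ksrc,
                                   const float* Rinv, const float* Tinv)
{
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= width || v >= height) return;

    unsigned int d = virtDepth[v * width + u];
    if (d == 0) return;                            // hole: no depth, leave empty
    float Z = 65535.0f / (float)d;                 // assumed inverse-depth code

    // Unproject in the virtual camera, then map back into the source camera.
    float X = (u - Kvirt[2]) * Z / Kvirt[0];
    float Y = (v - Kvirt[5]) * Z / Kvirt[4];
    float Xs = Rinv[0]*X + Rinv[1]*Y + Rinv[2]*Z + Tinv[0];
    float Ys = Rinv[3]*X + Rinv[4]*Y + Rinv[5]*Z + Tinv[1];
    float Zs = Rinv[6]*X + Rinv[7]*Y + Rinv[8]*Z + Tinv[2];
    if (Zs <= 0.0f) return;
    float uf = Ksrc[0] * Xs / Zs + Ksrc[2];
    float vf = Ksrc[4] * Ys / Zs + Ksrc[5];
    if (uf < 0.0f || uf >= width - 1 || vf < 0.0f || vf >= height - 1) return;

    // Bilinear interpolation of the four neighboring source texels.
    int x0 = (int)uf, y0 = (int)vf;
    float a = uf - x0, b = vf - y0;
    uchar4 p00 = srcTex[y0 * width + x0],       p01 = srcTex[y0 * width + x0 + 1];
    uchar4 p10 = srcTex[(y0 + 1) * width + x0], p11 = srcTex[(y0 + 1) * width + x0 + 1];
    float w00 = (1 - a) * (1 - b), w01 = a * (1 - b);
    float w10 = (1 - a) * b,       w11 = a * b;
    virtTex[v * width + u] = make_uchar4(
        (unsigned char)(w00 * p00.x + w01 * p01.x + w10 * p10.x + w11 * p11.x),
        (unsigned char)(w00 * p00.y + w01 * p01.y + w10 * p10.y + w11 * p11.y),
        (unsigned char)(w00 * p00.z + w01 * p01.z + w10 * p10.z + w11 * p11.z),
        255);
}
```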
In step S2934, for the pixel f(x, y) at virtual viewpoint position (x, y), the pixel values at the corresponding positions of the texture maps mapped from all original cameras can be weighted by the confidence conf(x, y). The fourth kernel function can compute according to the following formula:
f(x, y) = Σ_i conf_i(x, y) · f_i(x, y)        (6)
where the sum runs over the source cameras i whose mapped texture maps contribute to this pixel.
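A minimal sketch of such a fusion kernel. It assumes the per-camera virtual texture maps and their confidence maps are stacked in contiguous buffers, and it normalizes by the confidence sum (formula (6) leaves any normalization implicit); all names are illustrative.

```cuda
// Fourth kernel (S2934): per-pixel weighted fusion of the virtual texture
// maps produced from each source camera, following formula (6).
__global__ void fuseViews(const uchar4* views,  // numViews images, stacked
                          const float* conf,    // matching confidence maps
                          int numViews, uchar4* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    int stride = width * height;
    float r = 0.0f, g = 0.0f, b = 0.0f, wsum = 0.0f;
    for (int i = 0; i < numViews; ++i) {
        float w = conf[i * stride + idx];
        uchar4 p = views[i * stride + idx];
        r += w * p.x; g += w * p.y; b += w * p.z;
        wsum += w;
    }
    if (wsum > 0.0f) { r /= wsum; g /= wsum; b /= wsum; }  // normalized weights
    out[idx] = make_uchar4((unsigned char)r, (unsigned char)g,
                           (unsigned char)b, 255);
}
```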
Through steps S2931 to S2934 above, the virtual viewpoint image can be obtained. In a specific implementation, the virtual texture map obtained after weighted fusion can be further processed and optimized. For example, hole filling can be applied in parallel to each pixel of the fused texture map to obtain the image corresponding to the virtual viewpoint.
For hole filling of the virtual texture map, in a specific implementation, a separate windowing method can be used per pixel so that the operation runs in parallel. For example, for each hole pixel, an N×M window can be opened and the value of the hole pixel computed as a weighted combination of the non-hole pixel values inside the window. In this way, virtual viewpoint image generation can be computed entirely in parallel on the GPU, greatly accelerating the generation process.
As shown in the schematic diagram of the hole-filling method in Figure 31, the generated virtual viewpoint view G contains a hole region F; rectangular windows a and b are opened for pixels f1 and f2 in F, respectively. For pixel f1, all the existing non-hole pixels in its rectangular window are collected (or a down-sampled subset of them), and the value of f1 in the hole region F is obtained by distance weighting (or uniform averaging). The value of pixel f2 is obtained by the same operation. In a specific implementation, a fifth kernel function can be run on the GPU to parallelize this processing and shorten the hole-filling time.
The fifth kernel function can compute according to the following formula:
P(x, y) = Average[Window(x, y)]        (7)
where P(x, y) is the value of a point in the hole, Window(x, y) is the set of existing (non-hole) pixel values, or down-sampled values, within the window, and Average is the (possibly weighted) average of those pixels.
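A minimal sketch of such a hole-filling kernel, using a plain average over the window's non-hole pixels. Marking holes by a zero alpha channel, the window radii, and all names are assumptions for illustration.

```cuda
// Fifth kernel: fill each hole pixel with the average of the non-hole pixels
// inside its (2*halfW+1) x (2*halfH+1) window; alpha == 0 marks a hole.
__global__ void fillHoles(const uchar4* in, uchar4* out,
                          int width, int height, int halfW, int halfH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    uchar4 p = in[idx];
    if (p.w != 0) { out[idx] = p; return; }    // not a hole: copy through

    float r = 0.0f, g = 0.0f, b = 0.0f;
    int n = 0;
    for (int dy = -halfH; dy <= halfH; ++dy)
        for (int dx = -halfW; dx <= halfW; ++dx) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
            uchar4 q = in[yy * width + xx];
            if (q.w == 0) continue;            // skip other hole pixels
            r += q.x; g += q.y; b += q.z; ++n;
        }
    out[idx] = (n > 0)
        ? make_uchar4((unsigned char)(r / n), (unsigned char)(g / n),
                      (unsigned char)(b / n), 255)
        : p;                                   // no valid neighbors: leave as-is
}
```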
In the embodiments of the present invention, besides generating each virtual viewpoint image in parallel at pixel granularity, the generation of the virtual viewpoint path images can be accelerated further by feeding the corresponding groups of texture maps and depth maps of the path's virtual viewpoints into multiple GPUs, generating multiple virtual viewpoint images in parallel.
In a specific implementation, to further improve processing efficiency, the above steps can each be executed by different grids of thread blocks.
Referring to the schematic structural diagram of the virtual viewpoint image generation system shown in Figure 32, in an embodiment of the present invention, the virtual viewpoint image generation system 320 may include a CPU 321 and a GPU 322, where:
the CPU 321 is adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, where the image combination includes multiple angle-synchronized groups of corresponding texture maps and depth maps; and to select, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, the corresponding groups of texture maps and depth maps for each virtual viewpoint in the path;
the GPU 322 is adapted, for each virtual viewpoint in the path, to call the corresponding kernel functions and perform combined rendering, in parallel, on the pixels of the selected texture map and depth map groups, obtaining the image corresponding to that virtual viewpoint.
Specifically, the GPU 322 is adapted to forward-map the corresponding group of depth maps in parallel onto the virtual viewpoint; to post-process the forward-mapped depth maps in parallel; to backward-map the corresponding group of texture maps in parallel; and to fuse, in parallel, the pixels of the virtual texture maps generated by backward mapping.
The GPU 322 may generate the virtual viewpoint image of each virtual viewpoint using steps S2931 to S2934 of the foregoing virtual viewpoint image generation method together with the hole-filling step; for details, refer to the foregoing embodiments, which are not repeated here.
In a specific implementation, there may be one GPU or multiple GPUs, as shown in Figure 32.
In specific applications, the GPU may be an independent GPU chip, a GPU core within a GPU chip, or a GPU server; it may also be a GPU chip packaged from multiple GPU chips or GPU cores, or a GPU cluster composed of multiple GPU servers.
Accordingly, the corresponding groups of texture maps and depth maps of the virtual viewpoints in the path can be fed into multiple GPU chips, GPU cores, or GPU servers to generate multiple virtual viewpoint images in parallel. For example, if the path data of a virtual viewpoint path contains 20 virtual viewpoint position coordinates, the data for those 20 positions can be fed in parallel into multiple GPU chips; with 10 GPU chips, say, the data can be processed in two parallel batches, and each GPU chip can in turn generate the virtual viewpoint image of its assigned position in parallel at pixel granularity. This greatly speeds up virtual viewpoint image generation and improves its timeliness.
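A minimal host-side sketch of this round-robin fan-out with the CUDA runtime API. The Viewpoint type and renderViewpointAsync, which stands in for enqueuing the per-pixel kernels of steps S2931 to S2934 on the selected device, are assumptions for illustration.

```cuda
#include <vector>

struct Viewpoint { float x, y, z; /* plus viewing-angle data */ };
void renderViewpointAsync(const Viewpoint& vp);  // assumed: enqueues the kernels

// Fan the viewpoints out across all visible GPUs, round-robin, then wait.
void renderPath(const std::vector<Viewpoint>& viewpoints)
{
    int numGpus = 0;
    cudaGetDeviceCount(&numGpus);

    for (std::size_t i = 0; i < viewpoints.size(); ++i) {
        cudaSetDevice((int)(i % numGpus));     // pick the next device
        renderViewpointAsync(viewpoints[i]);   // per-pixel kernels run here
    }
    for (int g = 0; g < numGpus; ++g) {        // wait for every device to finish
        cudaSetDevice(g);
        cudaDeviceSynchronize();
    }
}
```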
An embodiment of the present invention further provides an electronic device. Referring to the schematic structural diagram of the electronic device shown in Figure 33, the electronic device 330 may include a memory 331, a CPU 332, and a GPU 333, where the memory 331 stores computer instructions executable on the CPU 332 and the GPU 333; when the CPU 332 and the GPU 333 run the computer instructions cooperatively, they are adapted to execute the steps of the virtual viewpoint image generation method of any of the foregoing embodiments of the present invention. For details, refer to the detailed descriptions of the foregoing embodiments, which are not repeated here.
In a specific implementation, the electronic device may be a single server or a server cluster composed of multiple servers.
All of the above embodiments are applicable to live-broadcast scenarios, and two or more of them can be combined as needed in application. Those skilled in the art will understand that the above embodiments are not limited to live broadcast: the schemes in the embodiments of the present invention for video or image capture, data processing of video data streams, and server-side image generation are also applicable to playback needs in non-live scenarios, such as recorded broadcast, rebroadcast, and other scenarios with low-latency requirements.
For the specific implementations, operating principles, and specific roles and effects of the devices or systems in the embodiments of the present invention, refer to the specific descriptions in the corresponding method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions which, when run, can execute the steps of the method of any of the above embodiments of the present invention.
The computer-readable storage medium may be any suitable readable storage medium such as an optical disc, a mechanical hard disk, or a solid-state drive. For the methods executed by the instructions stored on the computer-readable storage medium, refer to the embodiments of the above methods, which are not repeated here.
An embodiment of the present invention further provides a server, including a memory and a processor, the memory storing computer instructions executable on the processor; when the processor runs the computer instructions, it can execute the steps of the method of any of the above embodiments of the present invention. For the specific implementation of the method executed when the computer instructions run, refer to the steps of the methods in the above embodiments, which are not repeated here.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the protection scope of the present invention shall be as defined by the claims.

Claims (10)

  1. A depth map generation method, characterized in that multiple computing nodes in a computing node cluster each perform depth map generation, the method comprising:
    receiving texture data, the texture data being synchronously collected by multiple collection devices in the same collection array;
    performing, by a first computing node in the computing node cluster, a first depth calculation based on first texture data and second texture data to obtain a first rough depth map, wherein the first computing node is any computing node in the cluster, the first texture data satisfies a preset first mapping relationship with the first computing node, and the second texture data is texture data collected by collection devices that satisfy a preset first spatial position relationship with the collection device of the first texture data;
    synchronizing, by the first computing node, the first rough depth map to the remaining computing nodes in the cluster to obtain a rough depth atlas;
    verifying, by the first computing node, a second rough depth map in the rough depth atlas against a third rough depth map to obtain the unstable regions in the second rough depth map, wherein the second rough depth map satisfies a preset second mapping relationship with the first computing node, and the third rough depth map is the rough depth map corresponding to collection devices that satisfy a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
    performing, by the first computing node, a second depth calculation based on the unstable regions in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map to obtain a corresponding fine depth map, wherein the depth map candidate values in the second rough depth map selected for the second depth calculation exclude the depth values of the unstable regions; and
    taking the fine depth atlas of the fine depth maps obtained by the computing nodes as the finally generated depth map.
  2. The depth map generation method according to claim 1, characterized in that the second texture data is texture data collected by collection devices that satisfy a preset first distance relationship and/or first quantitative relationship with the collection device of the first texture data; and
    the texture data corresponding to the third rough depth map is texture data collected by collection devices that satisfy a preset second distance relationship and/or second quantitative relationship with the collection device corresponding to the second rough depth map.
  3. The depth map generation method according to claim 2, characterized in that the second texture data is texture data collected by a first preset number of collection devices closest to the position of the collection device of the first texture data; and
    the texture data corresponding to the third rough depth map is texture data collected by a second preset number of collection devices closest to the position of the collection device corresponding to the second rough depth map.
  4. The depth map generation method according to claim 3, characterized in that the first preset number of collection devices and the second preset number of collection devices are each 1 to N-1 collection devices, where N is the total number of collection devices in the collection array.
  5. The depth map generation method according to claim 3, characterized in that the first preset number is equal to the second preset number.
  6. The depth map generation method according to claim 1, characterized in that the first mapping relationship between the first computing node and the first texture data is obtained as follows:
    an allocation node obtains the first mapping relationship based on a preset first mapping relationship table or through random mapping;
    and the second mapping relationship between the first computing node and the second rough depth map is obtained as follows:
    an allocation node obtains the second mapping relationship based on a preset second mapping relationship table or through random mapping.
  7. A computing node, characterized in that it is adapted to form a computing node cluster with at least one other computing node for generating a depth map, the computing node comprising:
    an input unit adapted to receive texture data, the texture data originating from synchronous collection by multiple collection devices in the same collection array;
    a first depth calculation unit adapted to perform a first depth calculation based on first texture data and second texture data to obtain a first rough depth map, wherein the first texture data satisfies a preset first mapping relationship with the computing node, and the second texture data is texture data collected by collection devices that satisfy a preset first spatial position relationship with the collection device of the first texture data;
    a synchronization unit adapted to synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas;
    a verification unit adapted to verify a second rough depth map in the rough depth atlas against a third rough depth map to obtain the unstable regions in the second rough depth map, wherein the second rough depth map satisfies a preset second mapping relationship with the computing node, and the third rough depth map is the rough depth map corresponding to collection devices that satisfy a preset second spatial position relationship with the collection device corresponding to the second rough depth map;
    a second depth calculation unit adapted to perform a second depth calculation based on the unstable regions in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map to obtain a corresponding fine depth map, wherein the depth map candidate values in the second rough depth map selected for the second depth calculation exclude the depth values of the unstable regions; and
    an output unit adapted to output the fine depth map, so that the computing node cluster obtains the fine depth atlas as the finally generated depth map.
  8. A computing node, comprising a memory and a processor, the memory storing computer instructions executable on the processor, characterized in that the processor, when running the computer instructions, executes the steps of the depth map generation method according to any one of claims 1 to 6.
  9. A computing node cluster, characterized in that it comprises a plurality of computing nodes, the plurality of computing nodes comprising a first computing node, the first computing node being any computing node in the computing node cluster, wherein:
    the first computing node is adapted to: perform a first depth calculation according to first texture data and second texture data in the received texture data, to obtain a first rough depth map; synchronize the first rough depth map to the remaining computing nodes in the computing node cluster, to obtain a rough depth atlas; for a second rough depth map in the rough depth atlas, perform verification using a third rough depth map, to obtain an unstable region in the second rough depth map; perform a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map; and output the obtained fine depth map, so that the computing node cluster uses the resulting fine depth atlas as the finally generated depth map;
    wherein the first texture data and the first computing node satisfy a preset first mapping relationship; the second texture data is texture data collected by a collection device that satisfies a preset first spatial position relationship with the collection device of the first texture data; the second rough depth map and the first computing node satisfy a preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map; and the depth map candidate values selected from the second rough depth map for the second depth calculation do not include depth values of the unstable region.
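The single-process sketch below strings the steps of claim 9 together using the illustrative ComputingNode class shown earlier. It is only a sequencing aid: in the claimed system each node runs on separate hardware, the synchronization point would be a network all-gather, and the identity mapping from cameras to verifying nodes used here is a simplification of the preset second mapping relationship.

```python
def run_cluster(textures, nodes, adjacency):
    """Toy driver for the cluster flow of claim 9 (illustrative only).
    textures: dict camera_id -> image array
    nodes: dict camera_id -> ComputingNode
    adjacency: dict camera_id -> spatially adjacent camera_id"""
    # First pass: each node computes a rough depth map from its own
    # texture and the texture of a spatially adjacent collection device.
    rough_atlas = {
        cam: node.first_depth(textures[cam], textures[adjacency[cam]])
        for cam, node in nodes.items()
    }
    # Synchronization point: every node now holds the rough depth atlas.

    # Second pass: verify each assigned map against its neighbour's map,
    # then refine with unstable depth candidates excluded.
    fine_atlas = {}
    for cam, node in nodes.items():
        unstable = node.verify(rough_atlas[cam], rough_atlas[adjacency[cam]])
        fine_atlas[cam] = node.second_depth(rough_atlas[cam], unstable)
    return fine_atlas  # the finally generated fine depth atlas
```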
  10. A computer-readable storage medium having computer instructions stored thereon, characterized in that, when the computer instructions are run, the steps of the depth map generation method according to any one of claims 1 to 6 are performed.
PCT/CN2020/124277 2019-10-28 2020-10-28 Method for generating depth map, computing nodes, computing node cluster, and storage medium WO2021083177A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911033628.0 2019-10-28
CN201911033628.0A CN112734821B (en) 2019-10-28 2019-10-28 Depth map generation method, computing node cluster and storage medium

Publications (1)

Publication Number Publication Date
WO2021083177A1 true WO2021083177A1 (en) 2021-05-06

Family

ID=75589367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124277 WO2021083177A1 (en) 2019-10-28 2020-10-28 Method for generating depth map, computing nodes, computing node cluster, and storage medium

Country Status (2)

Country Link
CN (1) CN112734821B (en)
WO (1) WO2021083177A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447661A (en) * 2016-09-28 2017-02-22 深圳市优象计算技术有限公司 Rapid depth image generating method
CN109410266A (en) * 2018-09-18 2019-03-01 合肥工业大学 Stereo Matching Algorithm based on four mould Census transformation and discrete disparity search
US20190313080A1 (en) * 2018-04-06 2019-10-10 Disney Enterprises, Inc. Depth codec for real-time, high-quality light field reconstruction

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388115B (en) * 2008-10-24 2011-07-27 北京航空航天大学 Depth image autoegistration method combined with texture information
US9661307B1 (en) * 2011-11-15 2017-05-23 Google Inc. Depth map generation using motion cues for conversion of monoscopic visual content to stereoscopic 3D
CN103997635B (en) * 2014-04-11 2015-10-28 清华大学深圳研究生院 The synthesis viewpoint distortion prediction method of free viewpoint video and coding method
EP3489900A1 (en) * 2017-11-23 2019-05-29 Thomson Licensing Method, apparatus and stream for encoding/decoding volumetric video

Also Published As

Publication number Publication date
CN112734821A (en) 2021-04-30
CN112734821B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
WO2021083176A1 (en) Data interaction method and system, interaction terminal and readable storage medium
WO2021083178A1 (en) Data processing method and system, server and storage medium
WO2021083174A1 (en) Virtual viewpoint image generation method, system, electronic device, and storage medium
KR102241082B1 (en) Method and apparatus for transceiving metadata for multiple viewpoints
WO2021179783A1 (en) Free viewpoint-based video live broadcast processing method, device, system, chip and medium
JP6410918B2 (en) System and method for use in playback of panoramic video content
WO2022022501A1 (en) Video processing method, apparatus, electronic device, and storage medium
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
KR102308604B1 (en) Method, apparatus and stream for formatting immersive video for legacy and immersive rendering devices
WO2022022348A1 (en) Video compression method and apparatus, video decompression method and apparatus, electronic device, and storage medium
CN111667438B (en) Video reconstruction method, system, device and computer readable storage medium
TW202029742A (en) Image synthesis
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111542862A (en) Method and apparatus for processing and distributing live virtual reality content
US11790601B2 (en) Minimal volumetric 3D on demand for efficient 5G transmission
WO2021083175A1 (en) Data processing method, device and system, readable storage medium and server
CN112738009B (en) Data synchronization method, device, synchronization system, medium and server
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN113542721A (en) Depth map processing method, video reconstruction method and related device
WO2021083177A1 (en) Method for generating depth map, computing nodes, computing node cluster, and storage medium
CN111669569A (en) Video generation method and device, medium and terminal
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669568A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111669603A (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20882040; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20882040; Country of ref document: EP; Kind code of ref document: A1)