CN112738009B - Data synchronization method, device, synchronization system, medium and server - Google Patents

Data synchronization method, device, synchronization system, medium and server

Info

Publication number
CN112738009B
Authority
CN
China
Prior art keywords
acquisition
video
data
frame
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911033624.2A
Other languages
Chinese (zh)
Other versions
CN112738009A (en)
Inventor
盛骁杰
李晓阳
凌康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201911033624.2A
Publication of CN112738009A
Application granted
Publication of CN112738009B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80: Responding to QoS
    • H04L65/60: Network streaming of media packets
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

Data synchronization method, device, synchronization system, medium and server, including: sending a stream pulling instruction to each acquisition device in an acquisition array, wherein the acquisition devices in the acquisition array are arranged at different positions of a field acquisition area according to a preset multi-angle free view angle range, and each acquisition device synchronously acquires a video data stream from its corresponding angle in real time; receiving, in real time, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction, and determining whether the video data streams transmitted by the acquisition devices are frame-level synchronized; and when the video data streams transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, sending the stream pulling instruction to each acquisition device in the acquisition array again until the video data streams transmitted by the acquisition devices are frame-level synchronized. The scheme can meet the requirement of low-delay playing of multi-angle free view video.

Description

Data synchronization method, device, synchronization system, medium and server
Technical Field
The embodiments of the present invention relate to the technical field of data processing, and in particular to a data synchronization method, a data synchronization device, a synchronization system, a medium and a server.
Background
With the continuous development of Internet technology, more and more video platforms improve users' viewing experience by providing videos with higher definition or smoother playback.
However, for video with a strong sense of on-site experience, such as video of a sports game, the user can only watch from a single viewpoint position during viewing and cannot freely switch viewpoint positions to watch the scene or the progress of the game from different viewpoints, so the user cannot experience the feeling of watching the game while moving the viewpoint on site.
Six-degree-of-freedom (6 Degrees of Freedom, 6DoF) technology provides a high-degree-of-freedom viewing experience: the user can adjust the viewing angle through interactive means during viewing, which greatly improves the viewing experience.
To implement a 6DoF scene, Free-D playback technology and light field rendering technology are currently available. Free-D playback expresses a 6DoF image by acquiring point cloud data of the scene through multi-angle shooting, while light field rendering expresses the 6DoF image by obtaining depth-of-field information and three-dimensional position information of pixels from the focal length and spatial position changes of a dense light field.
Both the Free-D playback technology and the light field rendering technology place great demands on storage capacity and computation, so a large number of servers must be arranged on site for processing. The implementation cost is therefore too high, the constraints are too many, the data cannot be processed quickly, low-delay playing is difficult to achieve, the viewing and interaction requirements of users cannot be met, and the technologies are not easy to popularize.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a data synchronization method, apparatus, synchronization system, medium and server, which improve the data processing speed and meet the requirement of low-latency playing of multi-angle free view video.
The embodiment of the invention provides a data synchronization method, which includes the following steps: sending a stream pulling instruction to each acquisition device in an acquisition array, wherein the acquisition devices in the acquisition array are arranged at different positions of a field acquisition area according to a preset multi-angle free view angle range, and each acquisition device synchronously acquires a video data stream from its corresponding angle in real time; receiving, in real time, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction, and determining whether the video data streams transmitted by the acquisition devices are frame-level synchronized; and when the video data streams transmitted by the acquisition devices in the acquisition array are not frame-level synchronized, sending the stream pulling instruction to each acquisition device in the acquisition array again until the video data streams transmitted by the acquisition devices are frame-level synchronized.
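For illustration only, the following Python sketch shows the pull-then-verify loop described above: pull from every device, check whether the returned streams are frame-level synchronized, and re-send the stream pulling instruction to the whole array until they are. The StreamHandle class, the use of first-frame capture timestamps as the synchronization criterion and the tolerance value are assumptions made for the sketch, not details taken from the patent.

```python
import time
from dataclasses import dataclass

@dataclass
class StreamHandle:
    """Hypothetical handle returned after a stream pulling instruction;
    first_ts is the capture timestamp of the first frame received."""
    device_id: str
    first_ts: float

    def close(self) -> None:
        pass  # release the pulled stream (placeholder)

def pull_stream(device_id: str) -> StreamHandle:
    # Placeholder for "send a stream pulling instruction and start receiving":
    # here we simply stamp the handle with the current time.
    return StreamHandle(device_id, first_ts=time.time())

def frame_level_synchronized(streams, tolerance_s: float = 0.0) -> bool:
    # The streams count as frame-level synchronized when their first frames
    # carry the same capture timestamp (within an optional tolerance).
    ts = [s.first_ts for s in streams]
    return max(ts) - min(ts) <= tolerance_s

def synchronize_acquisition_array(device_ids, retry_interval_s=0.5):
    """Pull from every device, check frame-level sync, re-pull until it holds."""
    while True:
        streams = [pull_stream(d) for d in device_ids]
        if frame_level_synchronized(streams, tolerance_s=0.001):
            return streams
        for s in streams:  # otherwise drop the streams and retry the whole array
            s.close()
        time.sleep(retry_interval_s)
```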
Optionally, the data synchronization method further includes: determining one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream; determining a video frame to be intercepted in the reference data stream based on a received video frame interception instruction, and selecting the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the remaining video data streams; intercepting the video frame to be intercepted in each video data stream; and synchronously uploading the intercepted video frames to a designated target end.
Optionally, selecting the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the remaining video data streams includes at least one of the following: according to feature information of an object in the video frame to be intercepted in the reference data stream, selecting the video frames in the remaining video data streams whose object feature information is consistent with it as the video frames to be intercepted of the remaining video data streams; and according to timestamp information of the video frame in the reference data stream, selecting the video frames in the remaining video data streams whose timestamp information is consistent with it as the video frames to be intercepted of the remaining video data streams.
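As an illustration of the timestamp-based option, the sketch below picks, in every remaining stream, the frame whose capture timestamp matches that of the frame to be intercepted in the reference stream. The dictionary-based frame representation, the field names and the matching tolerance are assumptions for the sketch only.

```python
from typing import Dict, List, Optional

# A frame is represented here as a dict with a capture timestamp (seconds)
# and a pixel payload; both field names are illustrative assumptions.
Frame = Dict[str, object]

def select_synchronized_frames(reference_frame: Frame,
                               other_streams: Dict[str, List[Frame]],
                               tolerance_s: float = 0.005) -> Dict[str, Optional[Frame]]:
    """For each remaining stream, pick the frame whose timestamp matches the
    timestamp of the frame to be intercepted in the reference stream."""
    target_ts = reference_frame["timestamp"]
    selected = {}
    for stream_id, frames in other_streams.items():
        # nearest-timestamp match; None if no frame falls within the tolerance
        best = min(frames, key=lambda f: abs(f["timestamp"] - target_ts), default=None)
        if best is not None and abs(best["timestamp"] - target_ts) <= tolerance_s:
            selected[stream_id] = best
        else:
            selected[stream_id] = None
    return selected
```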
Optionally, each acquisition device in the acquisition array synchronously acquiring the video data stream from its corresponding angle in real time includes at least one of the following: when at least one acquisition device obtains an acquisition start instruction, the acquisition device that obtains the acquisition start instruction synchronizes the acquisition start instruction to the other acquisition devices, so that each acquisition device in the acquisition array starts to synchronously acquire the video data stream from its corresponding angle in real time based on the acquisition start instruction; and each acquisition device in the acquisition array synchronously acquires the video data stream from its corresponding angle in real time based on a preset clock synchronization signal.
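Purely as a conceptual sketch of the first option, the snippet below lets the device that receives the acquisition start instruction relay it to its peers, which block until the instruction arrives and then begin capturing. The UDP transport, peer addresses, port and message format are all illustrative assumptions; the patent itself describes a synchronization line between the devices or a shared hardware clock signal.

```python
import socket

SYNC_PORT = 9999                               # illustrative port
PEERS = ["192.168.1.11", "192.168.1.12"]       # illustrative peer addresses

def relay_start_instruction(start_msg: bytes = b"START_CAPTURE") -> None:
    """The device that received the acquisition start instruction relays it to
    the other acquisition devices so that all of them begin capturing together."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for peer in PEERS:
            sock.sendto(start_msg, (peer, SYNC_PORT))

def wait_for_start_instruction() -> None:
    """Each peer blocks until the start instruction arrives, then starts capture."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", SYNC_PORT))
        msg, _ = sock.recvfrom(1024)
        if msg == b"START_CAPTURE":
            pass  # begin real-time acquisition from this device's angle here
```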
The embodiment of the invention also provides a data processing device, which comprises: an instruction sending unit, adapted to send a stream pulling instruction to each acquisition device in an acquisition array, wherein the acquisition devices in the acquisition array are arranged at different positions of a field acquisition area according to a preset multi-angle free view angle range, and each acquisition device synchronously acquires a video data stream from its corresponding angle in real time; a data stream receiving unit, adapted to receive, in real time, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction; and a first synchronization judging unit, adapted to determine whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized, and to re-trigger the instruction sending unit when they are not, until the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
Optionally, the data processing device further comprises: a reference video stream determining unit, adapted to determine one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream; a video frame selecting unit, adapted to determine a video frame to be intercepted in the reference data stream based on a received video frame interception instruction, and to select the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the remaining video data streams; a video frame intercepting unit, adapted to intercept the video frame to be intercepted in each video data stream; and an uploading unit, adapted to synchronously upload the intercepted video frames to a designated target end.
Optionally, the video frame selecting unit includes at least one of: a first video frame selecting module, adapted to select, according to feature information of an object in the video frame to be intercepted in the reference data stream, the video frames in the remaining video data streams whose object feature information is consistent with it as the video frames to be intercepted of the remaining video data streams; and a second video frame selecting module, adapted to select, according to timestamp information of the video frame in the reference data stream, the video frames in the remaining video data streams whose timestamp information is consistent with it as the video frames to be intercepted of the remaining video data streams.
The embodiment of the invention also provides a data synchronization system, which comprises: an acquisition array arranged in a field acquisition area and a data processing device connected to the acquisition array, wherein the acquisition array comprises a plurality of acquisition devices arranged at different positions of the field acquisition area according to a preset multi-angle free view angle range, and wherein: each acquisition device in the acquisition array is adapted to synchronously acquire a video data stream from its corresponding angle in real time, and to transmit the acquired video data stream to the data processing device in real time based on a stream pulling instruction sent by the data processing device; and the data processing device is adapted to send a stream pulling instruction to each acquisition device in the acquisition array, to receive, in real time, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction, and, when the video data streams transmitted by the acquisition devices are not frame-level synchronized, to send the stream pulling instruction to each acquisition device in the acquisition array again until the video data streams transmitted by the acquisition devices are frame-level synchronized.
Optionally, the data processing device is further adapted to determine one of the video data streams of each acquisition device in the acquisition array received in real time as a reference data stream; determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the other video data streams; intercepting video frames to be intercepted in each video data stream and synchronously uploading the video frames obtained by interception to a designated target end.
Optionally, each acquisition device in the acquisition array is connected through a synchronization line, wherein when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices through the synchronization line, so that each acquisition device in the acquisition array starts to acquire video data streams from corresponding angles in real time based on the acquisition starting instruction; or each acquisition device in the acquisition array synchronously acquires the video data stream from the corresponding angle in real time based on a preset clock synchronous signal.
The embodiment of the invention also provides an acquisition device, which comprises: a processor, a photoelectric conversion camera assembly, an encoder and a transmission component, wherein: the photoelectric conversion camera assembly is adapted to acquire images; the processor is adapted to synchronize an acquisition start instruction to the other acquisition devices through the transmission component when the acquisition start instruction is obtained, to start processing the images acquired by the photoelectric conversion camera assembly in real time to obtain an image data sequence, and to transmit the acquired video data stream to the data processing device through the transmission component when a stream pulling instruction is obtained; and the encoder is adapted to encode the image data sequence to obtain the corresponding video data stream.
The embodiment of the invention also provides a computer readable storage medium on which computer instructions are stored, and the computer instructions, when run, execute the steps of any one of the data synchronization methods in the embodiments of the present invention.
The embodiment of the invention also provides a server, which comprises a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and the processor executes the steps of any one of the data synchronization methods in the embodiment of the invention when running the computer instructions.
By adopting the data synchronization scheme in the embodiment of the invention, the video data streams transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction can be received in real time, and when the video data streams transmitted by the acquisition devices are not frame-level synchronized, the stream pulling instruction is sent to the acquisition devices in the acquisition array again until the video data streams transmitted by the acquisition devices are frame-level synchronized. By determining whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized, synchronous transmission of the multiple data streams can be ensured, transmission problems such as missing or duplicated frames can be avoided, the data processing speed is improved, and the requirement of low-delay playing of multi-angle free view video is thereby met.
Further, one of the video data streams received in real time from the acquisition devices in the acquisition array is determined as a reference data stream; the video frame to be intercepted in the reference data stream is determined in combination with a received video frame interception instruction; the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream are then selected as the video frames to be intercepted of the remaining video data streams; and the video frames to be intercepted in each video data stream are intercepted. Frame interception is thus synchronized, the frame interception efficiency is improved, the display effect of the generated multi-angle free view video is further improved, and the user experience is enhanced. Moreover, the coupling between the process of selecting and intercepting video frames and the process of generating the multi-angle free view video is reduced, the independence of the processes is strengthened, and later maintenance is facilitated; and since the intercepted video frames are synchronously uploaded to the designated target end, network transmission resources can be saved, the data processing load can be reduced, and the speed of generating the multi-angle free view video can be increased.
Further, by selecting, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames in the remaining video data streams whose object feature information is consistent with it, or by selecting, according to the timestamp information of the video frame in the reference data stream, the video frames in the remaining video data streams whose timestamp information is consistent with it, as the video frames to be intercepted of the remaining video data streams, the efficiency and accuracy of synchronous selection and synchronous interception of video frames can be improved, and the integrity and synchronism of the transmitted data can therefore be improved.
Further, when at least one acquisition device obtains an acquisition start instruction, the acquisition device that obtains the acquisition start instruction may synchronize the acquisition start instruction to the other acquisition devices, or each acquisition device in the acquisition array may acquire based on a preset clock synchronization signal, so that real-time synchronous acquisition of the video data streams can be realized either by synchronizing the acquisition start instruction or by sharing a hardware clock synchronization signal.
Drawings
FIG. 1 is a schematic diagram of a data processing system in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method in an embodiment of the invention;
FIG. 3 is a schematic diagram of a data processing system in an application scenario according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a server according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method of data interaction in an embodiment of the invention;
FIG. 7 is a schematic diagram of another data processing system in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a data processing system in another application scenario according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an interactive terminal according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention;
FIG. 15 is a flow chart of another data processing method in an embodiment of the invention;
FIG. 16 is a flow chart of a method of intercepting synchronized video frames in a compressed video data volume in an embodiment of the invention;
FIG. 17 is a flow chart of another data processing method in an embodiment of the invention;
FIG. 18 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 19 is a schematic diagram of another data processing system in accordance with an embodiment of the invention;
FIG. 20 is a flow chart of a method of data synchronization in an embodiment of the invention;
FIG. 21 is a timing diagram of a pull-stream synchronization in an embodiment of the invention;
FIG. 22 is a flow chart of another method of intercepting synchronized video frames in a compressed video data volume in an embodiment of the invention;
FIG. 23 is a schematic diagram of another data processing apparatus in accordance with an embodiment of the present invention;
FIG. 24 is a schematic diagram of a data synchronization system according to an embodiment of the present invention;
FIG. 25 is a schematic structural diagram of a data synchronization system in an application scenario in an embodiment of the present invention;
FIG. 26 is a flow chart of a depth map generation method in an embodiment of the invention;
FIG. 27 is a schematic diagram of a server according to an embodiment of the present invention;
FIG. 28 is a schematic diagram of a server cluster performing depth map processing according to an embodiment of the present invention;
FIG. 29 is a flowchart of a virtual viewpoint image generation method in an embodiment of the present invention;
FIG. 30 is a flow chart of a method for combined rendering by a GPU in accordance with an embodiment of the present invention;
FIG. 31 is a schematic diagram of a hole filling method according to an embodiment of the present invention;
FIG. 32 is a schematic diagram of a virtual viewpoint image generating system according to an embodiment of the present invention;
FIG. 33 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 34 is a schematic diagram of another data synchronization system according to an embodiment of the present invention;
FIG. 35 is a schematic diagram of another data synchronization system according to an embodiment of the present invention;
fig. 36 is a schematic structural view of an acquisition device according to an embodiment of the present invention.
Fig. 37 is a schematic diagram of an acquisition array in an application scenario in an embodiment of the present invention.
FIG. 38 is a schematic diagram of another data processing system in accordance with an embodiment of the invention.
Fig. 39 is a schematic structural diagram of another interactive terminal according to an embodiment of the present invention.
Fig. 40 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 41 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 42 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 43 is a schematic connection diagram of an interactive terminal according to an embodiment of the present invention.
Fig. 44 is a schematic diagram of an interaction operation of an interaction terminal according to an embodiment of the present invention.
Fig. 45 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 46 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 47 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Fig. 48 is a schematic diagram of an interactive interface of another interactive terminal according to an embodiment of the present invention.
Detailed Description
As described above, in conventional playing scenarios such as live broadcast, rebroadcast and recorded broadcast, a user can only watch a game from a single viewpoint position and cannot freely switch viewpoint positions to watch the scene or the progress of the game from different viewpoints, so the user cannot experience the feeling of watching the game while moving the viewpoint on site.
Six-degree-of-freedom (6 Degrees of Freedom, 6DoF) technology is adopted to provide a high-degree-of-freedom viewing experience: the user can adjust the viewing angle through interactive means during viewing and watch from whichever free viewpoint is desired, which greatly improves the viewing experience.
To realize a 6DoF scene, Free-D playback technology, light field rendering technology, depth-map-based 6DoF video generation technology and the like are available. Free-D playback expresses a 6DoF image by acquiring point cloud data of the scene through multi-angle shooting, while light field rendering expresses the 6DoF image by obtaining depth-of-field information and three-dimensional position information of pixels from the focal length and spatial position changes of a dense light field. The depth-map-based 6DoF video generation method performs combined rendering of the corresponding groups of texture maps and depth maps in the image combination of the video frame at the user interaction moment, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, to reconstruct a 6DoF image or video.
For example, when the Free-D playback scheme is used on site, a large number of cameras are required to collect raw data, which is gathered into an on-site machine room through Serial Digital Interface (SDI) capture cards and then processed by computing servers in the on-site machine room to obtain point cloud data expressing the three-dimensional positions and pixel information of all points in space, from which the 6DoF scene is reconstructed. This scheme makes the amount of data collected, transmitted and computed on site extremely large, and playing scenarios such as live broadcast and rebroadcast place very high requirements on the transmission network and the computing servers, so the implementation cost of reconstructing a 6DoF scene is too high and the constraints are too many. Moreover, there are at present no mature technical standards or industrial-grade software and hardware supporting point cloud data, so a long data processing time is required from the acquisition of the raw scene data to the final 6DoF scene reconstruction, and the requirements of low-delay playing and real-time interaction of multi-angle free view video cannot be met.
For another example, when the light field rendering scheme is used on site, depth-of-field information and three-dimensional position information of pixels must be obtained from the focal length and spatial position changes of a dense light field. Because the resolution of the light field image obtained from the dense light field is extremely large and often has to be decomposed into hundreds of conventional two-dimensional pictures, this scheme likewise makes the amount of data collected, transmitted and computed on site extremely large, places very high requirements on the on-site transmission network and computing servers, has an extremely high implementation cost and too many constraints, and cannot process the data quickly. In addition, the technical means of reconstructing a 6DoF scene from light field images is still at the stage of experimental exploration, and at present it cannot effectively meet the requirements of low-delay playing and real-time interaction of multi-angle free view video.
In summary, both the Free-D playback technology and the light field rendering technology place very large demands on storage capacity and computation, so a large number of servers must be arranged on site for processing; the implementation cost is too high, the constraints are too many, the data cannot be processed quickly, the viewing and interaction requirements cannot be met, and the technologies are not easy to popularize.
Although the depth-map-based 6DoF video reconstruction method can reduce the amount of computation in the video reconstruction process, it is still difficult to meet the requirements of low-delay playing and real-time interaction of multi-angle free view video, owing to constraints such as network transmission bandwidth and device decoding capability.
In view of the above problems, some embodiments of the present invention provide a multi-angle free view image generation scheme that adopts a distributed system architecture. An acquisition array formed by a plurality of acquisition devices is arranged in a field acquisition area to synchronously acquire frame images from a plurality of angles; a data processing device intercepts video frames from the frame images acquired by the acquisition devices according to a frame interception instruction; a server uses the frame images of the plurality of synchronized video frames uploaded by the data processing device as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and reconstructs frame images for a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of preset frame images in the image combination, so as to obtain corresponding multi-angle free view video data; and the multi-angle free view video data is inserted into the data stream to be played of a play control device and transmitted to a play terminal for playing.
Referring to the schematic structural diagram of a data processing system in an application scenario in an embodiment of the present invention, a data processing system 10 includes: a data processing device 11, a server 12, a play control device 13 and a play terminal 14. The data processing device 11 intercepts video frames from the frame images acquired by the acquisition array in the field acquisition area; by intercepting only the video frames of the multi-angle free view images to be generated, a large amount of data transmission and data processing can be avoided. The server 12 then generates the multi-angle free view images, so the powerful computing capability of the server can be fully utilized and the multi-angle free view video data can be generated quickly. The multi-angle free view video data can thus be inserted into the data stream to be played of the play control device in time, playing of multi-angle free views is realized at low cost, and the users' requirements of low-delay playing and real-time interaction of multi-angle free view video are met.
In order to make the embodiments of the present specification more apparent and practical to those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described in conjunction with the accompanying drawings in the embodiments of the present specification, and it is apparent that the described embodiments are some embodiments of the present specification, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Referring to the flowchart of the data processing method shown in fig. 2, in an embodiment of the present invention, the method specifically may include the following steps:
s21, receiving frame images of a plurality of synchronous video frames uploaded by the data processing equipment as image combination.
Based on a video frame interception instruction, the data processing device intercepts video frames at a specified frame time from multiple video data streams that are synchronously acquired in real time at different positions of the field acquisition area and uploaded; the plurality of synchronized video frames have different shooting angles of view.
In a specific implementation, the video frame interception instruction may include information specifying a frame time, and the data processing device intercepts the video frames at the corresponding frame time from the multiple video data streams according to that information. The specified frame time may be expressed in units of frames, e.g. frames N to M, where N and M are integers not less than 1 and N ≤ M; alternatively, it may be expressed in units of time, e.g. seconds X to Y, where X and Y are positive numbers and X ≤ Y. Thus, the plurality of synchronized video frames may include all frame-level synchronized video frames corresponding to the specified frame time, and the pixel data of each video frame forms a corresponding frame image.
For example, according to the received video frame interception instruction, the data processing device may determine that the specified frame time is the 2nd frame of the multiple video data streams; the data processing device then intercepts the 2nd frame from each video data stream, and the intercepted 2nd frames of the video data streams are frame-level synchronized with one another, giving the plurality of synchronized video frames.
For another example, assuming the acquisition frame rate is set to 25 fps, i.e. 25 frames are acquired per second, the data processing device may determine from the received video frame interception instruction that the specified frame time is a 1-second interval of the multiple video data streams. The data processing device then intercepts the 25 video frames within that second from each video data stream; the 1st video frames of that second in the intercepted streams are frame-level synchronized with one another, the 2nd video frames are frame-level synchronized with one another, and so on up to the 25th video frames, and these are taken as the plurality of synchronized video frames.
For yet another example, the data processing device may determine from the received video frame interception instruction that the specified frame time is the 2nd frame and the 3rd frame of the multiple video data streams; the data processing device then intercepts the 2nd frame and the 3rd frame from each video data stream, and the intercepted 2nd frames are frame-level synchronized with one another and the intercepted 3rd frames are frame-level synchronized with one another, giving the plurality of synchronized video frames.
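To make the examples above concrete, the following sketch translates a specified frame time, given either in frames or in seconds, into 1-based frame indices at an assumed frame rate of 25 fps; the tuple format of the instruction is an assumption for illustration only.

```python
def frame_indices_for_instruction(spec, frame_rate=25):
    """Translate the 'specified frame time' of a frame interception instruction
    into concrete 1-based frame indices, as in the examples above.

    spec is assumed to be either ("frames", N, M) or ("seconds", X, Y)."""
    kind, start, end = spec
    if kind == "frames":            # e.g. ("frames", 2, 3) -> frames 2 and 3
        return list(range(int(start), int(end) + 1))
    if kind == "seconds":           # e.g. ("seconds", 0, 1) at 25 fps -> frames 1..25
        first = int(start * frame_rate) + 1
        last = int(end * frame_rate)
        return list(range(first, last + 1))
    raise ValueError(f"unknown specification kind: {kind}")

# The same indices are then cut from every stream, so the k-th intercepted
# frame of each stream is frame-level synchronized with the others.
print(frame_indices_for_instruction(("seconds", 0, 1)))   # 25 indices, 1..25
print(frame_indices_for_instruction(("frames", 2, 3)))    # [2, 3]
```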
S22, determining the parameter data corresponding to the image combination.
In a specific implementation, the parameter data corresponding to the image combination can be obtained through a parameter matrix, and the parameter matrix can comprise an internal parameter matrix, an external parameter matrix, a rotation matrix, a translation matrix and the like. Thereby, the correlation between the three-dimensional geometrical position of the specified points of the surface of the object in space and their corresponding points in the image combination can be determined.
In the embodiment of the invention, a structure-from-motion (Structure From Motion, SFM) algorithm can be adopted: based on the parameter matrix, feature extraction, feature matching and global optimization are performed on the acquired image combination, and the obtained parameter estimates are used as the parameter data corresponding to the image combination. The algorithm adopted for feature extraction may include any one of the following: the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (Speeded-Up Robust Features, SURF) algorithm, or the features from accelerated segment test (Features from Accelerated Segment Test, FAST) algorithm. The algorithm used for feature matching may include: Euclidean distance matching, the random sample consensus (Random Sample Consensus, RANSAC) algorithm, and the like. The algorithm used for global optimization may include: bundle adjustment (Bundle Adjustment, BA), and the like.
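As a minimal illustration of this kind of pipeline, the sketch below uses OpenCV to run SIFT feature extraction, ratio-test feature matching and RANSAC-based essential-matrix estimation between two frame images of the image combination. The intrinsic matrix K is assumed to be known, and a real structure-from-motion pipeline would handle many views and refine all parameters with bundle adjustment, which is omitted here.

```python
import cv2
import numpy as np

def estimate_relative_pose(img_a: np.ndarray, img_b: np.ndarray, K: np.ndarray):
    """Two-view pose estimation: SIFT feature extraction, feature matching with
    Lowe's ratio test, then essential-matrix estimation with RANSAC."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]  # ratio test

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t  # rotation and translation between the two acquisition viewpoints
```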
S23, determining depth data of each frame of image in the image combination.
In implementations, depth data for each frame image may be determined based on a plurality of frame images in the image combination. Wherein the depth data may comprise depth values corresponding to pixels of each frame of image in the image combination. The distance of the acquisition point to the various points in the field may be taken as the above-mentioned depth value, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, with the origin of the photographing coordinate system as the optical center, the depth value may be the distance from each point in the field to the optical center along the photographing optical axis. It will be appreciated by those skilled in the art that the distances may be relative values and that multiple frame images may be referenced to the same basis.
In an embodiment of the present invention, a binocular stereo matching algorithm may be used to calculate the depth data of each frame image. Alternatively, the depth data may be estimated indirectly by analyzing photometric features, luminance features and the like of the frame image.
In another embodiment of the present invention, a multi-view stereo (Multi-View Stereo, MVS) algorithm may be used for frame image reconstruction. In the reconstruction process, all pixels may be used, or the pixels may be downsampled so that only some of them are used. Specifically, the pixels of every frame image may be matched, the three-dimensional coordinates of each pixel reconstructed, and the points having image consistency obtained, after which the depth data of each frame image is calculated. Alternatively, the pixels of selected frame images may be matched, the three-dimensional coordinates of the pixels of each selected frame image reconstructed, and the points having image consistency obtained, after which the depth data of the corresponding frame images is calculated. The pixel data of a frame image corresponds to the calculated depth data, and the way the frame images are selected can be set according to the specific situation; for example, the distance between the frame image whose depth data is to be calculated and the other frame images may be taken into account as required, and a subset of the frame images selected accordingly.
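As one concrete way to obtain such depth data, the sketch below applies OpenCV's semi-global block matching to a rectified pair of frame images from two adjacent acquisition viewpoints and converts disparity to depth. The rectification step, the focal length in pixels and the baseline are assumed to be available from the parameter data, and the SGBM settings are illustrative only.

```python
import cv2
import numpy as np

def depth_from_rectified_pair(left_gray, right_gray, focal_px, baseline_m):
    """Estimate a depth map from a rectified stereo pair using semi-global
    block matching; depth = focal_length * baseline / disparity."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,      # must be a multiple of 16
        blockSize=5,
        P1=8 * 1 * 5 * 5,
        P2=32 * 1 * 5 * 5,
    )
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan           # mask invalid matches
    depth = focal_px * baseline_m / disparity    # one depth value per pixel
    return depth
```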
S24, reconstructing frame images for a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of preset frame images in the image combination, to obtain corresponding multi-angle free view video data.
The multi-angle freeview video data may include: multi-angle freeview spatial data and multi-angle freeview temporal data of frame images ordered according to frame moments.
The pixel data of the frame image may be any one of YUV data and RGB data, or may be other data capable of expressing the frame image; the depth data may include depth values corresponding to pixel data of the frame image one by one, or may be a partial numerical value selected from a set of depth values corresponding to pixel data of the frame image one by one, where a specific selection manner depends on a specific scenario; the virtual view point is selected from a multi-angle free view angle range, and the multi-angle free view angle range is a range supporting view point switching and watching of a region to be watched.
In a specific implementation, the preset frame image may be all frame images in the image combination, or may be selected partial frame images. The selection mode can be set according to specific situations, for example, partial frame images at corresponding positions in the image combination can be selected according to the position relation among the acquisition points; for another example, a partial frame image of the corresponding frame time in the image combination may be selected according to the frame time or frame period desired to be acquired.
Because the preset frame images can correspond to different frame moments, each virtual viewpoint in the virtual viewpoint path can correspond to a frame moment. The corresponding frame image is obtained according to the frame moment corresponding to each virtual viewpoint, and then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame image corresponding to each virtual viewpoint, frame image reconstruction is performed for each virtual viewpoint to obtain the corresponding multi-angle free view video data. In this case the multi-angle free view video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame moments. In other words, in addition to a multi-angle free view image at a certain moment, a multi-angle free view video that is continuous or discontinuous in time can also be realized.
In an embodiment of the present invention, the image combination includes A synchronized video frames, of which A1 synchronized video frames correspond to a first frame time and A2 synchronized video frames correspond to a second frame time, with A1+A2=A. A virtual viewpoint path composed of B virtual viewpoints is preset, of which B1 virtual viewpoints correspond to the first frame time and B2 virtual viewpoints correspond to the second frame time, with B1+B2 ≤ 2B. Then, based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the A1 synchronized video frames at the first frame time, a first frame image reconstruction is performed for the path composed of the B1 virtual viewpoints; based on the parameter data corresponding to the image combination and the pixel data and depth data of the frame images of the A2 synchronized video frames at the second frame time, a second frame image reconstruction is performed for the path composed of the B2 virtual viewpoints; and finally the corresponding multi-angle free view video data is obtained, which may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame moments.
It will be appreciated that the specified frame time and virtual view may be divided more finely, so as to obtain more synchronous video frames and virtual views corresponding to different frame times, and the above embodiment is merely illustrative, and not limiting to the specific embodiments.
In the embodiment of the invention, a depth-image-based rendering (Depth Image Based Rendering, DIBR) algorithm can be adopted: the pixel data and depth data of the preset frame images are combined and rendered according to the parameter data corresponding to the image combination and the preset virtual viewpoint path, thereby realizing frame image reconstruction along the preset virtual viewpoint path and obtaining the corresponding multi-angle free view video data.
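A very reduced sketch of the forward-mapping step of such a DIBR pipeline is given below: each pixel of a reference frame image is back-projected to 3D using its depth value and then re-projected with the virtual viewpoint's parameters. The pinhole camera model, the 3x4 world-to-camera extrinsics, and the absence of z-buffering, blending and hole filling are simplifications made for this sketch, not details prescribed by the patent.

```python
import numpy as np

def forward_warp(pixels, depth, K_src, RT_src, K_dst, RT_dst):
    """Warp one reference frame (pixel data + per-pixel depth) into a virtual
    viewpoint: back-project every pixel to 3D with its depth, then re-project
    it with the virtual camera's parameters. K_* are 3x3 intrinsics, RT_* are
    3x4 world-to-camera extrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.linalg.inv(K_src) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    cam_pts = rays * depth.ravel()                      # points in source camera frame
    R_s, t_s = RT_src[:, :3], RT_src[:, 3:]
    world = R_s.T @ (cam_pts - t_s)                     # source camera -> world
    R_d, t_d = RT_dst[:, :3], RT_dst[:, 3:]
    proj = K_dst @ (R_d @ world + t_d)                  # world -> virtual image plane
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)

    out = np.zeros_like(pixels)
    valid = (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h) & (proj[2] > 0)
    # splat colours; no z-buffering here, so overlapping pixels simply overwrite
    out[vv[valid], uu[valid]] = pixels.reshape(h * w, -1)[valid]
    return out  # holes remain where no source pixel mapped (to be filled later)
```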
S25, inserting the multi-angle free view video data into a data stream to be played of the playing control device and playing the data stream through the playing terminal.
The play control device may take multiple video data streams as input, where the video data streams may come from the acquisition devices in the acquisition array or from other acquisition devices. The play control device may select one of the input video data streams as the data stream to be played as required; the multi-angle free view video data obtained in step S24 may be inserted into the data stream to be played, or the input may be switched from the video data streams of other input interfaces to the input interface carrying the multi-angle free view video data. The play control device outputs the selected data stream to be played to the play terminal, so that the play terminal can play the video images with multi-angle free views and the user can watch them through the play terminal. The play terminal may be a video playing device such as a television, a mobile phone or a tablet computer, or other electronic equipment with a display screen.
In a specific implementation, multi-angle free view video data of a data stream to be played inserted into the play control device may be retained in the play terminal, so that a user can watch in a time-shifting manner, where the time-shifting may be operations such as pause, backward, fast forward to the current moment, etc. performed when the user watches.
By adopting the above data processing method, in the distributed system architecture the data processing device handles interception of the specified video frames and the server handles reconstruction of the multi-angle free view video from the preset frame images, so there is no need to arrange a large number of servers on site for processing, nor to directly upload the video data streams acquired by the acquisition devices of the acquisition array. A large amount of transmission resources and server processing resources can thereby be saved, and under the condition of limited network transmission bandwidth, the multi-angle free view video of the specified video frames can be reconstructed in real time. This realizes low-delay playing of multi-angle free view video, reduces the limitation imposed by network transmission bandwidth, lowers the implementation cost, relaxes the constraints, and meets the requirements of low-delay playing and real-time interaction of multi-angle free view video.
In a specific implementation, according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination, the depth data of the preset frame images in the image combination is mapped to the corresponding virtual viewpoints; frame image reconstruction is then performed according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, to obtain the corresponding multi-angle free view video data.
Wherein, the virtual parameter data of the virtual viewpoint may include: virtual viewing position data and virtual viewing angle data; the image combination corresponding parameter data may include: position data, shooting angle data, and the like are collected. The method of forward mapping and then reverse mapping can be adopted to obtain the reconstructed image.
In a specific implementation, the acquisition position data and the shooting angle data may be referred to as external parameter data, and the parameter data may further include internal parameter data, where the internal parameter data may include attribute data of the acquisition device, so that the mapping relationship may be determined more accurately. For example, the internal parameter data may include distortion data, and the mapping relationship may be further accurately determined spatially in consideration of distortion factors.
In a specific embodiment, in order to facilitate subsequent data acquisition, a stitched image corresponding to the image combination may be generated based on the pixel data and the depth data of the image combination, where the stitched image may include a first field and a second field, where the first field includes the pixel data of the image combination, and the second field includes the depth data of the image combination, and then the stitched image corresponding to the image combination and the corresponding parameter data are stored.
In another specific embodiment, in order to save storage space, a stitched image corresponding to a preset frame image in the image combination may be generated based on pixel data and depth data of the preset frame image in the image combination, where the stitched image corresponding to the preset frame image may include a first field and a second field, where the first field includes pixel data of the preset frame image, and the second field includes depth data of the preset frame image, and then only the stitched image corresponding to the preset frame image and corresponding parameter data may be stored.
The first field corresponds to the second field. The stitched image can be divided into an image area and a depth map area: the pixel fields of the image area store the pixel data of the plurality of frame images, and the pixel fields of the depth map area store the depth data of the plurality of frame images. A pixel field of the image area storing the pixel data of a frame image serves as the first field, and a pixel field of the depth map area storing the depth data of that frame image serves as the second field. The stitched image of the acquired image combination and the parameter data corresponding to the image combination can be stored in a data file; when the stitched image or the corresponding parameter data needs to be obtained, it can be read from the corresponding storage space according to a storage address contained in the header of the data file.
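For illustration, the sketch below builds one possible stitched image of this kind with NumPy: the frame images are laid side by side to form the image area (first field) and their depth maps, quantized to 8-bit greyscale, are laid out below them to form the depth map area (second field). The side-by-side/top-bottom layout and the depth quantization range are assumptions, not details fixed by the patent.

```python
import numpy as np

def build_stitched_image(frames, depths, depth_max=10.0):
    """Stack an image area (pixel data of all frame images) above a depth-map
    area (their depth data), so both fields travel in one stitched picture.
    frames: list of (h, w, 3) uint8 arrays; depths: list of (h, w) float arrays."""
    image_area = np.concatenate(frames, axis=1)                  # first field: pixel data
    depth_8bit = [np.clip(d / depth_max * 255, 0, 255).astype(np.uint8) for d in depths]
    depth_area = np.concatenate(
        [np.repeat(d[:, :, None], 3, axis=2) for d in depth_8bit], axis=1
    )                                                            # second field: depth data
    return np.concatenate([image_area, depth_area], axis=0)

# A reader recovers either field by plain slicing, e.g. for frame height h:
# pixel_field = stitched[:h, :, :]; depth_field = stitched[h:, :, 0]
```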
In addition, the storage format of the image combinations may be a video format, the number of the image combinations may be a plurality, and each image combination may be an image combination corresponding to a different frame time after the video is unpacked and decoded.
In a specific implementation, interactive frame time information of the interaction moment is determined based on an image reconstruction instruction received from an interactive terminal, and the stitched image of the preset frame images of the image combination corresponding to the interactive frame time, together with the parameter data corresponding to that image combination, is sent to the interactive terminal. The interactive terminal then, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to a preset rule, performs combined rendering on the selected pixel data, depth data and parameter data, reconstructs the multi-angle free view video data corresponding to the virtual viewpoint position of the interaction, and plays it.
The preset rule may be set according to the specific scenario. For example, based on the virtual viewpoint position information determined by the interactive operation, the W virtual viewpoints nearest in distance to the virtual viewpoint of the interaction moment may be selected, and the pixel data and depth data corresponding to the W+1 virtual viewpoints in total, including the virtual viewpoint of the interaction moment and satisfying the interactive frame time information, may be obtained from the stitched image.
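Under one reading of this rule, the sketch below simply ranks the stored viewpoint positions by Euclidean distance to the viewpoint chosen at the interaction moment and returns the W nearest ones; representing viewpoints as 3D position vectors and the value of W are assumptions made only for illustration.

```python
import numpy as np

def select_nearest_viewpoints(interaction_vp, stored_vp_positions, W=3):
    """Return the indices of the W stored viewpoints nearest (in Euclidean
    distance) to the virtual viewpoint chosen at the interaction moment; their
    pixel and depth data, together with the interaction-moment viewpoint's own
    data, give the W+1 viewpoints mentioned above."""
    target = np.asarray(interaction_vp, dtype=float)
    positions = np.asarray(stored_vp_positions, dtype=float)
    dists = np.linalg.norm(positions - target, axis=1)
    return np.argsort(dists)[:W].tolist()

# Example: pick the 3 nearest of 5 stored viewpoint positions.
print(select_nearest_viewpoints([0.5, 0.0, 2.0],
                                [[0, 0, 2], [1, 0, 2], [2, 0, 2], [3, 0, 2], [4, 0, 2]]))
```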
The interactive frame time information is determined based on a trigger operation from the interactive terminal, and the trigger operation can be a trigger operation input by a user or a trigger operation automatically generated by the interactive terminal, for example, the interactive terminal can automatically initiate the trigger operation when detecting that the identifier of the multi-angle free viewpoint data frame exists. When the user manually triggers, the interaction terminal can display the interaction prompt information and then select the time information for triggering the interaction, or the interaction terminal can receive the historical time information for triggering the interaction by the user operation, wherein the historical time information can be the time information before the current playing time.
In a specific implementation, based on the obtained stitched image of the preset frame images in the image combination of the interactive frame time and the corresponding parameter data, the interactive frame time information and the virtual viewpoint position information of the interaction moment, the interactive terminal may perform combined rendering on the pixel data and depth data of the stitched image by the same method as in step S24 above, obtain the multi-angle free view video data corresponding to the virtual viewpoint position of the interaction, and start playing the multi-angle free view video at the virtual viewpoint position of the interaction.
By adopting this scheme, the multi-angle free view video data corresponding to the virtual viewpoint position of the interaction can be generated at any time based on an image reconstruction instruction from the interactive terminal, which can further improve the user interaction experience.
With reference to the schematic diagram of the data processing system shown in FIG. 1, in an embodiment of the present invention the data processing system 10 may include: a data processing device 11, a server 12, a play control device 13, and a play terminal 14, wherein:
the data processing device 11 is adapted to intercept video frames at a specified frame time from multiple paths of video data streams synchronously acquired in real time at different positions of the field acquisition area based on a video frame interception instruction to obtain multiple synchronous video frames, and upload the multiple obtained synchronous video frames at the specified frame time to the server, where the multiple paths of video data streams may be video data streams in a compressed format or video data streams in a non-compressed format;
the server 12 is adapted to use the received frame images of the plurality of synchronous video frames as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and reconstruct the frame images of a preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, so as to obtain the corresponding multi-angle free view video data, where the multi-angle free view video data includes: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame times;
The play control device 13 is adapted to insert the multi-angle free view video data into a data stream to be played;
the playing terminal 14 is adapted to receive the data stream to be played from the playing control device 13 and play the data stream in real time.
In a specific implementation, the play control device 13 may output the data stream to be played based on a control instruction. As an alternative example, the play control device 13 may select one of multiple data streams as the data stream to be played, or continuously switch its selection among the multiple data streams to output the data stream to be played. A director control device may be used as the play control device in the embodiment of the present invention; it may be a manual or semi-manual director control device that performs play control based on externally input control instructions, or a virtual director control device that performs director control automatically based on artificial intelligence, big data learning or a preset algorithm.
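As a rough illustration only (the instruction fields and list layout are assumptions), the selection and switching behaviour of the play control device might look like this:

```python
def select_stream_to_play(candidate_streams, control_instruction):
    """Pick one of several candidate streams as the data stream to be played;
    a virtual director could generate `control_instruction` from a preset
    algorithm instead of manual input."""
    index = control_instruction.get("selected_index", 0)
    return candidate_streams[index]
```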
By adopting the above data processing system, the data processing device in the distributed system architecture handles the interception of the specified video frames, and the server performs the reconstruction of the multi-angle free view video from the intercepted preset frame images, so there is no need to arrange a large number of servers on site for processing, nor to upload directly the video data streams collected by the acquisition devices of the acquisition array. A large amount of transmission resources and server processing resources can thus be saved, and under the condition of limited network transmission bandwidth the multi-angle free view video of the specified video frames can be reconstructed in real time, realizing low-delay playing of the multi-angle free view video, reducing the limitation of network transmission bandwidth, lowering the implementation cost and constraint conditions, and making the scheme easy to implement, thereby meeting the requirements of low-delay playing and real-time interaction of the multi-angle free view video.
In a specific implementation, the server 12 is further adapted to generate a stitched image of the preset frame images in the image combination based on the pixel data and depth data of the preset frame images in the image combination, where the stitched image includes a first field containing the pixel data of the preset frame images in the image combination and a second field containing the depth data of the preset frame images in the image combination, and to store the stitched image of the image combination and the parameter data corresponding to the image combination.
In a specific implementation, the data processing system 10 may further include an interactive terminal 15, which is adapted to determine the interactive frame time information based on a trigger operation, send an image reconstruction instruction containing the interactive frame time information to the server, receive the stitched image of the preset frame images and the corresponding parameter data in the image combination corresponding to the interactive frame time returned by the server, determine the virtual viewpoint position information based on the interactive operation, select the corresponding pixel data and depth data in the stitched image according to a preset rule, perform combined rendering based on the selected pixel data, depth data and parameter data, reconstruct the multi-angle free view video data corresponding to the virtual viewpoint position at the interactive frame time, and play it.
The number of play terminals 14 may be one or more, the number of interactive terminals 15 may be one or more, and the play terminal 14 and the interactive terminal 15 may be the same terminal device. In addition, at least one of the server, the play control device and the interactive terminal may serve as the sending end of the video frame interception instruction, and other devices capable of sending the video frame interception instruction may also be used.
It should be noted that, in a specific implementation, the positions of the data processing device and the server may be deployed flexibly according to user needs. For example, the data processing device may be placed in the on-site non-acquisition area or in the cloud. For another example, the server may be placed in the on-site non-acquisition area, in the cloud, or on the terminal access side; on the terminal access side, edge node devices such as base stations, set-top boxes, routers, home data center servers and hotspot devices may all serve as the server for obtaining the multi-angle free view data. Alternatively, the data processing device and the server may be deployed together and work cooperatively as a server cluster, so as to achieve rapid generation of the multi-angle free view data and realize low-delay playing and real-time interaction of the multi-angle free view video.
By adopting this scheme, the multi-angle free view video data corresponding to the virtual viewpoint position to be interacted with can be generated at any time based on an image reconstruction instruction from the interactive terminal, which can further improve the user interaction experience.
In order to enable those skilled in the art to better understand and practice the embodiments of the present invention, a data processing system is described in detail below with reference to a specific application scenario.
Referring to FIG. 3, a schematic structural diagram of a data processing system in an application scenario is shown, depicting the arrangement of a data processing system for a basketball game. The data processing system includes an acquisition array 31 formed by a plurality of acquisition devices, a data processing device 32, a cloud server cluster 33, a play control device 34, a play terminal 35 and an interactive terminal 36.
Referring to FIG. 3, the left basketball rim is taken as the core point of view; with the core point as the center, a sector area lying in the same plane as the core point is taken as the preset multi-angle free view range. Each acquisition device in the acquisition array 31 may accordingly be arranged in a fan shape at different positions of the on-site acquisition area according to the preset multi-angle free view range, and each may synchronously acquire a video data stream in real time from its corresponding angle.
In implementations, the collection device may also be located in a ceiling area of a basketball venue, on a basketball stand, or the like. The acquisition devices may be arranged and distributed along a line, a sector, an arc, a circle, or an irregular shape. The specific arrangement mode can be set according to one or more factors such as specific field environment, the number of the acquisition devices, the characteristics of the acquisition devices, imaging effect requirements and the like. The acquisition device may be any device with camera functionality, such as a normal camera, a cell phone, a professional camera, etc.
In order not to interfere with the operation of the acquisition devices, the data processing device 32 may be placed in the on-site non-acquisition area and may be regarded as an on-site server. The data processing device 32 may send a stream-pulling instruction to each acquisition device in the acquisition array 31 through a wireless local area network, and each acquisition device in the acquisition array 31 transmits its obtained video data stream to the data processing device 32 in real time based on the stream-pulling instruction, where each acquisition device in the acquisition array 31 may transmit the obtained video data stream to the data processing device 32 in real time through the switch 37.
When the data processing device 32 receives a video frame interception instruction, it intercepts video frames at a specified frame time from the received multi-channel video data stream to obtain frame images of a plurality of synchronous video frames, and uploads the obtained plurality of synchronous video frames at the specified frame time to the cloud server cluster 33.
Accordingly, the cloud server cluster 33 uses the received frame images of the plurality of synchronous video frames as an image combination, determines parameter data corresponding to the image combination and depth data of each frame image in the image combination, and reconstructs a frame image of a preset virtual viewpoint path based on the parameter data corresponding to the image combination, pixel data and depth data of the preset frame image in the image combination, so as to obtain corresponding multi-angle free view video data, where the multi-angle free view video data may include: multi-angle freeview spatial data and multi-angle freeview temporal data of frame images ordered according to frame moments.
The servers may be placed in the cloud, and in order to process data more quickly in parallel, the cloud server cluster 33 may be composed of a plurality of different servers or server groups according to the data to be processed.
For example, the cloud server cluster 33 may include: a first cloud server 331, a second cloud server 332, a third cloud server 333, and a fourth cloud server 334. The first cloud server 331 may be configured to determine the parameter data corresponding to the image combination; the second cloud server 332 may be configured to determine the depth data of each frame image in the image combination; the third cloud server 333 may reconstruct the frame images of a preset virtual viewpoint path using a depth image based rendering (DIBR) algorithm, based on the parameter data corresponding to the image combination and the pixel data and depth data of the image combination; and the fourth cloud server 334 may be configured to generate the multi-angle free view video, where the multi-angle free view video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame times.
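The division of labour across the four cloud servers can be summarized with the following sketch; the callables are placeholders for the parameter-estimation, depth-estimation, DIBR and video-assembly stages and are not APIs defined by this embodiment:

```python
def cloud_pipeline(image_combination, viewpoint_path,
                   estimate_params, estimate_depth, render_view, assemble_video):
    """Illustrative pipeline mirroring cloud servers 331-334."""
    params = estimate_params(image_combination)                    # first cloud server
    depths = [estimate_depth(i, image_combination, params)         # second cloud server
              for i in range(len(image_combination))]
    frames = [render_view(vp, image_combination, depths, params)   # third cloud server (DIBR)
              for vp in viewpoint_path]
    return assemble_video(frames)                                  # fourth cloud server
```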
It can be appreciated that the first cloud server 331, the second cloud server 332, the third cloud server 333, and the fourth cloud server 334 may also be server groups formed by server arrays or server subsets, which is not limited in the embodiments of the present invention.
In an implementation, the cloud server cluster 33 may store the pixel data and the depth data of the image combination in the following manner:
generating a stitched image corresponding to the frame time based on the pixel data and depth data of the image combination, where the stitched image includes a first field containing the pixel data of the preset frame images in the image combination and a second field containing the depth data of the preset frame images in the image combination. The obtained stitched image and the corresponding parameter data may be stored in a data file; when the stitched image or the parameter data needs to be obtained, it may be read from the corresponding storage space according to the corresponding storage address in the header file of the data file.
Then, the play control device 34 may insert the received multi-angle free view video data into the data stream to be played, and the play terminal 35 receives the data stream to be played from the play control device 34 and plays it in real time. The play control device 34 may be a manual director control device or a virtual director control device. In a specific implementation, a dedicated server capable of automatically switching video streams may be set up as the virtual director control device to control the data source. A director control device, such as a director station, may be used as the play control device in the embodiments of the present invention.
It will be appreciated that the data processing device 32 may be disposed in an on-site non-acquisition area or a cloud end according to a specific scenario, and the server (cluster) and the play control device may be disposed in an on-site non-acquisition area, a cloud end or a terminal access side according to a specific scenario, which is not intended to limit the specific implementation and protection scope of the present invention.
As shown in the schematic diagram of the interactive interface of the interactive terminal in FIG. 4, the interactive interface of the interactive terminal 40 has a progress bar 41. With reference to FIGS. 3 and 4, the interactive terminal 40 may associate the specified frame times received from the data processing device 32 with the progress bar, and may generate several interaction identifiers, such as interaction identifiers 42 and 43, on the progress bar 41. The black section of the progress bar 41 is the played portion 41a, and the blank section of the progress bar 41 is the unplayed portion 41b.
When the system of the interactive terminal reads the corresponding interactive identifier 43 on the progress bar 41, the interface of the interactive terminal 40 can display interactive prompt information. For example, when the user selects to trigger the current interactive identifier 43, the interactive terminal 40 receives the feedback and generates an image reconstruction instruction of the interactive frame moment corresponding to the interactive identifier 43, and sends the image reconstruction instruction containing the information of the interactive frame moment to the server cluster 33 of the cloud. When the user does not select the trigger, the interactive terminal 40 may continue to read the subsequent video data, and the played portion 41a on the progress bar continues to advance. The user may also select to trigger the history interaction identifier, for example, the interaction identifier 42 displayed by the played portion 41a on the trigger progress bar, and the interaction terminal 40 receives the feedback to generate an image reconstruction instruction of the interaction frame moment corresponding to the interaction identifier 42.
When the cloud server cluster 33 receives the image reconstruction instruction from the interactive terminal 40, the stitched image of the preset frame image in the corresponding image combination and the parameter data corresponding to the corresponding image combination may be extracted and transmitted to the interactive terminal 40.
The interactive terminal 40 determines the interactive frame time information based on the trigger operation, sends an image reconstruction instruction containing the interactive frame time information to the server, and receives the stitched image of the preset frame images and the corresponding parameter data in the image combination corresponding to the interactive frame time returned from the cloud server cluster 33. It then determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data, depth data and parameter data in the stitched image according to the preset rule, performs combined rendering on the selected pixel data and depth data, reconstructs the multi-angle free view video data corresponding to the virtual viewpoint position at the interactive frame time, and plays it.
It may be understood that each acquisition device in the acquisition array and the data processing device may be connected through a switch and/or a local area network, the numbers of play terminals and interactive terminals may each be one or more, the play terminal and the interactive terminal may be the same terminal device, the data processing device may be placed in the on-site non-acquisition area or in the cloud according to the specific scenario, and the server may be placed in the on-site non-acquisition area, in the cloud or on the terminal access side according to the specific scenario.
The embodiment of the present invention also provides a server corresponding to the above data processing method. In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, it is described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the server shown in FIG. 5, in an embodiment of the present invention the server 50 may include:
a data receiving unit 51 adapted to receive frame images of a plurality of synchronous video frames uploaded by the data processing apparatus as an image combination;
a parameter data calculation unit 52 adapted to determine parameter data corresponding to the image combination;
a depth data calculation unit 53 adapted to determine depth data of each frame of image in the image combination;
the video data obtaining unit 54 is adapted to reconstruct a frame image of a preset virtual viewpoint path based on the parameter data corresponding to the image combination, the pixel data and the depth data of a preset frame image in the image combination, and obtain corresponding multi-angle free view video data, where the multi-angle free view video data includes: multi-angle freeview spatial data and multi-angle freeview temporal data of frame images ordered according to frame moments.
The first data transmission unit 55 is adapted to insert the multi-angle freeview video data into a data stream to be played of the play control device and play the data stream through the play terminal.
The plurality of synchronous video frames can be obtained by intercepting video frames at a specified frame moment in a plurality of paths of video data streams synchronously acquired and uploaded from different positions of a field acquisition area based on a video frame interception instruction, and the photographing angles of the plurality of synchronous video frames are different.
The server may be placed in the on-site non-acquisition area, in the cloud or on the terminal access side according to the specific situation.
In a specific implementation, the multi-angle free view video data inserted into the data stream to be played of the play control device may be retained in the play terminal, so that the user can watch in a time-shifted manner, where time-shifting may include operations such as pausing, rewinding and fast-forwarding to the current moment performed while the user is watching.
In an implementation, as shown in fig. 5, the video data acquisition unit 54 may include:
a data mapping subunit 541, adapted to map the depth data of the preset frame images in the image combination to the corresponding virtual viewpoints according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination;
The data reconstruction subunit 542 is adapted to reconstruct a frame image according to the pixel data and the depth data of the preset frame image mapped to the corresponding virtual viewpoint and the preset virtual viewpoint path, respectively, to obtain corresponding multi-angle free view video data.
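A minimal DIBR-style sketch of what the data mapping and data reconstruction subunits do, assuming pinhole intrinsics K and 4x4 world-to-camera extrinsics RT for both the source camera and the virtual viewpoint (hole filling and blending, which a real renderer needs, are omitted):

```python
import numpy as np

def warp_to_virtual_view(pixels, depth, K_src, RT_src, K_dst, RT_dst):
    """Lift each source pixel to 3D with its depth value, then project it into
    the virtual viewpoint and copy the pixel value there."""
    h, w = depth.shape
    out = np.zeros_like(pixels)
    ys, xs = np.mgrid[0:h, 0:w]
    homog = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    cam_pts = np.linalg.inv(K_src) @ homog * depth.reshape(1, -1)    # source camera coordinates
    world = np.linalg.inv(RT_src) @ np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    proj = K_dst @ (RT_dst @ world)[:3]                              # virtual camera coordinates
    z = np.where(np.abs(proj[2]) < 1e-9, 1e-9, proj[2])              # guard against division by zero
    uv = np.round(proj[:2] / z).astype(int)
    valid = (proj[2] > 0) & (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    out[uv[1, valid], uv[0, valid]] = pixels.reshape(-1, pixels.shape[-1])[valid]
    return out
```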
In an implementation, as shown in fig. 5, the server 50 may further include:
a stitched image generating unit 56, adapted to generate a stitched image corresponding to the image combination based on the pixel data and the depth data of the preset frame image in the image combination, where the stitched image may include a first field and a second field, where the first field includes the pixel data of the preset frame image in the image combination, and the second field includes the depth data of the preset frame image in the image combination;
the data storage unit 57 is adapted to store stitched images of the image combinations and parameter data corresponding to the image combinations.
In an implementation, as shown in fig. 5, the server 50 may further include:
a data extraction unit 58, adapted to determine the interactive frame time information of the interaction time based on a received image reconstruction instruction from the interactive terminal, and to extract the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to that image combination;

a second data transmission unit 59, adapted to transmit the stitched image and the corresponding parameter data extracted by the data extraction unit 58 to the interactive terminal, so that the interactive terminal selects the corresponding pixel data, depth data and parameter data in the stitched image according to a preset rule based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data, reconstructs the multi-angle free view video data corresponding to the virtual viewpoint position at the interactive frame time, and plays it.
By adopting this scheme, the multi-angle free view video data corresponding to the virtual viewpoint position to be interacted with can be generated at any time based on an image reconstruction instruction from the interactive terminal, which can further improve the user interaction experience.
The embodiment of the invention also provides a data interaction method and a data processing system, which can acquire the data stream to be played from the playing control equipment in real time and play and display the data stream in real time, wherein each interaction identifier in the data stream to be played is associated with the appointed frame time of the video data, and then the interaction data corresponding to the appointed frame time of the interaction identifier can be acquired in response to the triggering operation of the interaction identifier.
By adopting the data interaction scheme in the embodiment of the invention, in the playing process, the interaction data can be obtained according to the triggering operation of the interaction identifier, and further multi-angle free view display is carried out, so that the user interaction experience is improved. The following detailed description is of specific embodiments, particularly directed to data interaction methods and data processing systems, with reference to the accompanying drawings.
The following describes a data interaction method according to an embodiment of the present invention by specific steps with reference to a flowchart of the data interaction method shown in fig. 6.
S61, obtaining a data stream to be played from the playing control equipment in real time and playing and displaying the data stream in real time, wherein the data stream to be played comprises video data and interactive identifications, and each interactive identification is associated with a designated frame time of the data stream to be played.
The specified frame time may be in units of frames, with frames N to M taken as the specified frame time, where N and M are integers not less than 1 and N is not greater than M; alternatively, the specified frame time may be in units of time, with seconds X to Y taken as the specified frame time, where X and Y are positive numbers and X is not greater than Y.
In a specific implementation, the data stream to be played may be associated with a plurality of designated frame moments, and the play control device may generate, based on information of each designated frame moment, an interaction identifier corresponding to each designated frame moment, so that when the data stream to be played is played and displayed in real time, the corresponding interaction identifier may be displayed at the designated frame moment. Wherein, each interactive identification and video data can be associated in different modes according to actual conditions.
In an embodiment of the present invention, the data stream to be played may include a plurality of frame moments corresponding to the video data, and because each interactive identifier also has a corresponding designated frame moment, information of each designated frame moment corresponding to each interactive identifier and information of each frame moment in the data stream to be played may be matched, and the frame moments of the same information may be associated with the interactive identifiers, so that when the data stream to be played is played in real time and the corresponding frame moment is reached, the corresponding interactive identifier may be displayed.
For example, the data stream to be played includes N frame moments, and the playing control device generates corresponding M interactive identifications based on information of M specified frame moments. If the information of the ith frame time is the same as the information of the jth appointed frame time, the ith frame time and the jth interaction identifier can be associated, and the jth interaction identifier can be displayed when the real-time playing and displaying is carried out until the ith frame time, wherein i is a natural number not greater than N, and j is a natural number not greater than M.
S62, responding to triggering operation of an interaction identifier, and acquiring interaction data corresponding to the appointed frame time of the interaction identifier, wherein the interaction data comprises multi-angle free view angle data.
In a specific implementation, each interactive data corresponding to each designated frame time may be stored in a preset storage device, and because the interactive identifier and the designated frame time have a corresponding relationship, by executing a triggering operation on the interactive terminal, the interactive identifier displayed by the interactive terminal may be triggered, and according to the triggering operation on the interactive identifier, the designated frame time corresponding to the triggered interactive identifier may be obtained. Therefore, the interactive data of the appointed frame moment corresponding to the triggered interactive identification can be obtained.
For example, the preset storage device may store M pieces of interaction data, where the M pieces of interaction data respectively correspond to M designated frame moments, and the M designated frame moments correspond to M interaction identifiers. Assuming that the triggered interaction identifier is Pi, a specified frame time Ti corresponding to the interaction identifier Pi can be obtained according to the triggered interaction identifier Pi. Thereby, the interactive data of the specified frame time Ti corresponding to the obtained interactive identification Pi is acquired. Wherein i is a natural number.
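In terms of data structures this amounts to two lookups, sketched below with hypothetical identifiers and frame times (the dictionary layout is an assumption, not the storage format of the preset storage device):

```python
# Hypothetical mappings kept by the terminal / preset storage device.
identifier_to_frame_time = {"P1": "T1", "P2": "T2"}
interaction_store = {"T1": {"multi_angle_free_view_data": "..."},
                     "T2": {"multi_angle_free_view_data": "..."}}

def on_identifier_triggered(identifier):
    """Resolve the triggered identifier to its specified frame time, then fetch
    the interaction data stored for that time."""
    frame_time = identifier_to_frame_time[identifier]
    return interaction_store[frame_time]
```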
The triggering operation can be a triggering operation input by a user or automatically generated by the interactive terminal.
And, the preset storage device can be placed in the on-site non-acquisition area, cloud end or terminal access side. Specifically, the preset storage device may be a data processing device, a server, or an interactive terminal in the embodiment of the present invention, or an edge node device located on the side of the interactive terminal, such as a base station, a set top box, a router, a home data center server, a hotspot device, and so on.
And S63, based on the interaction data, performing image display of the multi-angle free view angles at the appointed frame time.
In a specific implementation, an image reconstruction algorithm may be used to reconstruct images of the multi-angle freeview data of the interactive data, and then perform image presentation of the multi-angle freeview at the specified frame time.
And if the appointed frame time is one frame time, displaying the static image of the multi-angle free view angle; and if the appointed frame time corresponds to a plurality of frame times, displaying the dynamic image with the multi-angle free view angle.
By adopting the scheme, in the video playing process, the interactive data can be acquired according to the triggering operation of the interactive identification, and then multi-angle free view display is carried out, so that the user interaction experience is improved.
In a specific implementation, the multi-angle freeview data may be generated based on a plurality of frame images corresponding to the received specified frame time, where the plurality of frame images are obtained by intercepting, by a data processing device, multiple video data streams synchronously acquired by a plurality of acquisition devices in an acquisition array at the specified frame time, and the multi-angle freeview data may include pixel data, depth data, and parameter data of the plurality of frame images, where an association relationship exists between the pixel data and the depth data of each frame image.
The pixel data of the frame image may be any one of YUV data and RGB data, or may be other data capable of expressing the frame image. The depth data may include depth values that are in one-to-one correspondence with the pixel data of the frame image, or may be a partial value selected from a set of depth values that are in one-to-one correspondence with the pixel data of the frame image. The specific selection of depth data depends on the specific scenario.
In a specific implementation, the parameter data corresponding to the plurality of frame images may be obtained through a parameter matrix, where the parameter matrix may include an internal parameter matrix, an external parameter matrix, a rotation matrix, a translation matrix, and the like. Thereby, the correlation between the three-dimensional geometrical position of the specified point of the surface of the spatial object and its corresponding point in the plurality of frame images can be determined.
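For illustration, under a standard pinhole camera model (the numeric values below are placeholders, not calibration results of the acquisition array), the internal and external parameter matrices relate a 3D point on an object surface to its image point as follows:

```python
import numpy as np

K = np.array([[1000.0, 0.0, 960.0],    # internal (intrinsic) parameter matrix
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # rotation matrix (external parameters)
t = np.zeros((3, 1))                   # translation vector (external parameters)

def project(point_3d):
    """Map a 3D point of the scene to its corresponding point in the frame image."""
    p_cam = R @ np.asarray(point_3d, dtype=float).reshape(3, 1) + t
    p_img = K @ p_cam
    return (p_img[:2] / p_img[2]).ravel()
```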
In the embodiment of the present invention, a structure-from-motion (SFM) algorithm may be adopted: based on the parameter matrices, feature extraction, feature matching and global optimization are performed on the obtained plurality of frame images, and the resulting parameter estimates are used as the parameter data corresponding to the plurality of frame images. The specific algorithms used in the feature extraction, feature matching and global optimization processes may be found in the foregoing description.
In a specific implementation, depth data for each frame image may be determined based on the plurality of frame images. Wherein the depth data may comprise depth values corresponding to pixels of each frame of image. The distance of the acquisition point to the various points in the field may be taken as the above-mentioned depth value, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, with the origin of the photographing coordinate system as the optical center, the depth value may be the distance from each point in the field to the optical center along the photographing optical axis. It will be appreciated by those skilled in the art that the distances may be relative values and that multiple frame images may be referenced to the same basis.
In an embodiment of the present invention, a binocular stereo matching algorithm may be used to calculate the depth data from each frame image. In addition, the depth data may also be obtained by analyzing photometric features, luminance features and the like of the frame images and estimating the depth indirectly.
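For a calibrated binocular pair the relation between disparity and depth is the classic one below; the focal length and baseline values are illustrative only:

```python
def depth_from_disparity(disparity, focal_length, baseline):
    """Depth = f * B / d for a rectified stereo pair."""
    return focal_length * baseline / disparity

# Example: a disparity of 8 pixels with f = 1000 px and B = 0.2 m gives 25 m.
print(depth_from_disparity(8.0, 1000.0, 0.2))
```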
In another embodiment of the present invention, a multi-view stereo (MVS) algorithm may be used for reconstruction: the pixels of each frame image are matched, and the three-dimensional coordinates of each pixel are reconstructed to obtain points with image consistency, after which the depth data of each frame image is calculated. Alternatively, the pixels of selected frame images may be matched, the three-dimensional coordinates of the pixels of each selected frame image reconstructed to obtain points with image consistency, and the depth data of the corresponding frame images then calculated. The pixel data of a frame image corresponds to the calculated depth data, and the way of selecting frame images may be set according to the specific situation; for example, frame images may be selected according to their distance from the frame image whose depth data is to be calculated, so that only part of the frame images are used.
In a specific implementation, the data processing device may intercept the frame-level synchronized video frames at the specified frame time in the multi-path video data stream based on the received video frame intercept instruction.
In a specific implementation, the video frame interception instruction may include frame time information for intercepting a video frame, and the data processing device intercepts the video frame of the corresponding frame time from the multi-path video data stream according to the frame time information in the video frame interception instruction. And the data processing equipment sends the frame time information in the video frame interception instruction to the play control equipment, and the play control equipment can obtain corresponding appointed frame time according to the received frame time information and generate corresponding interaction identification according to the received frame time information.
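A sketch of how the data processing device might act on such an instruction; the callables stand in for the device's actual stream access, upload and notification mechanisms and are assumptions, not interfaces defined by this embodiment:

```python
def handle_intercept_instruction(instruction, video_streams, grab_frame_at,
                                 upload_to_server, notify_play_control):
    """Intercept the frame-level-synchronized frames at the specified frame time,
    upload them for reconstruction, and pass the frame time information on so the
    play control device can generate the matching interaction identifier."""
    frame_time = instruction["frame_time"]
    frames = [grab_frame_at(stream, frame_time) for stream in video_streams]
    upload_to_server(frames, frame_time)
    notify_play_control(frame_time)
```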
In a specific implementation, the plurality of acquisition devices in the acquisition array are placed at different positions of the on-site acquisition area according to a preset multi-angle free view angle range, and the data processing device can be placed in the on-site non-acquisition area or the cloud.
In a specific implementation, the multi-angle free view may refer to the spatial position and viewing angle of a virtual viewpoint that allow the scene to be viewed to be switched freely. For example, the multi-angle free view may be a 6-degrees-of-freedom (6DoF) view, where the spatial position of the virtual viewpoint may be expressed as (x, y, z) and the viewing angle may be expressed as three rotation directions about the coordinate axes; these six degrees of freedom together constitute the 6DoF viewing angle.
And, the multi-angle free view angle range can be determined according to the requirements of application scenes.
In a specific implementation, the play control device may generate, based on the information of the frame time of the intercepted video frame from the data processing device, an interactive identifier associated with the video frame at the corresponding time in the data stream to be played. For example, after receiving a video frame interception instruction, the data processing device sends frame time information in the video frame interception instruction to the play control device. Then, the play control device may generate a corresponding interactive identifier based on the time information of each frame.
In a specific implementation, corresponding interaction data may be generated according to the objects displayed on site, the information associated with the displayed objects, and the like. For example, the interaction data may further include at least one of: on-site analysis data, information data of an acquisition object, information data of equipment associated with the acquisition object, information data of an object deployed on site, and information data of a logo displayed on site. Then, based on the interaction data, a multi-angle free view presentation is performed, and richer interaction information can be shown to the user through the multi-angle free view, so that the user interaction experience can be further enhanced.
For example, when a basketball game is played, the interaction data may include, in addition to the multi-angle free view data, one or more of analysis data of the basketball game, information data of a player, information data of the shoes worn by the player, information data of the basketball, information data of the logo of a site sponsor, and the like.
In a specific implementation, in order to conveniently return to the data stream to be played after the image presentation ends, with continued reference to FIG. 6, after step S63 the method may further include:
and S64, switching to acquire the data stream to be played from the playing control equipment in real time and playing and displaying the data stream in real time when the interaction ending signal is detected.
For example, when an interaction ending operation instruction is received, switching to a data stream to be played, which is obtained from the playing control device in real time, and performing real-time playing and displaying.
For another example, when the image display of the multi-angle free view angle at the designated frame time is detected to the last image, the method switches to the data stream to be played, which is obtained from the playing control device in real time, and performs real-time playing display.
In a specific embodiment, the multi-angle free view image display based on the interaction data in step S63 may specifically include the following steps:
A virtual viewpoint is determined according to the interactive operation, where the virtual viewpoint is selected from the multi-angle free view range, and the multi-angle free view range is the range within which the virtual viewpoint for viewing the area to be viewed can be switched; an image for viewing the area to be viewed is then displayed based on the virtual viewpoint, the image being generated based on the interaction data and the virtual viewpoint.
In a specific implementation, a virtual viewpoint path may be preset, and the virtual viewpoint path may include a plurality of virtual viewpoints. Since the virtual view points are selected from the multi-angle free view angle range, the corresponding first virtual view points can be determined according to the image view angles displayed during interactive operation, and then the images corresponding to the virtual view points can be displayed sequentially from the first virtual view points according to the preset virtual view point sequence.
In the embodiment of the present invention, a DIBR algorithm may be adopted: the pixel data and depth data corresponding to the specified frame time of the triggered interaction identifier are combined and rendered according to the parameter data in the multi-angle free view data and the preset virtual viewpoint path, so that image reconstruction based on the preset virtual viewpoint path is realized and the corresponding multi-angle free view video data is obtained; the corresponding images can then be displayed in sequence starting from the first virtual viewpoint according to the order of the preset virtual viewpoints.
And if the specified frame time corresponds to the same frame time, the obtained multi-angle free view video data can comprise multi-angle free view space data of images ordered according to the frame time, and a static image of the multi-angle free view can be displayed; if the specified frame time corresponds to different frame times, the obtained multi-angle free view video data may include multi-angle free view spatial data and multi-angle free view time data of frame images ordered according to the frame times, and a dynamic image of the multi-angle free view may be displayed, that is, a frame image of a video frame of the multi-angle free view is displayed.
The embodiment of the present invention also provides a system corresponding to the above data interaction method. In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, it is described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic diagram of the data processing system shown in FIG. 7, data processing system 70 may include: acquisition array 71, data processing device 72, server 73, play control device 74, and interactive terminal 75, wherein:
the acquisition array 71 may include a plurality of acquisition devices disposed at different locations in the field acquisition area according to a preset multi-angle free view range, adapted to synchronously acquire multiple video data streams in real time and upload the video data streams to the data processing device 72 in real time;
The data processing device 72 is adapted to intercept the multiple paths of video data streams at a designated frame time according to the received video frame intercept command, obtain a plurality of frame images corresponding to the designated frame time and frame time information corresponding to the designated frame time, upload the plurality of frame images at the designated frame time and the frame time information corresponding to the designated frame time to the server 73, and send the frame time information at the designated frame time to the play control device 74;
the server 73 is adapted to receive the plurality of frame images and the frame time information uploaded by the data processing device 72, and generate interaction data for interaction based on the plurality of frame images, the interaction data including multi-angle freeview data, the interaction data being associated with the frame time information;
the play control device 74 is adapted to determine a specified frame time corresponding to the frame time information uploaded by the data processing device 72 in a data stream to be played, generate an interactive identifier associated with the specified frame time, and transmit the data stream to be played including the interactive identifier to the interactive terminal 75;
The interactive terminal 75 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, and acquire the interactive data stored in the server 73 and corresponding to the designated frame moment based on the triggering operation of the interactive identifier, so as to display the multi-angle free view image.
It should be noted that, in a specific implementation, the positions of the data processing device and the server may be deployed flexibly according to user needs. For example, the data processing device may be placed in the on-site non-acquisition area or in the cloud. For another example, the server may be placed in the on-site non-acquisition area, in the cloud, or on the terminal access side; on the terminal access side, edge node devices such as base stations, set-top boxes, routers, home data center servers and hotspot devices may all serve as the server for obtaining the multi-angle free view data. Alternatively, the data processing device and the server may be deployed together and work cooperatively as a server cluster, so as to achieve rapid generation of the multi-angle free view data and realize low-delay playing and real-time interaction of the multi-angle free view video.
By adopting the scheme, in the playing process, the interactive data can be acquired according to the triggering operation of the interactive identification, and then multi-angle free view display is carried out, so that the user interaction experience is improved.
In a specific implementation, the multi-angle free view may refer to a spatial position of a virtual view point enabling a scene to be freely switched and a view angle. And, the multi-angle free view angle range can be determined according to the requirements of application scenes. The multi-angle free view may be a 6-degree of freedom (6 DoF) view.
In a specific implementation, the acquisition device itself may have a function of encoding and packaging, so that original video data acquired from a corresponding angle in real time and synchronously may be encoded and packaged. And, the acquisition device may be provided with a compression function.
In a specific implementation, the server 73 is adapted to generate the multi-angle freeview data based on a plurality of received frame images corresponding to the specified frame time, where the multi-angle freeview data includes pixel data, depth data, and parameter data of the plurality of frame images, and an association exists between the pixel data and the depth data of each frame image.
In a specific implementation, the plurality of acquisition devices in the acquisition array 71 may be placed at different positions of the on-site acquisition area according to the preset multi-angle free view range, the data processing device 72 may be placed in the on-site non-acquisition area or in the cloud, and the server 73 may be placed in the on-site non-acquisition area, in the cloud, or on the terminal access side.
In a specific implementation, the play control device 74 is adapted to generate the interaction identifier associated with the corresponding video frame in the data stream to be played based on the frame time information of the video frames intercepted by the data processing device 72.
In a specific implementation, the interactive terminal 75 is further adapted to switch to the data stream to be played obtained from the play control device 74 in real time and play and show in real time when detecting the end of interaction signal.
For better understanding and implementation of the embodiments of the present invention, the data processing system is described in detail below through a specific application scenario. FIG. 8 is a schematic structural diagram of the data processing system in another application scenario in an embodiment of the present invention, showing a basketball game playing scenario in which the area to be viewed is the basketball court area on the left side. The data processing system 80 may include: an acquisition array 81 composed of acquisition devices, a data processing device 82, a cloud server cluster 83, a play control device 84 and an interactive terminal 85.
The basketball rim is used as a core point of view, the core point of view is used as a circle center, and a sector area which is positioned on the same plane with the core point of view can be used as a preset multi-angle free view angle range. Accordingly, each acquisition device in the acquisition array 81 can be fan-shaped to be placed at different positions of the field acquisition area according to a preset multi-angle free view angle range, and video data streams can be synchronously acquired from corresponding angles in real time respectively.
In implementations, the collection device may also be located in a ceiling area of a basketball venue, on a basketball stand, or the like. The acquisition devices may be arranged and distributed along a line, a sector, an arc, a circle, or an irregular shape. The specific arrangement mode can be set according to one or more factors such as specific field environment, the number of the acquisition devices, the characteristics of the acquisition devices, imaging effect requirements and the like. The acquisition device may be any device with camera functionality, such as a normal camera, a cell phone, a professional camera, etc.
While the data processing device 82 may be placed in an off-site acquisition area so as not to interfere with the operation of the acquisition device. The data processing device 82 may send a pull stream command to each of the acquisition devices in the acquisition array 81 via a wireless local area network. Each acquisition device in the acquisition array 81 transmits the obtained video data stream to the data processing device 82 in real time based on the streaming command sent by the data processing device 82. Wherein each acquisition device in the acquisition array 81 can transmit the obtained video data stream to the data processing device 82 in real time through the switch 87. Each acquisition device can compress the acquired original video data in real time and transmit the compressed data to the data processing device in real time so as to further save local area network transmission resources.
When the data processing device 82 receives a video frame interception instruction, it intercepts video frames at a specified frame time from the received multi-channel video data stream to obtain frame images corresponding to the multiple video frames and frame time information corresponding to the specified frame time, and uploads the multiple frame images at the specified frame time and the frame time information corresponding to the specified frame time to the cloud server cluster 83, and sends the frame time information at the specified frame time to the play control device 84. The video frame intercepting instruction can be sent manually by a user or can be automatically generated by data processing equipment.
The servers may be located in the cloud, and in order to process data more quickly in parallel, the cloud server cluster 83 may be composed of a plurality of different servers or server groups according to the data to be processed.
For example, the cloud server cluster 83 may include: the first cloud server 831, the second cloud server 832, the third cloud server 833, and the fourth cloud server 834. The first cloud server 831 may be configured to determine corresponding parameter data of the plurality of frame images; the second cloud server 832 may be configured to determine depth data of each of the plurality of frame images; the third cloud server 833 may reconstruct a frame image of a preset virtual viewpoint path by using a DIBR algorithm based on the parameter data corresponding to the plurality of frame images, the depth data and the pixel data of the preset frame image in the plurality of frame images; the fourth cloud server 834 may be configured to generate multi-angle freeview video.
It can be appreciated that the first cloud server 831, the second cloud server 832, the third cloud server 833, and the fourth cloud server 834 may also be server groups formed by server arrays or server subsets, which is not limited in the embodiment of the present invention.
In an implementation, the multi-angle freeview video data may include: multi-angle freeview spatial data and multi-angle freeview temporal data of frame images ordered according to frame moments. The interactive data may include multi-angle freeview data, which may include pixel data and depth data of a plurality of frame images, and parameter data, and an association relationship exists between the pixel data and the depth data of each frame image.
The cloud server cluster 83 may store the interaction data according to the specified frame time information.
The play control device 84 may generate an interactive identifier associated with the specified frame time according to the frame time information uploaded by the data processing device, and transmit a data stream to be played including the interactive identifier to the interactive terminal 85.
The interactive terminal 85 can play the presentation video in real time and display the interactive identifier at the moment of corresponding video frames based on the received data stream to be played. When an interaction identifier is triggered, the interaction terminal 85 may acquire the interaction data corresponding to the designated frame time and stored in the server cluster 83 of the cloud end, so as to display the multi-angle free view image. Upon detecting the interaction end signal, the interaction terminal 85 may switch to obtain the data stream to be played from the play control device 84 in real time and play the data stream in real time.
Referring to the schematic diagram of another data processing system shown in FIG. 38, data processing system 380 may include: acquisition array 381, data processing device 382, playback control device 383, and interactive terminal 384; wherein:
the collection array 381 comprises a plurality of collection devices, and the collection devices are arranged at different positions of the field collection area according to a preset multi-angle free view angle range, and are suitable for synchronously collecting multiple paths of video data streams in real time and uploading the video data streams to the data processing device in real time;
the data processing device 382 is adapted to intercept the multiple paths of video data streams at a specified frame time according to the received video frame intercept command, obtain multiple frame images corresponding to the specified frame time and frame time information corresponding to the specified frame time, and send the frame time information of the specified frame time to the play control device 383;
the play control device 383 is adapted to determine a specified frame time corresponding to the frame time information uploaded by the data processing device 382 in a data stream to be played, generate an interactive identifier associated with the specified frame time, and transmit the data stream to be played including the interactive identifier to the interactive terminal 384;
The interactive terminal 384 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, acquire a plurality of frame images corresponding to the specified frame time of the interactive identifier from the data processing device 382 based on the triggering operation of the interactive identifier, generate interactive data for interaction based on the plurality of frame images, and display the multi-angle free view image, where the interactive data includes multi-angle free view data.
In an implementation, the data processing device may be deployed flexibly according to user needs; for example, the data processing device may be placed in the on-site non-acquisition area or in the cloud.
By adopting the data processing system, in the playing process, the interactive data can be acquired according to the triggering operation of the interactive identification, and then multi-angle free view display is carried out, so that the user interaction experience is improved.
The embodiment of the present invention also provides a terminal corresponding to the above data interaction method. In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, it is described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to the schematic structure of the interactive terminal shown in fig. 9, the interactive terminal 90 may include:
A data stream obtaining unit 91, adapted to obtain, in real time, a data stream to be played from a playing control device, where the data stream to be played includes video data and an interactive identifier, and the interactive identifier is associated with a designated frame time of the data stream to be played;
the playing display unit 92 is adapted to play and display the video and the interactive identifier of the data stream to be played in real time;
an interaction data obtaining unit 93, adapted to obtain interaction data corresponding to the specified frame time in response to a triggering operation of the interaction identifier, the interaction data including multi-angle free view angle data;
an interaction display unit 94, adapted to perform image display of the multi-angle free view at the specified frame time based on the interaction data;
and the switching unit 95 is adapted to trigger switching to the data stream to be played acquired from the playing control device in real time by the data stream acquisition unit 91 and to perform real-time playing and displaying by the playing and displaying unit 92 when the interaction end signal is detected.
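For illustration only, the cooperation of the units 91 to 95 described above may be sketched as follows in Python; the class, function and field names are hypothetical and do not form part of the embodiment.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    video: str                        # placeholder for a decoded video payload
    frame_time: Optional[str] = None  # set when an interactive identifier is attached to this segment

class InteractiveTerminal:
    """Hypothetical sketch of units 91-95; names are illustrative only."""

    def __init__(self, play_source, interaction_source):
        self.play_source = play_source                # play control device feeding the stream (unit 91)
        self.interaction_source = interaction_source  # server or data processing device (unit 93)

    def play(self, triggered_times):
        for seg in self.play_source:                  # unit 91: obtain the stream in real time
            print("playing", seg.video)               # unit 92: real-time play and display
            if seg.frame_time is not None and seg.frame_time in triggered_times:
                data = self.interaction_source(seg.frame_time)                    # unit 93: obtain interaction data
                print("multi-angle free view display at", seg.frame_time, data)   # unit 94: interaction display
            # unit 95: after the interaction end signal, the loop simply resumes real-time play

# usage sketch with stubbed sources
stream = [Segment("frame-0"), Segment("frame-1", frame_time="T1"), Segment("frame-2")]
terminal = InteractiveTerminal(stream, lambda t: {"multi_angle_free_view_data": f"<data@{t}>"})
terminal.play({"T1"})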
The interactive data can be generated by a server and transmitted to the interactive terminal, and can also be generated by the interactive terminal.
The interactive terminal can acquire the data stream to be played from the play control device in real time in the process of playing the video, and can display the corresponding interactive identifier at the corresponding frame moment. In a specific implementation, fig. 4 shows an interactive interface schematic diagram of an interactive terminal in an embodiment of the present invention.
The interactive terminal 40 obtains the data stream to be played from the play control device in real time; when real-time playing and display reach the first frame time T1, the first interactive identifier 42 can be displayed on the progress bar 41, and when real-time playing and display reach the second frame time T2, the second interactive identifier 43 can be displayed on the progress bar. The black part of the progress bar is the played portion, and the white part is the unplayed portion.
The triggering operation may be input by a user, or may be generated automatically by the interactive terminal; for example, the interactive terminal may automatically initiate the triggering operation when detecting the identifier of a multi-angle free viewpoint data frame. When triggering manually, the user may be shown interaction prompt information and may then select the time information for which interaction is triggered, or the interactive terminal may receive, through user operation, historical time information for which interaction is triggered, where the historical time information may be time information before the current playing time.
Referring to fig. 4, 7 and 9, when the system of the interactive terminal reads the corresponding interactive identifier 43 on the progress bar 41, interactive prompt information may be displayed, and when the user does not select triggering, the interactive terminal 40 may continue to read the subsequent video data, and the played portion of the progress bar 41 may continue to advance. When the user selects triggering, the interactive terminal 40 receives feedback and generates an image reconstruction instruction of a designated frame time of the corresponding interactive identifier, and sends the image reconstruction instruction to the server 73.
For example, when the user selects to trigger the current interactive identifier 43, the interactive terminal 40 generates an image reconstruction instruction of the interactive identifier 43 corresponding to the designated frame time T2 after receiving the feedback, and sends the image reconstruction instruction to the server 73. The server can send corresponding interaction data of the appointed frame time T2 according to the image reconstruction instruction.
The user may also choose to trigger a historical interactive identifier, for example the interactive identifier 42 displayed on the played portion 41a of the progress bar. After receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction for the designated frame time T1 corresponding to the interactive identifier 42, and sends the image reconstruction instruction to the server 73. The server can send the corresponding interaction data of the designated frame time T1 according to the image reconstruction instruction. The interactive terminal 40 may perform image processing on the multi-angle free-view data of the interaction data by using an image reconstruction algorithm, and then perform image display of the multi-angle free view at the designated frame time. If the designated frame time corresponds to a single frame time, a static image of the multi-angle free view is displayed; if the designated frame time corresponds to a plurality of frame times, a dynamic image of the multi-angle free view is displayed.
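As a rough illustration of the trigger-to-display path just described, the following sketch stands in for the request flow; the instruction fields, the server stub and the helper names are assumptions rather than the actual protocol of this embodiment.

# Hypothetical request/response sketch; message fields and helper names are
# illustrative assumptions, not the embodiment's actual protocol.
def on_identifier_triggered(identifier, server):
    # Generate an image reconstruction instruction for the designated frame time and send it.
    instruction = {"type": "image_reconstruction", "frame_time": identifier["frame_time"]}
    interaction_data = server(instruction)        # server returns interaction data for that frame time
    show_free_view(interaction_data)

def show_free_view(interaction_data):
    frame_times = interaction_data["frame_times"]
    if len(frame_times) == 1:
        print("static multi-angle free-view image at", frame_times[0])
    else:
        print("dynamic multi-angle free-view image over", frame_times)

# usage sketch with a stubbed server
fake_server = lambda ins: {"frame_times": [ins["frame_time"]], "free_view_data": "<data>"}
on_identifier_triggered({"frame_time": "T2"}, fake_server)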
Referring to fig. 4, 38 and 9, when the system of the interactive terminal reads the corresponding interactive identifier 43 on the progress bar 41, interactive prompt information may be displayed, and when the user does not select triggering, the interactive terminal 40 may continue to read the subsequent video data, and the played portion of the progress bar 41 may continue to advance. When the user selects triggering, the interactive terminal 40 receives feedback and generates an image reconstruction instruction of a designated frame time of the corresponding interactive identifier, and sends the image reconstruction instruction to the data processing device 382.
For example, when the user selects to trigger the current interactive identifier 43, the interactive terminal 40 generates an image reconstruction instruction of the interactive identifier 43 at the corresponding designated frame time T2 after receiving the feedback, and sends the image reconstruction instruction to the data processing device. The data processing device 382 may transmit a plurality of frame images corresponding to the specified frame time T2 according to the image reconstruction instruction.
The user may also select a trigger history interactive identifier, for example, the interactive identifier 42 displayed by the played portion 41a on the trigger progress bar, and the interactive terminal 40 generates an image reconstruction instruction of the interactive identifier 42 corresponding to the designated frame time T1 after receiving the feedback, and sends the image reconstruction instruction to the data processing device. The data processing device may transmit a plurality of frame images corresponding to the specified frame time T1 according to the image reconstruction instruction.
The interactive terminal 40 may generate interaction data based on the plurality of frame images, perform image processing on the multi-angle free-view data of the interaction data using an image reconstruction algorithm, and then perform image presentation of the multi-angle free view at the designated frame time. If the designated frame time corresponds to a single frame time, a static image of the multi-angle free view is displayed; if the designated frame time corresponds to a plurality of frame times, a dynamic image of the multi-angle free view is displayed.
In a specific implementation, the interactive terminal of the embodiment of the present invention may be an electronic device with a touch screen function, a Virtual Reality (VR) terminal, an edge node device connected to a display, or an Internet of Things (IoT) device with a display function.
As shown in fig. 40, in another embodiment of the present invention, the interactive terminal is an electronic device 400 with a touch screen function. When the corresponding interactive identifier 402 on the progress bar 401 is read, the interface of the electronic device 400 may display an interactive prompt information box 403. The user may select according to the content of the interactive prompt information box 403: when the user performs a trigger operation of selecting "yes", the electronic device 400 may, after receiving the feedback, generate an image reconstruction instruction for the interaction frame moment corresponding to the interactive identifier 402; when the user performs a non-trigger operation of selecting "no", the electronic device 400 may continue to read the subsequent video data.
As shown in fig. 41, in another embodiment of the present invention, the interactive terminal is a head-mounted VR terminal 410. When the corresponding interactive identifier 412 on the progress bar 411 is read, the interface of the head-mounted VR terminal 410 may display an interactive prompt information box 413. The user may select according to the content of the interactive prompt information box 413: when the user performs a trigger operation of selecting "yes" (e.g. nodding), the head-mounted VR terminal 410 may, after receiving the feedback, generate an image reconstruction instruction for the interaction frame moment corresponding to the interactive identifier 412; when the user performs a non-trigger operation of selecting "no" (e.g. shaking the head), the head-mounted VR terminal 410 may continue to read the subsequent video data.
As shown in fig. 42, in an embodiment of the present invention, an interactive interface of another interactive terminal is shown, the interactive terminal is an edge node device 421 connected to a display 420, and when the edge node device 421 reads a corresponding interactive identifier 423 on a progress bar 422, the display 420 may display an interactive prompt message box 424. The user may select according to the content of the interaction prompt information box 424, when the user makes a trigger operation of selecting "yes", the edge node device 421 may generate an image reconstruction instruction of the interaction frame moment corresponding to the interaction identifier 423 after receiving feedback, and when the user makes a non-trigger operation of selecting "no", the edge node device 421 may continue to read the subsequent video data.
In a specific implementation, the interactive terminal may establish a communication connection with at least one of the data processing device and the server, and may use a wired connection or a wireless connection.
Fig. 43 is a schematic connection diagram of an interactive terminal according to an embodiment of the present invention. The edge node device 430 establishes a wireless connection with the interaction devices 431, 432 and 433 through the internet of things.
In a specific implementation, after the interactive terminal triggers the interactive identifier, it may perform image display of the multi-angle free view at the designated frame time corresponding to the triggered interactive identifier, and determine virtual viewpoint position information based on the interactive operation. As shown in fig. 44, which is an interactive operation schematic diagram of the interactive terminal in the embodiment of the present invention, a user may perform a horizontal operation or a vertical operation on the interactive operation interface, and the operation track may be a straight line or a curve.
In a specific implementation, as shown in fig. 45, an interactive interface schematic diagram of another interactive terminal in an embodiment of the present invention is shown. After the user clicks the interaction identifier, the interaction terminal acquires the interaction data of the interaction identifier at the designated frame moment.
If the user does not take a new operation, the triggering operation is the interactive operation, and the corresponding first virtual viewpoint can be determined according to the visual angle of the displayed image during the interactive operation. If the user takes a new operation, the new operation is an interactive operation, and the corresponding first virtual viewpoint can be determined according to the visual angle of the displayed image during the interactive operation.
Then, images corresponding to the virtual viewpoints may be sequentially displayed from the first virtual viewpoint in a preset order of the virtual viewpoints. If the specified frame time corresponds to a single frame time, the obtained multi-angle free view video data may comprise multi-angle free view spatial data of images ordered according to the frame time, and a static image of the multi-angle free view can be displayed; if the specified frame time corresponds to different frame times, the obtained multi-angle free view video data may include multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to the frame times, and a dynamic image of the multi-angle free view may be displayed, that is, frame images of video frames of the multi-angle free view are displayed.
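A minimal sketch of this viewpoint traversal is given below, assuming the virtual viewpoints are kept in a preset list and the first virtual viewpoint simply rotates the traversal order; the names and the ordering rule are illustrative assumptions, not the claimed method.

# Minimal sketch, assuming a preset circular order of virtual viewpoints.
def display_from_first_viewpoint(viewpoints, first_viewpoint, frames_by_viewpoint):
    start = viewpoints.index(first_viewpoint)
    for vp in viewpoints[start:] + viewpoints[:start]:       # preset order from the first virtual viewpoint
        for frame_time, image in frames_by_viewpoint[vp]:
            print(f"show {vp} view at frame time {frame_time}: {image}")

viewpoints = ["left", "front", "right", "top"]
# the same frame time for every viewpoint -> a static multi-angle free-view image;
# successive frame times per viewpoint would instead yield a dynamic image
static_frames = {vp: [("T2", f"img-{vp}-T2")] for vp in viewpoints}
display_from_first_viewpoint(viewpoints, "front", static_frames)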
In one embodiment of the present invention, reference is made to FIGS. 45 and 46. The multi-angle free-view video data obtained by the interactive terminal may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered according to frame moments. The user slides horizontally to the right to generate an interactive operation, and a corresponding first virtual viewpoint is determined. Since different virtual viewpoints may correspond to different multi-angle free-view spatial data and temporal data, as shown in fig. 46, the frame images displayed in the interactive interface change in time and space with the interactive operation: the displayed content changes from the athlete running toward the finish line in fig. 45 to the athlete crossing the finish line in fig. 46, and the displayed viewing angle changes from a left view to a front view with the athlete as the target object.
Similarly, referring to figs. 45 and 47, the content displayed in the frame images changes from the athlete running toward the finish line in fig. 45 to the athlete having crossed the finish line in fig. 47, and the displayed viewing angle changes from a left view to a right view with the athlete as the target object.
Similarly, referring to figs. 45 and 48, the user slides vertically upward to generate the interactive operation; the content displayed in the frame images changes from the athlete running toward the finish line in fig. 45 to the athlete having crossed the finish line in fig. 48, and the displayed viewing angle changes from a left view to a top view with the athlete as the target object.
It can be understood that different interactive operations can be obtained according to the operation of a user, and corresponding first virtual viewpoints can be determined according to the visual angles of the images displayed during the interactive operations; according to the obtained multi-angle free view video data, a static image or a dynamic image of the multi-angle free view can be displayed, and the embodiment of the invention is not limited.
In a specific implementation, the interaction data may further include at least one of: the method comprises the steps of on-site analysis data, information data of an acquisition object, information data of equipment associated with the acquisition object, information data of an on-site deployed object and information data of a logo displayed on site.
In an embodiment of the present invention, as shown in fig. 10, an interactive interface of another interactive terminal in the embodiment of the present invention is schematically shown. After the interaction identifier is triggered, the interaction terminal 100 may perform image presentation of the multi-angle free view angle at a specified frame time corresponding to the triggered interaction identifier, and may superimpose field analysis data on an image (not shown), as shown by field analysis data 101 in fig. 10.
In an embodiment of the present invention, as shown in fig. 11, an interactive interface of another interactive terminal in the embodiment of the present invention is shown. After the user triggers the interaction identifier, the interaction terminal 110 may perform image presentation of the multi-angle free view angle at the designated frame time corresponding to the triggered interaction identifier, and may superimpose the information data of the acquisition object on an image (not shown), as shown by the information data 111 of the acquisition object in fig. 11.
In an embodiment of the present invention, as shown in fig. 12, an interactive interface of another interactive terminal in the embodiment of the present invention is shown. After the user triggers the interaction identifier, the interaction terminal 120 may perform image presentation of the multi-angle free view angle at the designated frame time corresponding to the triggered interaction identifier, and may superimpose the information data of the acquisition object on the image (not shown), as shown by the information data 121-123 of the acquisition object in fig. 12.
In an embodiment of the present invention, as shown in fig. 13, an interactive interface diagram of another terminal in the embodiment of the present invention is shown. After the user triggers the interaction identifier, the interaction terminal 130 may perform image display of the multi-angle free view angle at the designated frame time corresponding to the triggered interaction identifier, and may superimpose information data of the object deployed on the site on an image (not shown), as shown by information data 131 of the file package in fig. 13.
In an embodiment of the present invention, as shown in fig. 14, an interactive interface diagram of another terminal in the embodiment of the present invention is shown. After the interaction identifier is triggered, the interaction terminal 140 may perform image display of a multi-angle free view angle at a designated frame time corresponding to the triggered interaction identifier, and may superimpose information data of a logo displayed on site on an image (not shown), as shown by logo information data 141 in fig. 14.
Therefore, the user can acquire more relevant interaction information through the interaction data, and know the watched content more deeply, comprehensively and professionally, so that the user interaction experience can be further enhanced.
Referring to the schematic structure of another interactive terminal shown in fig. 39, the interactive terminal 390 may include: processor 391, network component 392, memory 393 and display unit 394; wherein:
The processor 391 is adapted to obtain a data stream to be played in real time through the network component 392, and to obtain interactive data corresponding to a specified frame time of an interactive identifier in response to a triggering operation of the interactive identifier, wherein the data stream to be played includes video data and the interactive identifier, the interactive identifier is associated with the specified frame time of the data stream to be played, and the interactive data includes multi-angle free view data;
the memory 393 is adapted to store a data stream to be played acquired in real time;
the display unit 394 is adapted to play and display the video and the interactive identifier of the data stream to be played in real time based on the data stream to be played obtained in real time, and perform image display of the multi-angle free view angle at the designated frame time based on the interactive data.
The interactive terminal 390 may acquire the interactive data of the specified frame time from the server storing the interactive data, or may acquire a plurality of frame images corresponding to the specified frame time from the data processing device storing the frame images, and then generate corresponding interactive data.
In order to enable those skilled in the art to better understand and implement the embodiments of the present invention, a further detailed description of a processing scheme of the multi-angle freeview video image on the field side is provided below.
Referring to the flowchart of the data processing method shown in fig. 15, in an embodiment of the present invention, the method specifically may include the following steps:
s151, when the sum of code rates of compressed video data streams pre-transmitted by all the acquisition devices in the acquisition array is not larger than a preset bandwidth threshold, respectively sending a stream pulling instruction to all the acquisition devices in the acquisition array, wherein all the acquisition devices in the acquisition array are arranged at different positions of an on-site acquisition area according to a preset multi-angle free view angle range.
The multi-angle free view may refer to the spatial position and viewing angle of a virtual viewpoint from which the scene can be freely switched and viewed. The multi-angle free view angle range can be determined according to the requirements of the application scene.
In a specific implementation, the preset bandwidth threshold may be determined according to the transmission capability of the transmission network where each acquisition device in the acquisition array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
S152, receiving compressed video data streams transmitted in real time by all the acquisition devices in the acquisition array based on the pull stream instruction, wherein the compressed video data streams are obtained by synchronously acquiring and compressing data from corresponding angles in real time by all the acquisition devices in the acquisition array.
In a specific implementation, the acquisition device may have encoding and encapsulation functions, so that original video data acquired from a corresponding angle in real time can be encoded and encapsulated. The encapsulation format adopted by the acquisition device may be any one of AVI, QuickTime File Format, MPEG, WMV, RealVideo, Flash Video, Matroska, or any other encapsulation format, and the encoding format adopted by the acquisition device may be H.261, H.263, H.264, H.265, MPEG, AVS, or any other encoding format. In addition, the acquisition device can have a compression function; the higher the compression rate, the smaller the compressed data volume for the same amount of pre-compression data and the lower the bandwidth pressure of real-time synchronous transmission, so the acquisition device can adopt techniques such as predictive coding, transform coding and entropy coding to improve the video compression rate.
By adopting the data processing method, whether the transmission bandwidths are matched is determined before the streaming, so that data transmission congestion in the streaming process can be avoided, data acquired by each acquisition device and obtained by data compression can be synchronously transmitted in real time, the processing speed of multi-angle free view video data is increased, the low-delay playing of the multi-angle free view video is realized under the condition that bandwidth resources and data processing resources are limited, and the implementation cost is reduced.
In a specific implementation, by acquiring the values of the parameters of each acquisition device, it may be determined that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold. For example, the acquisition array may include 40 acquisition devices, and the code rate of the compressed video data stream of each acquisition device is 15 Mbps, so the overall code rate of the acquisition array is 15 × 40 = 600 Mbps; if the preset bandwidth threshold is 1000 Mbps, it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold. Then, a pull stream instruction is sent to each acquisition device according to the IP addresses of the 40 acquisition devices in the acquisition array.
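The pre-pull check in this example can be summarized with a short sketch; the function name is an assumption, and the figures are the example values from the text (40 devices at 15 Mbps against a 1000 Mbps threshold).

# Sketch of the pre-pull bandwidth check; the function name is hypothetical.
def can_start_pulling(per_device_bitrate_mbps, device_count, bandwidth_threshold_mbps):
    total = per_device_bitrate_mbps * device_count
    return total <= bandwidth_threshold_mbps, total

ok, total = can_start_pulling(per_device_bitrate_mbps=15, device_count=40,
                              bandwidth_threshold_mbps=1000)
print(ok, total)   # True, 600 -> a pull stream instruction may be sent to each device's IP address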
In a specific implementation, in order to ensure that the values of the parameters of the acquisition devices in the acquisition array are uniform, so that the acquisition devices can synchronously acquire and compress data in real time, the values of the parameters of the acquisition devices in the acquisition array can be set before the pull-stream instructions are respectively sent to the acquisition devices in the acquisition array. Wherein, the parameters of the acquisition device may include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams obtained by real-time synchronous acquisition and data compression from corresponding angles by each acquisition device in the acquisition array according to the set numerical value of the parameters of each acquisition device is not more than a preset bandwidth threshold.
Because the acquisition parameters and the compression parameters complement each other, under the condition that the numerical value of the compression parameters is unchanged, the data size of the original video data can be reduced by setting the numerical value of the acquisition parameters, so that the time of data compression processing is shortened; under the condition that the value of the acquisition parameter is unchanged, setting the value of the compression parameter can correspondingly reduce the compressed data quantity, so that the data transmission time is shortened. As another example, setting a higher compression rate may save transmission bandwidth, and setting a lower sampling rate may also save transmission bandwidth. Accordingly, the acquisition parameters and/or compression parameters may be set according to the actual situation.
Therefore, before the streaming starts, the numerical value of the parameter of each acquisition device in the acquisition array can be set, the unification of the numerical value of the parameter of each acquisition device in the acquisition array is ensured, each acquisition device can synchronously acquire and compress data in real time from a corresponding angle, and the sum of the code rates of the obtained compressed video data streams is not more than a preset bandwidth threshold, so that network congestion can be avoided, and low-delay playing of multi-angle free view video can be realized under the condition of limited bandwidth resources.
In a specific embodiment, the acquisition parameters may include a focal length parameter, an exposure parameter, a resolution parameter, a coding rate parameter, a coding format parameter, and the like, and the compression parameters may include a compression rate parameter, a compression format parameter, and the like; by setting the values of the different parameters, the values most suitable for the transmission network where each acquisition device is located are obtained.
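For illustration, such a parameter set could be represented as below before being pushed to every device; all field names and values are assumptions used only to show that the devices receive identical settings.

# Illustrative parameter set only; field names and values are assumptions,
# not the embodiment's actual parameters.
acquisition_params = {"focal_length_mm": 35, "exposure_ms": 8,
                      "resolution": "1920x1080", "coding_rate_mbps": 15,
                      "coding_format": "H.264"}
compression_params = {"compression_rate": 0.5, "compression_format": "H.264"}

def apply_to_array(devices, acq, comp):
    for dev in devices:
        # every device receives identical values so acquisition and compression stay uniform
        dev.update({"acquisition": dict(acq), "compression": dict(comp)})

devices = [{"ip": f"192.168.0.{i}"} for i in range(1, 41)]   # hypothetical device addresses
apply_to_array(devices, acquisition_params, compression_params)
print(devices[0]["acquisition"]["coding_rate_mbps"])          # 15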
In order to simplify the setting flow and save setting time, before the values of the parameters of each acquisition device in the acquisition array are set, it may be determined whether the sum of the code rates of the compressed video data streams obtained by each acquisition device performing acquisition and data compression according to the currently set parameter values is greater than the preset bandwidth threshold. Only when the sum of the code rates of the obtained compressed video data streams is greater than the preset bandwidth threshold are the values of the parameters of each acquisition device in the acquisition array set before the pull stream instructions are respectively sent. It can be appreciated that, in a specific implementation, the values of the acquisition parameters and the compression parameters may also be set according to imaging quality requirements such as the resolution of the multi-angle free view image to be displayed.
In a specific implementation, the process from transmission to writing of the compressed video data streams obtained by the acquisition devices occurs continuously. Therefore, before the pull stream instructions are respectively sent to the acquisition devices in the acquisition array, it may be determined whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices is greater than a preset writing speed threshold. When it is greater than the preset writing speed threshold, the values of the parameters of the acquisition devices in the acquisition array may be set, so that the sum of the code rates of the compressed video data streams obtained by each acquisition device synchronously acquiring and compressing data in real time from the corresponding angle according to the set parameter values is not greater than the preset writing speed threshold.
In a specific implementation, the preset writing speed threshold may be determined according to a data storage writing speed of the storage medium. For example, the upper limit of the data storage writing speed of a Solid State Disk (Solid State Disk or Solid State Drive, SSD) of the data processing device is 100Mbps, and the preset writing speed threshold may be 100Mbps.
By adopting the scheme, before the pulling of the stream is started, the sum of the code rates of the compressed video data streams obtained by the acquisition devices from the corresponding angles in real time and synchronously acquiring and compressing the data can be ensured to be not more than the preset writing speed threshold, so that the data writing congestion can be avoided, the link smoothness of the compressed video data streams in the acquisition, transmission and writing processes is ensured, the compressed video streams uploaded by the acquisition devices can be processed in real time, and further the playing of multi-angle free view video is realized.
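A minimal sketch of checking both constraints before pulling streams is given below, assuming the figures from the text (a 1000 Mbps bandwidth threshold and a 100 Mbps SSD write speed); the adjustment loop that lowers a hypothetical per-device code rate is illustrative, not the embodiment's actual parameter-setting strategy.

# Sketch only: the adjustment rule (lowering a per-device code rate until both
# constraints hold) is an assumption used for illustration.
def fits(per_device_mbps, device_count, bandwidth_mbps, write_speed_mbps):
    total = per_device_mbps * device_count
    return total <= bandwidth_mbps and total <= write_speed_mbps

rate = 15
while not fits(rate, 40, bandwidth_mbps=1000, write_speed_mbps=100) and rate > 1:
    rate -= 1            # re-set the devices' parameters before sending the pull stream instructions
print("per-device code rate:", rate, "Mbps")   # 2 Mbps -> 40 * 2 = 80 Mbps, within both thresholds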
In implementations, the compressed video data streams obtained by the respective acquisition devices may be stored. When a video frame interception instruction is received, video frames that are frame-level synchronized in the compressed video data streams may be intercepted according to the instruction, and the intercepted video frames may be synchronously uploaded to the specified target end.
The specified target end can be a preset target end or a target end specified by the video frame interception instruction. The intercepted video frames can be encapsulated and uploaded to the specified target end through a network transmission protocol, and the target end can then parse them to obtain the corresponding frame-level synchronized video frames of the compressed video data streams.
Therefore, the subsequent processing of the video frames intercepted by the compressed video data stream is carried out by the appointed target end, so that network transmission resources can be saved, the pressure and difficulty for deploying a large number of server resources on site can be reduced, the data processing load can be greatly reduced, and the transmission time delay of the multi-angle free view video frames can be shortened.
In particular implementations, to ensure that the video frames intercepted from the compressed video data streams are frame-level synchronized, as shown in fig. 16, the following steps may be included:
s161, determining one path of compressed video data stream in the compressed video data streams of all the acquisition devices in the acquisition array received in real time as a reference data stream;
s162, determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in the rest compressed video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the rest compressed video data streams;
S163, intercepting the video frames to be intercepted in each compressed video data stream.
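Steps S161 to S163 can be sketched as follows, assuming each compressed stream is a list of (timestamp, frame) pairs; the matching rule here (closest or equal timestamp) is a simplification of the criteria described below.

# Sketch of steps S161-S163 under the assumption that each compressed stream is a
# list of (timestamp, frame) pairs; exact-timestamp matching is a simplification.
def intercept_synchronized_frames(streams, requested_ts, reference_index=0):
    reference = streams[reference_index]                         # S161: choose the reference data stream
    ref_ts, ref_frame = min(reference, key=lambda p: abs(p[0] - requested_ts))  # S162: frame to intercept
    captured = {reference_index: ref_frame}
    for i, stream in enumerate(streams):
        if i != reference_index:                                 # S162: synchronized frames in the other streams
            captured[i] = next(frame for ts, frame in stream if ts == ref_ts)
    return captured                                              # S163: intercepted frames, one per stream

streams = [[(0, f"cam{i}-f0"), (40, f"cam{i}-f40")] for i in range(3)]
print(intercept_synchronized_frames(streams, requested_ts=38))   # frames at timestamp 40 from all streams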
In order for those skilled in the art to better understand and implement the embodiments of the present invention, how to determine the video frames to be intercepted in each compressed video data stream is described in detail below with reference to a specific application scenario.
In an embodiment of the present invention, the acquisition array may include 40 acquisition devices, so that 40 paths of compressed video data streams may be received in real time. It is assumed that, among the compressed video data streams of the acquisition devices received in real time, the compressed video data stream A1 corresponding to the acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of an object in the video frame indicated to be intercepted in the received video frame interception instruction, the video frame a1 in the reference data stream that is consistent with the feature information X of the object is determined as the video frame to be intercepted. Next, based on the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream, the video frames a2-a40 in the remaining compressed video data streams A2-A40 that are consistent with the feature information X1 of the object are selected as the video frames to be intercepted of the remaining compressed video data streams.
Wherein the feature information of the object may include at least one of shape feature information, color feature information, position feature information, and the like. The feature information X of the object in the video frame indicated to be intercepted in the video frame interception instruction and the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream may be the same representation of the feature information of the same object, for example, the feature information X and X1 are both two-dimensional feature information; they may also be different representations of the feature information of the same object, for example, the feature information X may be two-dimensional feature information and the feature information X1 may be three-dimensional feature information. A similarity threshold may also be preset; when the similarity threshold is satisfied, the feature information X of the object may be considered to be consistent with X1, or the feature information X1 of the object may be considered to be consistent with the feature information X2-X40 of the objects in the remaining compressed video data streams A2-A40.
The specific representation mode and the similarity threshold of the feature information of the object can be determined according to the preset multi-angle free view angle range and the scene on site, and the embodiment of the invention is not limited in any way.
In another embodiment of the present invention, the acquisition array may include 40 acquisition devices, so that 40 paths of compressed video data streams may be received in real time. It is assumed that, among the compressed video data streams of the acquisition devices received in real time, the compressed video data stream B1 corresponding to the acquisition device B1' is determined as the reference data stream. Then, based on the timestamp information Y of the video frame indicated to be intercepted in the received video frame interception instruction, the video frame b1 corresponding to the timestamp information Y in the reference data stream is determined as the video frame to be intercepted. Next, based on the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream, the video frames b2-b40 corresponding to the timestamp information Y1 in the remaining compressed video data streams B2-B40 are selected as the video frames to be intercepted of the remaining compressed video data streams.
The timestamp information Y of the video frame indicated to be intercepted in the video frame interception instruction may have a certain error with respect to the timestamp information Y1 of the video frame b1 to be intercepted in the reference data stream. For example, the timestamp information corresponding to the video frame in the reference data stream may differ from the timestamp information Y by an error of 0.1 ms; an error range may be preset, for example ±1 ms, and since an error of 0.1 ms falls within this range, the video frame b1 corresponding to the timestamp information Y1 that differs from the timestamp information Y by 0.1 ms may be selected as the video frame to be intercepted in the reference data stream. The specific error range and the selection rule of the timestamp information Y1 in the reference data stream may be determined according to the on-site acquisition devices and the transmission network, which is not limited in this embodiment.
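The timestamp-based selection with an error tolerance can be sketched as below; the ±1 ms window is the example range from the text, and the (timestamp, frame) layout of a stream is an assumption.

# Sketch of timestamp matching with a tolerance; the +/-1 ms window is the example
# range from the text, and the (timestamp, frame) layout is an assumption.
def pick_frame_by_timestamp(stream, target_ts_ms, tolerance_ms=1.0):
    candidates = [(ts, frame) for ts, frame in stream if abs(ts - target_ts_ms) <= tolerance_ms]
    if not candidates:
        return None                                    # no frame falls within the allowed error range
    return min(candidates, key=lambda p: abs(p[0] - target_ts_ms))[1]

stream_b2 = [(99.0, "b2-f1"), (100.1, "b2-f2"), (101.5, "b2-f3")]
print(pick_frame_by_timestamp(stream_b2, target_ts_ms=100.0))   # 'b2-f2': 0.1 ms error, within +/-1 ms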
It can be understood that, in the above embodiment, the method for determining the video frames to be intercepted in each compressed video data stream may be used alone or simultaneously, which is not limited by the embodiment of the present invention.
By utilizing the above data processing method, the data processing device can smoothly and steadily pull the data acquired by each acquisition device and obtained by data compression.
The technical scheme of data processing of the acquisition array in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification.
Referring to the flowchart of the data processing method shown in fig. 17, in an embodiment of the present invention, the method specifically may include the following steps:
s171, each acquisition device in the acquisition array, which is arranged at different positions of the field acquisition area according to the preset multi-angle free view angle range, respectively acquires original video data from corresponding angles in real time synchronously, and respectively compresses the acquired original video data in real time to obtain corresponding compressed video data streams.
And S172, when the data processing equipment connected with the acquisition array link determines that the sum of the code rates of the compressed video data streams pre-transmitted by all the acquisition equipment in the acquisition array is not greater than a preset bandwidth threshold, respectively sending a stream pulling instruction to all the acquisition equipment in the acquisition array.
In a specific implementation, the preset bandwidth threshold may be determined according to the transmission capability of the transmission network where each acquisition device in the acquisition array is located; for example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
And S173, each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time based on the pull stream instruction.
In a specific implementation, the data processing device may be set according to an actual scenario. For example, when there is a suitable space in the field, the data processing device may be placed in an off-site acquisition area as a field server; when the field has no suitable space, the data processing equipment can be placed in the cloud as a cloud server.
By adopting the scheme, when the data processing device connected to the acquisition array determines that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold, it respectively sends a pull stream instruction to each acquisition device in the acquisition array. In this way, the data acquired by each acquisition device and obtained by data compression can be synchronously transmitted in real time, real-time stream pulling can be carried out through the transmission network, and data transmission congestion in the stream pulling process can be avoided. Then, each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time based on the pull stream instruction; since the data transmitted by each acquisition device is compressed, the bandwidth pressure of real-time synchronous transmission can be relieved, thereby accelerating the processing speed of the multi-angle free view video data.
Therefore, there is no need to deploy a large number of servers on site to process the data, nor to aggregate the original data collected by SDI capture cards and process it with computing servers in an on-site machine room. Expensive SDI video transmission cables and SDI interfaces can be avoided, and data transmission is performed through a common transmission network, so that low-delay playing of the multi-angle free view video can be realized under the condition that bandwidth resources and data processing resources are limited, and the implementation cost is reduced.
In a specific implementation, in order to simplify the setting flow and save setting time, before setting the values of the parameters of the acquisition devices in the acquisition array, the data processing device may determine whether the sum of the code rates of the compressed video data streams obtained by each acquisition device performing acquisition and data compression according to the currently set parameter values is greater than the preset bandwidth threshold. When the sum of the code rates of the obtained compressed video data streams is greater than the preset bandwidth threshold, the data processing device may set the values of the parameters of the acquisition devices in the acquisition array and then send a pull stream instruction to each acquisition device in the acquisition array.
In a specific implementation, the process from transmission to writing of the compressed video data streams obtained by the acquisition devices occurs continuously, and it must also be ensured that the data processing device can keep up when writing the compressed video data streams obtained by the acquisition devices. Therefore, before sending the pull stream instructions to the acquisition devices in the acquisition array, the data processing device may further determine whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices is greater than a preset writing speed threshold. When it is greater than the preset writing speed threshold, the data processing device may set the values of the parameters of the acquisition devices in the acquisition array, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from the corresponding angles according to the set parameter values is not greater than the preset writing speed threshold.
In a specific implementation, the preset writing speed threshold may be determined according to a data storage writing speed of the data processing apparatus.
In a specific implementation, data transmission can be performed between each acquisition device in the acquisition array and the data processing device through at least one of the following modes:
1. Data transmission is carried out through a switch;
the switch is used for connecting each acquisition device in the acquisition array with the data processing device, and the switch can collect and uniformly transmit the compressed video data streams of more acquisition devices to the data processing device, so that the number of ports supported by the data processing device can be reduced. For example, the switch supports 40 inputs, so that the data processing device can receive video streams of an acquisition array consisting of 40 acquisition devices at the same time through the switch, and the number of the data processing devices can be reduced.
2. Data transmission is carried out through a local area network.
And each acquisition device in the acquisition array is connected with the data processing device through a local area network, the local area network can transmit the compressed video data stream of the acquisition device to the data processing device in real time, the number of ports supported by the data processing device is reduced, and the number of the data processing devices can be further reduced.
In a specific implementation, the data processing device may store (e.g., buffer) the compressed video data streams obtained by the respective acquisition devices. When a video frame interception instruction is received, the data processing device may intercept the frame-level synchronized video frames in each compressed video data stream according to the received instruction, and synchronously upload the intercepted video frames to the specified target end.
The data processing device may establish a connection with a target end in advance through a port or an IP address, or may synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction. The data processing device can encapsulate the intercepted video frames and upload them to the specified target end through a network transmission protocol, and the target end can then parse them to obtain the frame-level synchronized video frames of the corresponding compressed video data streams.
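For illustration, the encapsulation and parsing round trip might look like the sketch below; the JSON envelope and base64 payload are assumptions standing in for whatever network transmission protocol and packaging format a real deployment uses.

# Illustrative encapsulation/parsing sketch only; the JSON envelope and base64
# payload are assumptions, not the embodiment's actual transmission format.
import base64
import json

def encapsulate(frames_by_camera, frame_time):
    return json.dumps({"frame_time": frame_time,
                       "frames": {cam: base64.b64encode(data).decode("ascii")
                                  for cam, data in frames_by_camera.items()}})

def parse(packet):
    msg = json.loads(packet)
    return msg["frame_time"], {cam: base64.b64decode(s) for cam, s in msg["frames"].items()}

packet = encapsulate({"cam01": b"\x00\x01", "cam02": b"\x02\x03"}, frame_time="T2")
print(parse(packet))   # the specified target end recovers the frame-level synchronized frames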
By adopting the scheme, the compressed video data streams obtained by real-time synchronous acquisition and data compression of the acquisition devices in the acquisition array can be uniformly transmitted to the data processing device. After receiving a video frame interception instruction, the data processing device can, through the preliminary processing of marking and frame interception, synchronously upload the intercepted frame-level synchronized video frames of each compressed video data stream to the specified target end, and the subsequent processing of the intercepted video frames is handed over to the specified target end. In this way, network transmission resources can be saved, the pressure and difficulty of on-site deployment can be reduced, the data processing load can be greatly reduced, and the transmission delay of the multi-angle free view video frames can be shortened.
In a specific implementation, in order to intercept video frames synchronized at a frame level in each compressed video data stream, the data processing device may determine one of the compressed video data streams of each acquisition device in the acquisition array received in real time as a reference data stream, then, based on the received video frame intercept instruction, the data processing device may determine a video frame to be intercepted in the reference data stream, select video frames in the remaining compressed video data streams synchronized with the video frame to be intercepted in the reference data stream as video frames to be intercepted in the remaining compressed video data streams, and finally, intercept the video frames to be intercepted in each compressed video data stream by the data processing device. The specific frame cutting method may refer to the examples of the foregoing embodiments, and will not be described herein.
The embodiment of the present invention further provides a data processing device corresponding to the data processing method in the above embodiment, and in order to enable those skilled in the art to better understand and implement the embodiment of the present invention, the following detailed description is provided by specific embodiments with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data processing device shown in fig. 18, in an embodiment of the present invention, the data processing device 180 may include:
The first transmission matching unit 181 is adapted to determine whether a sum of code rates of compressed video data streams pre-transmitted by each acquisition device in the acquisition array is not greater than a preset bandwidth threshold, wherein each acquisition device in the acquisition array is placed at different positions of a field acquisition area according to a preset multi-angle free view angle range.
The instruction sending unit 182 is adapted to send a pull stream instruction to each acquisition device in the acquisition array when it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is not greater than a preset bandwidth threshold.
The data stream receiving unit 183 is adapted to receive a compressed video data stream transmitted in real time by each collecting device in the collecting array based on the pull stream command, where the compressed video data stream is obtained by respectively and synchronously collecting and compressing data in real time from a corresponding angle by each collecting device in the collecting array.
By adopting the data processing device, before the data processing device sends a pull stream instruction to each acquisition device in the acquisition array, it is determined whether the transmission bandwidth matches, so that data transmission congestion in the stream pulling process can be avoided, the data acquired by each acquisition device and obtained by data compression can be synchronously transmitted in real time, the processing speed of the multi-angle free view video data is accelerated, low-delay playing of the multi-angle free view video is realized under the condition that bandwidth resources and data processing resources are limited, and the implementation cost is reduced.
In one embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
a first parameter setting unit 184, adapted to set values of parameters of each acquisition device in the acquisition array before sending a pull-stream instruction to each acquisition device in the acquisition array, respectively;
wherein, the parameters of the acquisition device may include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams obtained by real-time synchronous acquisition and data compression from corresponding angles by each acquisition device in the acquisition array according to the set numerical value of the parameters of each acquisition device is not more than a preset bandwidth threshold.
In one embodiment of the present invention, in order to simplify the setup process and save setup time, as shown in fig. 18, the data processing apparatus 180 may further include:
the second transmission matching unit 185 is adapted to determine, before setting the value of the parameter of each acquisition device in the acquisition array, whether the sum of the code rates of the compressed video data streams obtained by performing acquisition and data compression by each acquisition device in the acquisition array according to the set value of the parameter is not greater than a preset bandwidth threshold.
In one embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
The writing matching unit 186 is adapted to determine whether the sum of code rates of compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold;
and the second parameter setting unit 187 is adapted to set the value of the parameter of each acquisition device in the acquisition array when the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous acquisition and data compression from corresponding angles according to the set value of the parameter of each acquisition device in the acquisition array is not greater than the preset writing speed threshold.
Therefore, before the pulling of the stream is started, the sum of the code rates of the compressed video data streams obtained by the acquisition devices from the corresponding angles in real time and synchronously acquiring and compressing the data can be ensured to be not more than the preset writing speed threshold, so that the data writing congestion can be avoided, the link smoothness of the compressed video data streams in the acquisition, transmission and writing processes is ensured, the compressed video streams uploaded by the acquisition devices can be processed in real time, and further the playing of multi-angle free view video is realized.
In one embodiment of the present invention, as shown in fig. 18, the data processing device 180 may further include:
the frame-cutting processing unit 188 is adapted to cut out the video frames synchronized at the frame level in each compressed video data stream according to the received video frame cutting instruction;
and an uploading unit 189, adapted to synchronously upload the intercepted video frames to the specified target end.
The specified target terminal can be a preset target terminal or a target terminal specified by a video frame interception instruction.
Therefore, the subsequent processing of the video frames intercepted by the compressed video data stream is carried out by the appointed target end, so that network transmission resources can be saved, the pressure and difficulty of on-site deployment are reduced, the data processing load can be greatly reduced, and the transmission delay of the multi-angle free view video frames is shortened.
In an embodiment of the present invention, as shown in fig. 18, the frame-cutting processing unit 188 may include:
a reference data stream selection subunit 1881, adapted to determine one of the compressed video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
a video frame selection subunit 1882, adapted to determine a video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and select video frames in each of the other compressed video data streams synchronized with the video frame to be intercepted in the reference data stream as video frames to be intercepted in each of the other compressed video data streams;
The video frame interception subunit 1883 is adapted to intercept video frames to be intercepted in each compressed video data stream.
In an embodiment of the present invention, as shown in fig. 18, the video frame selection subunit 1882 may include at least one of the following:
the first video frame selecting module 18821 is adapted to select, according to the feature information of the object in the video frames to be intercepted in the reference data stream, the video frames, which are consistent with the feature information of the object, in the other compressed video data streams as the video frames to be intercepted in the other compressed video data streams;
the second video frame selection module 18822 is adapted to select, according to the timestamp information of the video frame to be intercepted in the reference data stream, the video frame in the other compressed video data streams, which is consistent with the timestamp information, as the video frame to be intercepted of the other compressed video data streams.
The embodiment of the invention also provides a data processing system corresponding to the data processing method, the data processing device is adopted to realize real-time receiving of the multipath compressed video data stream, and in order to enable the person skilled in the art to better understand and realize the embodiment of the invention, the detailed description is given below through the specific embodiment with reference to the accompanying drawings.
Referring to the schematic diagram of the data processing system shown in FIG. 19, in an embodiment of the present invention, data processing system 190 may comprise: acquisition array 191 and data processing device 192, acquisition array 191 includes a plurality of acquisition devices of setting up the regional different positions of scene acquisition according to the multi-angle free view scope of predetermineeing, wherein:
each acquisition device in the acquisition array 191 is adapted to synchronously acquire original video data from a corresponding angle in real time, perform real-time data compression on the acquired original video data to obtain a compressed video data stream synchronously acquired from the corresponding angle in real time, and transmit the obtained compressed video data stream to the data processing device 192 in real time based on a streaming instruction sent by the data processing device 192;
the data processing device 192 is adapted to send a stream pulling instruction to each acquisition device in the acquisition array 191 and receive a compressed video data stream transmitted in real time by each acquisition device in the acquisition array 191 when it is determined that the sum of code rates of compressed video data streams pre-transmitted by each acquisition device in the acquisition array is not greater than a preset bandwidth threshold.
By adopting the scheme, there is no need to arrange a large number of servers on site to process data, nor to gather the original data collected by SDI acquisition cards and process it on computing servers in an on-site machine room, and expensive SDI video transmission cables and SDI interfaces can be avoided. Data transmission and streaming are carried out over a common transmission network, low-delay playing of multi-angle free view video is realized even when bandwidth resources and data processing resources are limited, and implementation cost is reduced.
In an embodiment of the present invention, the data processing device 192 is further adapted to set a value of a parameter of each of the acquisition devices in the acquisition array 191 before sending a pull stream command to each of the acquisition devices in the acquisition array;
wherein the parameters of the acquisition device include: and acquiring parameters and compression parameters, wherein the sum of code rates of compressed video data streams obtained by real-time synchronous acquisition and data compression from corresponding angles by each acquisition device in the acquisition array according to the set numerical value of the parameters of each acquisition device is not more than a preset bandwidth threshold.
Therefore, before streaming starts, the data processing device can set the values of the parameters of each acquisition device in the acquisition array and ensure that these values are uniform across the array. Each acquisition device can then synchronously acquire and compress data in real time from its corresponding angle, and the sum of the code rates of the resulting compressed video data streams does not exceed the preset bandwidth threshold, so network congestion can be avoided and low-delay playing of multi-angle free view video can be realized under limited bandwidth resources.
In an embodiment of the present invention, before sending a stream pulling instruction to each acquisition device in the acquisition array 191, the data processing device 192 determines whether the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array 191 is greater than a preset writing speed threshold. If it is, the data processing device 192 sets the values of the parameters of each acquisition device in the acquisition array 191, so that each acquisition device synchronously acquires and compresses data in real time from its corresponding angle according to the set parameter values.
Therefore, before stream pulling starts, it can be ensured that the sum of the code rates of the compressed video data streams obtained by the acquisition devices through real-time synchronous acquisition and data compression from their corresponding angles does not exceed the preset writing speed threshold. Data writing congestion at the data processing device can thus be avoided, the acquisition, transmission and writing links of the compressed video data streams remain smooth, the compressed video streams uploaded by the acquisition devices can be processed in real time, and playing of multi-angle free view video can be realized.
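As an illustration of this pre-streaming check, the following is a minimal Python sketch, assuming a hypothetical device object that exposes a configurable bitrate; it is not the patent's implementation, only an example of keeping the summed code rate within both the bandwidth and writing speed budgets by scaling the devices' bitrates uniformly.

from dataclasses import dataclass
from typing import List

@dataclass
class AcquisitionDevice:          # hypothetical stand-in for one camera in the array
    device_id: str
    bitrate_bps: int              # current compressed-stream bitrate from acquisition/compression parameters

def enforce_rate_budget(devices: List[AcquisitionDevice],
                        bandwidth_bps: int,
                        write_speed_bps: int) -> None:
    """Scale every device's bitrate so the summed rate fits the tighter budget."""
    budget = min(bandwidth_bps, write_speed_bps)
    total = sum(d.bitrate_bps for d in devices)
    if total <= budget:
        return                                   # already within budget, nothing to set
    scale = budget / total                       # uniform parameter values keep the devices consistent
    for d in devices:
        d.bitrate_bps = int(d.bitrate_bps * scale)

devices = [AcquisitionDevice(f"cam{i:02d}", 20_000_000) for i in range(40)]
enforce_rate_budget(devices, bandwidth_bps=600_000_000, write_speed_bps=500_000_000)
assert sum(d.bitrate_bps for d in devices) <= 500_000_000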
In a specific implementation, each acquisition device in the acquisition array and the data processing device are adapted to be connected through a switch and/or a local area network.
In one embodiment of the present invention, the data processing system 190 may further include a designated target 193.
The data processing device 192 is adapted to intercept video frames synchronized at a frame level in each compressed video stream according to the received video frame intercept command, and synchronously upload the intercepted video frames to the designated target 193;
the designated target 193 is adapted to receive the video frames intercepted by the data processing device 192 based on the video frame interception instruction.
The data processing device may establish a connection with a target end through a port or an IP address in advance, or may synchronously upload the video frame obtained by interception to the port or the IP address specified by the video frame interception instruction.
By adopting this scheme, the compressed video data streams obtained through real-time synchronous acquisition and data compression at each acquisition device in the acquisition array can be transmitted to the data processing device in a unified manner. After receiving a video frame interception instruction, the data processing device performs the preliminary processing of dotting and frame interception and synchronously uploads the intercepted, frame-level synchronized video frames of each compressed video data stream to the specified target end. Subsequent processing of the intercepted video frames is then performed by the specified target end, which saves network transmission resources, reduces the pressure and difficulty of on-site deployment, greatly reduces the data processing load, and shortens the transmission delay of multi-angle free view video frames.
In an embodiment of the present invention, the data processing device 192 is adapted to determine one of the compressed video data streams of each of the acquisition devices in the acquisition array 191 received in real time as a reference data stream; determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in the rest compressed video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the rest compressed video data streams; finally, the video frames to be intercepted in each compressed video data stream are intercepted.
In order for those skilled in the art to better understand and practice embodiments of the present invention, a frame synchronization scheme between a data processing device and an acquisition device is described in detail below with respect to specific embodiments.
Referring to the flowchart of the data synchronization method shown in fig. 20, in an embodiment of the present invention, the method specifically may include the following steps:
s201, respectively sending a stream pulling instruction to each acquisition device in an acquisition array, wherein each acquisition device in the acquisition array is arranged at different positions of a field acquisition area according to a preset multi-angle free view angle range, and each acquisition device in the acquisition array respectively acquires video data streams from corresponding angles in real time and synchronously.
In particular implementations, pull stream synchronization can be achieved in a number of ways. For example, a stream pulling instruction can be sent to each acquisition device in the acquisition array simultaneously; alternatively, a stream pulling instruction can be sent only to the master acquisition device in the acquisition array to trigger its streaming, and the master acquisition device then synchronizes the instruction to all slave acquisition devices to trigger their streaming. Both dispatch patterns are illustrated in the sketch below.
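The following Python sketch illustrates the two dispatch patterns just described; the device addresses and the send_pull() helper are assumptions for illustration, not part of the patent.

from typing import Iterable

def send_pull(address: str) -> None:
    print(f"pull-stream instruction -> {address}")   # placeholder for the real network call

def pull_all(addresses: Iterable[str]) -> None:
    """Pattern 1: instruct every acquisition device directly."""
    for addr in addresses:
        send_pull(addr)

def pull_via_master(master: str, slaves: Iterable[str]) -> None:
    """Pattern 2: trigger the master, which synchronizes the instruction to the slaves."""
    send_pull(master)
    for addr in slaves:                 # in practice this relay step runs on the master device
        send_pull(addr)

pull_all([f"10.0.0.{i}" for i in range(1, 5)])
pull_via_master("10.0.0.1", [f"10.0.0.{i}" for i in range(2, 5)])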
S202, video data streams respectively transmitted by all the acquisition devices in the acquisition array based on the streaming instruction are received in real time, and whether frame-level synchronization is carried out between the video data streams respectively transmitted by all the acquisition devices in the acquisition array is determined.
In a specific implementation, the acquisition device itself may have encoding and packaging capabilities, so that the original video data acquired synchronously in real time from its corresponding angle can be encoded and packaged. Each acquisition device may also have a compression capability: for the same amount of data before compression, a higher compression rate yields a smaller compressed data volume and relieves the bandwidth pressure of real-time synchronous transmission, so the acquisition device may adopt techniques such as predictive coding, transform coding and entropy coding to improve the video compression rate.
S203, when the video data streams respectively transmitted by the acquisition devices in the acquisition array are not in frame-level synchronization, respectively transmitting a stream pulling instruction to the acquisition devices in the acquisition array again until the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization.
By adopting the above data synchronization method, it is determined whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization, so synchronous transmission of the multiple data paths can be ensured, transmission problems such as missing frames and extra frames can be avoided, the data processing speed is improved, and the requirement of low-delay playing of multi-angle free view video is met.
In a specific implementation, when each acquisition device in the acquisition array is started manually, there is a start time error, and it is possible that the acquisition of the video data stream does not start at the same time. Thus, at least one of the following means may be employed to ensure that each acquisition device in the acquisition array respectively acquires the video data stream from the corresponding angle in real time and synchronously:
1. when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices, so that each acquisition device in the acquisition array starts to synchronously acquire video data streams from corresponding angles in real time based on the acquisition starting instruction.
For example, the acquisition array may include 40 acquisition devices. When the acquisition device A1 acquires the acquisition start instruction, it synchronously sends the instruction to the other acquisition devices A2-A40; after all the acquisition devices have received the acquisition start instruction, each acquisition device starts to synchronously acquire its video data stream from the corresponding angle in real time based on that instruction. Because the data transmission speed between the acquisition devices is far faster than the manual starting speed, the start time error caused by manual starting can be reduced.
2. And each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time based on preset clock synchronous signals.
For example, a clock signal synchronization device may be provided and connected to each acquisition device. When the clock signal synchronization device receives a trigger signal (e.g., a synchronous acquisition start instruction), it transmits a clock synchronization signal to each acquisition device, and each acquisition device starts to synchronously acquire its video data stream from the corresponding angle in real time based on the clock synchronization signal. Because the clock signal synchronization device transmits the clock synchronization signal to each acquisition device based on the preset trigger signal, the acquisition devices can acquire synchronously without being easily disturbed by external conditions or manual operation, so the synchronization precision and synchronization efficiency of the acquisition devices can be improved. A sketch of this clock-signal-based start synchronization is given below.
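The following is a minimal Python sketch of the clock-synchronization approach above; the signalling objects and thread model are assumptions made only to illustrate that capture begins on a shared signal rather than on each device's manual start.

import threading

class ClockSignalDevice:
    def __init__(self) -> None:
        self._signal = threading.Event()

    def trigger(self) -> None:
        self._signal.set()                    # broadcast the clock synchronization signal

    def wait_for_signal(self) -> None:
        self._signal.wait()

def acquisition_device(name: str, clock: ClockSignalDevice) -> None:
    clock.wait_for_signal()                   # capture starts only after the shared signal arrives
    print(f"{name}: start synchronous real-time acquisition")

clock = ClockSignalDevice()
workers = [threading.Thread(target=acquisition_device, args=(f"cam{i:02d}", clock))
           for i in range(4)]
for w in workers:
    w.start()
clock.trigger()
for w in workers:
    w.join()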
In a specific implementation, due to the influence of the network transmission environment, each acquisition device in the acquisition array may not receive the pull stream command at the same time, and there may be a time difference of several milliseconds or less between each acquisition device, so that the video data streams transmitted in real time by each acquisition device are not synchronous, as shown in fig. 21, the acquisition array includes the acquisition devices 1 and 2, the acquisition parameters of the acquisition devices 1 and 2 are set to be the same, wherein the acquisition frame rate is X fps, and the video frames acquired by the acquisition devices 1 and 2 are synchronously acquired at the frame level.
The acquisition interval T of each frame in the acquisition devices 1 and 2 is T = 1/X.
Assuming that the data processing device sends a streaming command r at time T0, the acquisition device 1 receives it at time T1 and the acquisition device 2 receives it at time T2. If the acquisition devices 1 and 2 both receive the streaming command within the same acquisition interval T, they can be considered to have received it at the same time, and the acquisition devices 1 and 2 can respectively transmit frame-level synchronized video data streams. If they do not receive it within the same acquisition interval, they can be considered not to have received the streaming command at the same time, and the acquisition devices 1 and 2 cannot realize synchronous transmission of frame-level video data streams. Frame-level synchronization of video data streaming may also be referred to as pull stream synchronization. Once pull stream synchronization is achieved, streaming continues automatically until it is stopped.
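A short Python sketch of the interval test above, assuming timestamps in seconds: with an acquisition frame rate of X fps, the acquisition interval is T = 1/X, and the pull is treated as synchronous only if both devices receive the pull-stream instruction inside the same interval.

def same_acquisition_interval(t1: float, t2: float, frame_rate_fps: float) -> bool:
    interval = 1.0 / frame_rate_fps          # T = 1 / X
    return int(t1 // interval) == int(t2 // interval)

# 25 fps -> T = 40 ms; receiving the instruction 5 ms apart inside one interval counts as synchronous
print(same_acquisition_interval(10.010, 10.015, 25.0))   # True
print(same_acquisition_interval(10.038, 10.042, 25.0))   # False: the instruction straddles two intervals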
The reason why a frame-level synchronized video data stream cannot be transmitted may be:
1) A stream pulling instruction needs to be sent to each acquisition device respectively;
2) The local area network has delay in transmitting the pull stream command.
Thus, it may be determined whether or not frame-level synchronization is between video data streams respectively transmitted by respective acquisition devices in the acquisition array in at least one of the following ways:
1. When the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is acquired, the feature information of the object in the Nth frame of each video data stream can be matched. When the feature information of the object in the Nth frames of the video data streams meets a preset similarity threshold, the feature information of the object in the Nth frames transmitted by the acquisition devices is determined to be consistent, and the video data streams respectively transmitted by the acquisition devices are in frame-level synchronization.
Wherein N is an integer not less than 1, and the feature information of the object of the nth frame of each video data stream may include at least one of shape feature information, color feature information, and position feature information.
2. When the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is acquired, the timestamp information of the Nth frame of each video data stream can be matched, where N is an integer not less than 1. When the timestamp information of the Nth frames of the video data streams is consistent, frame-level synchronization between the video data streams respectively transmitted by the acquisition devices is determined.
When the video data streams respectively transmitted by the acquisition devices in the acquisition array are not in frame-level synchronization, a stream pulling instruction is sent to each acquisition device in the acquisition array again, and at least one of the above manners is used to determine whether frame-level synchronization has been reached, until the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization. Both checks are illustrated in the sketch below.
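The following hedged Python sketch illustrates the two checks above; the frame objects and the toy similarity function are assumptions, and a real system would use a proper feature descriptor and metric.

from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Frame:
    timestamp_ms: int
    features: Sequence[float]     # e.g. shape / color / position descriptors

def feature_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy similarity in [0, 1]; stands in for a real object-feature metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return 1.0 - min(1.0, sum(diffs) / max(len(diffs), 1))

def frame_level_synchronized(nth_frames: List[Frame],
                             similarity_threshold: float = 0.9) -> bool:
    reference = nth_frames[0]
    by_feature = all(feature_similarity(reference.features, f.features) >= similarity_threshold
                     for f in nth_frames[1:])
    by_timestamp = all(f.timestamp_ms == reference.timestamp_ms for f in nth_frames[1:])
    return by_feature or by_timestamp      # either manner may be used, or both together

frames = [Frame(40, [0.51, 0.49]), Frame(40, [0.50, 0.50])]
print(frame_level_synchronized(frames))    # True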
In a specific implementation, video frames in the video data stream of each acquisition device may be intercepted and transmitted to a designated target end, and in order to ensure frame level synchronization of the intercepted video frames, as shown in fig. 22, the following steps may be included:
s221, determining one path of video data stream in the video data streams of all the acquisition devices in the acquisition array received in real time as a reference data stream.
S222, determining video frames to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frames to be intercepted in the reference data stream as video frames to be intercepted in other video data streams.
S223, intercepting video frames to be intercepted in each video data stream.
S224, synchronously uploading the intercepted video frames to the appointed target terminal.
The specified target terminal can be a preset target terminal or a target terminal specified by a video frame interception instruction.
By adopting the scheme, frame cutting synchronization can be realized, frame cutting efficiency is improved, the display effect of the generated multi-angle free view video is further improved, and user experience is enhanced. In addition, the coupling between the process of selecting and intercepting the video frames and the process of generating the multi-angle free view video can be reduced, the independence among the processes is enhanced, the later maintenance is convenient, the intercepted video frames are synchronously uploaded to the appointed target end, the network transmission resources can be saved, the data processing load can be reduced, and the speed of generating the multi-angle free view video by data processing is improved.
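As an illustration of steps S221 to S224 above, the following Python sketch uses assumed data structures (streams as lists of timestamped frame dictionaries and a generic upload callable); it is only a sketch of selecting the reference stream, locating the frame to intercept, picking the synchronized frames in the other streams, and uploading them together.

from typing import Dict, List

def intercept_and_upload(streams: Dict[str, List[dict]],
                         reference_id: str,
                         target_timestamp_ms: int,
                         upload) -> None:
    reference = streams[reference_id]                               # S221: choose the reference data stream
    ref_frame = next(f for f in reference
                     if f["timestamp_ms"] == target_timestamp_ms)   # S222: frame to intercept in the reference
    selected = {reference_id: ref_frame}
    for stream_id, frames in streams.items():
        if stream_id == reference_id:
            continue
        selected[stream_id] = next(f for f in frames
                                   if f["timestamp_ms"] == ref_frame["timestamp_ms"])
    upload(selected)                                                # S223/S224: intercept and upload synchronously

streams = {f"cam{i}": [{"timestamp_ms": t, "data": b""} for t in (0, 40, 80)] for i in range(3)}
intercept_and_upload(streams, "cam0", 40, upload=print)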
In order for those skilled in the art to better understand and implement embodiments of the present invention, how to determine video frames to be intercepted in each video data stream is described in detail below by way of specific application examples.
One way is to select the video frames, which are consistent with the characteristic information of the object, in the rest video data streams as the video frames to be intercepted of the rest video data streams according to the characteristic information of the object in the video frames to be intercepted in the reference data streams.
For example, the acquisition array includes 40 acquisition devices, so 40 video data streams can be received in real time. Suppose that among the video data streams of the acquisition devices received in real time, the video data stream A1 corresponding to the acquisition device A1' is determined as the reference data stream. Based on the feature information X of the object in the video frame indicated by the received video frame interception instruction, the video frame a1 in the reference data stream that is consistent with the feature information X is determined as the video frame to be intercepted. Then, based on the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream, the video frames a2-a40 that are consistent with the feature information X1 in the remaining video data streams A2-A40 are selected as the video frames to be intercepted of those streams.
The feature information of the object may include shape feature information, color feature information, position feature information, and the like. The feature information X of the object in the video frame indicated by the video frame interception instruction and the feature information X1 of the object in the video frame a1 to be intercepted in the reference data stream may be the same representation of the feature information of the same object, for example both two-dimensional feature information; they may also be different representations of the feature information of the same object, for example the feature information X may be two-dimensional feature information while the feature information X1 is three-dimensional feature information. A similarity threshold may be preset; when the threshold is satisfied, the feature information X may be considered consistent with X1, or the feature information X1 may be considered consistent with the feature information X2-X40 of the objects in the remaining video data streams A2-A40.
The specific representation mode and the similarity threshold of the feature information of the object can be determined according to the preset multi-angle free view angle range and the scene on site, and the embodiment is not limited.
In another mode, according to the time stamp information of the video frames in the reference data stream, selecting the video frames which are consistent with the time stamp information in the rest video data streams as the video frames to be intercepted of the rest video data streams.
For example, the acquisition array may include 40 acquisition devices, so 40 video data streams may be received in real time. Suppose that among the video data streams of the acquisition devices received in real time, the video data stream B1 corresponding to the acquisition device B1 is determined as the reference data stream. Based on the timestamp information Y of the video frame indicated by the received video frame interception instruction, the video frame b1 in the reference data stream corresponding to the timestamp information Y is determined as the video frame to be intercepted. Then, based on the timestamp information y1 of the video frame b1 to be intercepted in the reference data stream, the video frames b2-b40 corresponding to the timestamp information y1 in the remaining video data streams B2-B40 are selected as the video frames to be intercepted of those streams.
The timestamp information Y of the video frame indicated by the video frame interception instruction may differ slightly from the timestamp information y1 of the video frame b1 to be intercepted in the reference data stream. For example, the timestamp information of a video frame in the reference data stream may be inconsistent with the timestamp information Y by an error of 0.1 ms; an error range may be preset, for example ±1 ms, and since an error of 0.1 ms is within that range, the video frame b1 whose timestamp information y1 differs from Y by 0.1 ms may be selected as the video frame to be intercepted in the reference data stream. The specific error range and the rule for selecting the timestamp information y1 in the reference data stream may be determined according to the on-site acquisition devices and the transmission network, which is not limited in this embodiment. A sketch of this tolerance-based selection follows below.
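The following Python sketch applies the tolerance rule just described; the ±1 ms tolerance is the example value from the text, and the frame dictionaries are assumptions for illustration.

from typing import List, Optional

def select_frame_by_timestamp(frames: List[dict],
                              target_ms: float,
                              tolerance_ms: float = 1.0) -> Optional[dict]:
    # Keep frames whose timestamp error is within the preset range, then pick the closest one.
    candidates = [f for f in frames if abs(f["timestamp_ms"] - target_ms) <= tolerance_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f["timestamp_ms"] - target_ms))

frames = [{"timestamp_ms": 99.8}, {"timestamp_ms": 100.1}, {"timestamp_ms": 140.0}]
print(select_frame_by_timestamp(frames, target_ms=100.0))   # picks the 100.1 ms frame (0.1 ms error, within ±1 ms)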
It can be understood that, in the above embodiment, the method for determining the video frames to be intercepted in each video data stream may be used alone or simultaneously, which is not limited by the embodiment of the present invention.
By adopting the scheme, the efficiency of synchronous selection and synchronous interception of the video frames and the accuracy of the result can be improved, so that the integrity and the synchronism of the transmission data can be improved.
The embodiment of the invention also provides a data processing device corresponding to the data processing method, and in order to enable a person skilled in the art to better understand and realize the embodiment of the invention, the detailed description is made below by means of specific embodiments with reference to the accompanying drawings.
Referring to a schematic structural diagram of the data processing apparatus shown in fig. 23, in an embodiment of the present invention, as shown in fig. 23, the data processing apparatus 230 may include:
the instruction sending unit 231 is adapted to send a pull-stream instruction to each acquisition device in the acquisition array, where each acquisition device in the acquisition array is placed at different positions in the field acquisition area according to a preset multi-angle free view angle range, and each acquisition device in the acquisition array is respectively configured to synchronously acquire video data streams from corresponding angles in real time;
the data stream receiving unit 232 is adapted to receive, in real time, video data streams respectively transmitted by each acquisition device in the acquisition array based on the pull stream instruction;
The first synchronization judging unit 233 is adapted to determine whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization, and, when they are not, to re-trigger the instruction sending unit 231 until the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization.
The data processing device may be deployed according to the actual scenario. For example, when there is space available on site, the data processing device may be placed in an on-site non-acquisition area as a site server; when the site has no free space, the data processing device may be placed in the cloud as a cloud server.
By adopting the above data processing device, it is determined whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization, so synchronous transmission of the multiple data paths can be ensured, transmission problems such as missing frames and extra frames can be avoided, the data processing speed is improved, and the requirement of low-delay playing of multi-angle free view video is met.
In an embodiment of the present invention, as shown in fig. 23, the data processing device 230 may further include:
A reference video stream determining unit 234, adapted to determine one of the video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
a video frame selection unit 235, adapted to determine a video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and select video frames in the remaining video data streams synchronized with the video frame to be intercepted in the reference data stream as video frames to be intercepted in the remaining video data streams;
a video frame capturing unit 236 adapted to capture video frames to be captured in each video data stream;
and the uploading unit 237 is adapted to synchronously upload the intercepted video frames to the designated target end.
The data processing device 230 may establish a connection with a target end through a port or an IP address in advance, or may synchronously upload the video frame obtained by interception to the port or the IP address specified by the video frame interception instruction.
By adopting the scheme, frame cutting synchronization can be realized, frame cutting efficiency is improved, the display effect of the generated multi-angle free view video is further improved, and user experience is enhanced. And the coupling between the process of selecting and intercepting the video frames and the process of generating the multi-angle free view video is reduced, the independence among the processes is enhanced, the later maintenance is convenient, the intercepted video frames are synchronously uploaded to the appointed target end, the network transmission resources can be saved, the data processing load can be reduced, and the speed of generating the multi-angle free view video by data processing is improved.
In an embodiment of the present invention, as shown in fig. 23, the video frame selection unit 235 includes at least one of the following:
the first video frame selection module 2351 is adapted to select, according to the feature information of the object in the video frames to be intercepted in the reference data stream, a video frame in each of the other video data streams, which is consistent with the feature information of the object, as the video frame to be intercepted in each of the other video data streams;
the second video frame selection module 2352 is adapted to select, according to the timestamp information of the video frames in the reference data stream, the video frames in the remaining video data streams that are consistent with the timestamp information as the video frames to be intercepted of the remaining video data streams.
By adopting the scheme, the efficiency of synchronous selection and synchronous interception of the video frames and the accuracy of the result can be improved, so that the integrity and the synchronism of the transmission data can be improved.
The embodiment of the invention also provides a data synchronization system corresponding to the data processing method, the data processing equipment is adopted to realize real-time receiving of the multipath video data stream, and in order to enable the person skilled in the art to better understand and realize the embodiment of the invention, the detailed description is given below through the specific embodiment with reference to the accompanying drawings.
Referring to the schematic structure of the data synchronization system shown in fig. 24, in an embodiment of the present invention, the data synchronization system 240 may include: the system comprises an acquisition array 241 arranged in a field acquisition area and a data processing device 242 connected with the acquisition array in a link, wherein the acquisition array 241 comprises a plurality of acquisition devices, and each acquisition device in the acquisition array 241 is arranged at different positions of the field acquisition area according to a preset multi-angle free view angle range, wherein:
each acquisition device in the acquisition array 241 is adapted to synchronously acquire video data streams from corresponding angles in real time, and transmit the acquired video data streams to the data processing device 242 in real time based on a streaming instruction sent by the data processing device 242;
the data processing device 242 is adapted to send a streaming instruction to each of the collection devices in the collection array 241, receive, in real time, video data streams respectively transmitted by each of the collection devices in the collection array 241 based on the streaming instruction, and re-send the streaming instruction to each of the collection devices in the collection array 241 when frame-level synchronization is not performed between the video data streams respectively transmitted by each of the collection devices in the collection array 241, until frame-level synchronization is performed between the video data streams transmitted by each of the collection devices in the collection array 241.
By adopting the data synchronization system of the embodiment of the invention, it is determined whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization, so synchronous transmission of the multiple data paths can be ensured, transmission problems such as missing frames and extra frames can be avoided, the data processing speed is improved, and the requirement of low-delay playing of multi-angle free view video is met.
In a specific implementation, the data processing device 242 is further adapted to determine one of the video data streams of each of the acquisition devices in the acquisition array 241 received in real time as a reference data stream; determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the other video data streams; intercepting video frames to be intercepted in each video data stream and synchronously uploading the video frames obtained by interception to the appointed target end.
The data processing device 242 may establish a connection with a target end through a port or an IP address in advance, or may synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction.
In an embodiment of the present invention, the data synchronization system 240 may further include a cloud server, which is adapted to serve as a specified target.
In another embodiment of the present invention, as shown in fig. 34, the data synchronization system 240 may further include a play control device 341 adapted to serve as a designated target.
In yet another embodiment of the present invention, as shown in fig. 35, the data synchronization system 240 may further include an interactive terminal 351 adapted to be a designated target.
In an embodiment of the present invention, at least one of the following manners may be adopted to ensure that each acquisition device in the acquisition array 241 respectively acquires the video data stream synchronously from the corresponding angle in real time:
1. each acquisition device in the acquisition array is connected through a synchronous line, wherein when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices through the synchronous line, so that each acquisition device in the acquisition array starts to synchronously acquire video data streams from corresponding angles in real time based on the acquisition starting instruction;
2. and each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time based on preset clock synchronous signals.
For better understanding and implementing the embodiments of the present invention by those skilled in the art, a detailed description of a data synchronization system is provided below through a specific application scenario, as shown in fig. 25, where the data synchronization system includes an acquisition array 251 formed by each acquisition device, a data processing device 252, and a cloud server cluster 253.
At least one acquisition device in each acquisition device in the acquisition array 251 acquires an acquisition start instruction, and synchronizes the acquired acquisition start instruction to other acquisition devices through a synchronization line 254, so that each acquisition device in the acquisition array starts to acquire a video data stream from a corresponding angle in real time and synchronously based on the acquisition start instruction.
The data processing device 252 may send a pull stream command to each of the collection devices in the collection array 251 through a wireless local area network. Each acquisition device in the acquisition array 251 transmits the obtained video data stream to the data processing device 252 in real time through the switch 255 based on the pull stream command sent by the data processing device 252.
The data processing device 252 determines whether the video data streams respectively transmitted by the collecting devices in the collecting array 251 are in frame level synchronization, and when the video data streams respectively transmitted by the collecting devices in the collecting array 251 are not in frame level synchronization, it re-sends a stream pulling instruction to each collecting device in the collecting array 251 until the video data streams respectively transmitted by the collecting devices in the collecting array 251 are in frame level synchronization.
After determining that the video data streams transmitted by the acquisition devices in the acquisition array 251 are in frame-level synchronization, the data processing device 252 determines one of the video data streams of the acquisition devices in the acquisition array 251 received in real time as the reference data stream. After receiving a video frame interception instruction, it determines the video frame to be intercepted in the reference data stream according to that instruction, selects the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of those streams, intercepts the video frames to be intercepted in each video data stream, and synchronously uploads the intercepted video frames to the cloud.
The cloud server cluster 253 performs subsequent processing on the intercepted video frame to obtain a multi-angle free view video for playing.
In an implementation, the cloud server cluster 253 may include: the first cloud server 2531, the second cloud server 2532, the third cloud server 2533, and the fourth cloud server 2534. Wherein, the first cloud server 2531 may be used for parameter calculation; the second cloud server 2532 may be configured to perform depth calculation to generate a depth map; the third cloud server 2533 may be configured to perform frame image reconstruction on a preset virtual viewpoint path by using DIBR; the fourth cloud server 2534 may be configured to generate multi-angle freeview video.
It can be understood that the data processing device may be disposed in an on-site non-acquisition area or disposed in a cloud according to an actual situation, where the data synchronization system may use at least one of a cloud server, a play control device, or an interactive terminal as a transmitting end of a video frame capturing instruction, or may use other devices capable of transmitting the video frame capturing instruction, which is not limited by the embodiment of the present invention.
It should be noted that, the data processing system and the like in the foregoing embodiments may be applied to the data synchronization system in the embodiment of the present invention.
The embodiment of the invention also provides the acquisition equipment corresponding to the data processing method, and the acquisition equipment is suitable for synchronizing the acquisition start instruction to other acquisition equipment when acquiring the acquisition start instruction, starting to synchronously acquire the video data stream from a corresponding angle in real time, and transmitting the acquired video data stream to the data processing equipment in real time when receiving the stream pulling instruction sent by the data processing equipment. In order that those skilled in the art will better understand and practice the embodiments of the present invention, a detailed description of specific embodiments will be provided with reference to the accompanying drawings.
Referring to the schematic structural diagram of the acquisition device shown in fig. 36, in an embodiment of the present invention, the acquisition device 360 includes: a photoelectric conversion imaging component 361, a processor 362, an encoder 363, and a transmission component 365, wherein:
the photoelectric conversion imaging component 361 is adapted to collect images;
the processor 362 is adapted to synchronize the acquisition start instruction to other acquisition devices through the transmission component 365 when the acquisition start instruction is acquired, start processing the images collected by the photoelectric conversion imaging component 361 in real time to obtain an image data sequence, and transmit the obtained video data stream to the data processing device through the transmission component 365 in real time when the pull stream instruction is acquired;
the encoder 363 is adapted to encode the sequence of image data to obtain a corresponding video data stream.
As an alternative, as shown in fig. 36, the acquisition device 360 may further comprise a sound recording component 364 adapted to acquire sound signals to obtain audio data.
The acquired image data sequence and audio data may be processed by the processor 362 and then encoded by the encoder 363 to obtain a corresponding video data stream. The processor 362 can synchronize the acquisition start instruction to other acquisition devices via the transmission component 365 when the acquisition start instruction is acquired; upon receiving the pull stream instruction, the obtained video data stream is transmitted to the data processing device in real time through the transmission component 365.
In a specific implementation, the acquisition device can be placed at different positions of the field acquisition region according to a preset multi-angle free view angle range, and the acquisition device can be fixedly arranged at a certain point of the field acquisition region or can move in the field acquisition region to form an acquisition array. Therefore, the acquisition device can be a fixed device or a mobile device, so that the video data stream can be flexibly acquired at multiple angles.
Fig. 37 is a schematic diagram of an acquisition array in an application scenario in an embodiment of the present invention, wherein a stage center is used as a core viewpoint, the core viewpoint is used as a center of a circle, and a sector area with the core viewpoint located on the same plane is used as a preset multi-angle free view angle range. The collection devices 371-375 in the collection array are arranged in different positions of the field collection area in a fan shape according to the preset multi-angle free view angle range. The acquisition device 376 is a mobile device that can be moved to a designated location according to instructions for flexible acquisition. Also, the acquisition device may be a handheld device to supplement the acquisition data in the event of a failure of the acquisition device or in a spatially small area, for example, handheld device 377 in the stage audience area of fig. 37 may be added to the acquisition array to provide a video data stream for the stage audience area.
As described above, the depth map calculation is needed to generate the multi-angle freeview data, but the time for the current depth map calculation is long, and how to reduce the time for generating the depth map and increase the depth map generation rate is a problem to be solved.
In view of the above problems, the embodiment of the present invention provides a computing node cluster in which multiple computing nodes generate depth maps in parallel and in batches for texture data synchronously acquired by the same acquisition array. Specifically, the depth map calculation process may be divided into several steps: a first depth calculation that yields a rough depth map, determination of the unstable regions in the rough depth map, and a second depth calculation. In each step, the computing nodes in the cluster can perform the first depth calculation on the texture data collected by multiple acquisition devices in parallel to obtain rough depth maps, and can verify the obtained rough depth maps and perform the second depth calculation in parallel, so the time for calculating depth maps can be saved and the depth map generation rate can be improved. Further details are given below through specific embodiments with reference to the accompanying drawings.
Referring to a flowchart of a depth map generating method shown in fig. 26, in an embodiment of the present invention, depth map generation is performed by using a plurality of computing nodes in a computing node cluster, and for convenience of description, any computing node in the computing node cluster is referred to as a first computing node. The following describes the depth map generation method of the computing node cluster in detail through specific steps:
S261, receiving texture data, wherein the texture data are synchronously acquired by a plurality of acquisition devices in the same acquisition array.
In a specific implementation, the plurality of acquisition devices can be placed at different positions of the field acquisition area according to a preset multi-angle free view angle range, and each acquisition device can be fixed at a certain point of the field acquisition area or move within it, forming an acquisition array. Here, a multi-angle free view means that the spatial position and viewing angle of the virtual viewpoint of a scene can be freely switched. For example, the multi-angle free view may be a view with six degrees of freedom (6DoF), and the acquisition devices in the acquisition array may be ordinary cameras, video recorders, handheld devices such as mobile phones, and the like. For specific implementations, reference may be made to other embodiments of the present invention, which are not repeated here.
The texture data, that is, the pixel data of the two-dimensional image frame acquired by the acquisition device, may be an image at one frame time, or may be pixel data of a frame image corresponding to a video stream formed by continuous or discontinuous frame images.
S262, the first computing node performs first depth computation according to the first texture data and the second texture data to obtain a first rough depth map.
Here, for clarity and conciseness of description, the texture data satisfying the preset first mapping relation with the first computing node in the texture data is referred to as first texture data; and the texture data acquired by the acquisition equipment which meets the preset first spatial position relation with the acquisition equipment of the first texture data is called second texture data.
In a specific implementation, the first mapping relationship may be obtained based on a preset first mapping relationship table or through random mapping. For example, the texture data processed by each computing node may be pre-allocated according to the number of computing nodes in the computing node cluster and the number of acquisition devices in the acquisition array corresponding to the texture data. A dedicated allocation node may be set up to allocate the computing tasks of the computing nodes in the cluster, and this allocation node may obtain the first mapping relationship based on a preset first mapping relationship table or through random mapping. For example, if there are 40 acquisition devices in the acquisition array, 40 computing nodes may be configured for the highest concurrent processing efficiency, with each acquisition device corresponding to one computing node. If there are only 20 computing nodes of equal or roughly equivalent processing capacity, then to achieve the highest concurrent processing efficiency and load balancing, each computing node may be assigned the texture data collected by two acquisition devices. Specifically, a mapping between the acquisition device identifier corresponding to the texture data and the identifier of each computing node can be set as the first mapping relationship, and the texture data acquired by the corresponding acquisition devices in the acquisition array is distributed directly to the corresponding computing nodes based on this relationship. Alternatively, the computing tasks can be allocated randomly, with the texture data collected by each acquisition device in the acquisition array randomly assigned to the computing nodes in the cluster; in this case, to improve processing efficiency, all the texture data collected by the acquisition array can be copied to every computing node in the cluster in advance. A sketch of such an allocation follows below.
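The following Python sketch illustrates one possible first mapping relationship; the device and node identifiers and the round-robin rule are assumptions chosen only to show how 40 acquisition devices could be balanced across 20 equally capable computing nodes, two devices per node.

from collections import defaultdict
from typing import Dict, List

def build_first_mapping(device_ids: List[str], node_ids: List[str]) -> Dict[str, List[str]]:
    mapping: Dict[str, List[str]] = defaultdict(list)
    for i, device_id in enumerate(device_ids):
        mapping[node_ids[i % len(node_ids)]].append(device_id)   # round-robin keeps the load balanced
    return mapping

devices = [f"cam{i:02d}" for i in range(40)]
nodes = [f"node{i:02d}" for i in range(20)]
mapping = build_first_mapping(devices, nodes)
print(mapping["node00"])   # ['cam00', 'cam20'] -> two devices' texture data per computing node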
As an example, any server in the server cluster may perform the first depth calculation according to the first texture data and the second texture data.
For the preset first spatial position relationship between the first texture data and the second texture data, for example, the second texture data may be texture data acquired by an acquisition device that satisfies a preset first distance relationship with an acquisition device of the first texture data, or texture data acquired by an acquisition device that satisfies a preset first number relationship with an acquisition device of the first texture data, or texture data acquired by an acquisition device that satisfies a preset first distance relationship with an acquisition device of the first texture data and satisfies a preset first number relationship.
The first preset number can be any integer value from 1 to N-1, where N is the total number of acquisition devices in the acquisition array. In an embodiment of the present invention, the first preset number is 2, so that the highest possible image quality can be obtained with the least amount of computation. For example, assuming that the computing node 9 corresponds to the camera 9 in the preset first mapping relationship, the rough depth map of the camera 9 may be calculated by using the texture data of the camera 9 and the texture data of the cameras 5, 6, 7, 10, 11 and 12 adjacent to the camera 9.
It will be appreciated that, in a specific implementation, the second texture data may also be data acquired by an acquisition device that satisfies other types of first spatial positional relationships with the acquisition device of the first texture data, for example, the first spatial positional relationships may also satisfy a preset angle, satisfy a preset relative position, and so on.
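As one illustration of a first spatial position relationship, the following Python sketch selects the k nearest cameras on either side of a given camera in a ring-shaped array; the ring layout and neighbour count are assumptions, and the result need not coincide with the specific camera numbering in the example above.

from typing import List

def neighbour_cameras(camera_index: int, total_cameras: int, k: int = 3) -> List[int]:
    """Return the indices of the k cameras on each side in a ring-shaped acquisition array."""
    neighbours = []
    for offset in range(1, k + 1):
        neighbours.append((camera_index - offset) % total_cameras)
        neighbours.append((camera_index + offset) % total_cameras)
    return sorted(neighbours)

# Camera 9 in a 40-camera array, 3 neighbours per side
print(neighbour_cameras(9, 40, k=3))   # [6, 7, 8, 10, 11, 12]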
And S263, the first computing node synchronizes the first rough depth map to the rest computing nodes in the computing node cluster, and a rough depth map set is obtained.
After the rough depth map calculation, cross validation is needed to determine the unstable regions in each rough depth map for the refined solution in the next step. For any rough depth map, cross validation uses the rough depth maps corresponding to a number of acquisition devices around the acquisition device corresponding to that map (typically, the rough depth map to be verified is cross-validated together with the rough depth maps corresponding to all other acquisition devices). Therefore, the rough depth maps calculated by each computing node need to be synchronized to the remaining computing nodes in the cluster. After step S263, each computing node in the cluster has obtained the rough depth maps calculated by all the other computing nodes, so every computing node holds an identical rough depth map set.
And S264, the first computing node verifies a second rough depth map in the rough depth map set by adopting a third rough depth map to obtain an unstable region in the second rough depth map.
The second rough depth map and the first computing node can meet a preset second mapping relation; the third rough depth map may be a rough depth map corresponding to an acquisition device that satisfies a preset second spatial position relationship with the acquisition device corresponding to the second rough depth map.
The second mapping relation can be obtained based on a preset second mapping relation table or through random mapping. For example, the texture data processed by each computing node may be pre-allocated according to the number of computing nodes in the computing node cluster and the number of acquisition devices in the acquisition array corresponding to the texture data. In a specific implementation, a special allocation node may be set to allocate the computing task of each computing node in the computing node cluster, and the allocation node may obtain the second mapping relationship based on a preset second mapping relationship table or through random mapping. Specific examples of setting the second mapping relation may be referred to as implementation examples of the aforementioned first mapping relation.
It may be appreciated that, in a specific implementation, the second mapping relationship may completely correspond to the first mapping relationship, or may not correspond to the first mapping relationship. For example, when the number of cameras is equal to the number of computing nodes, a one-to-one second mapping relationship can be established between the acquisition device corresponding to the data (including texture data and rough depth map) and the identity of the computing node processing the data according to the hardware identity.
It is to be understood that the descriptions of the first, second and third coarse depth maps are provided herein for clarity and brevity. In a specific implementation, the first rough depth map may be the same as or different from the second rough depth map; the acquisition equipment corresponding to the third rough depth map and the acquisition equipment corresponding to the second rough depth map can meet a preset second spatial position relation.
For the second spatial position relationship, as a specific example, the third rough depth map may correspond to an acquisition device that satisfies a preset second distance relationship with the acquisition device corresponding to the second rough depth map, or to an acquisition device that satisfies a preset second number relationship with it, or to an acquisition device that satisfies both the preset second distance relationship and the preset second number relationship.
The second preset number may take any integer value from 1 to N-1, where N is the total number of acquisition devices in the acquisition array. In a specific implementation, the second preset number may be equal to or different from the first preset number. In an embodiment of the present invention, the second preset number is 2, so that as high an image quality as possible can be obtained with as little computation as possible.
In a specific implementation, the second spatial position relationship may be another type of spatial position relationship, for example, satisfying a preset angle, satisfying a preset relative position, and the like.
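A minimal sketch of one possible reading of the "preset number relation" follows, in which the n devices nearest to a given device in the acquisition-array ordering are chosen (n = 2 in the embodiment above); taking index adjacency as a stand-in for spatial adjacency is an assumption of this example.

```cpp
// Assumed helper: pick the n acquisition devices closest to `target` in the
// array ordering, alternating left and right neighbours.
#include <vector>

std::vector<int> neighboursInArray(int target, int n, int total) {
    std::vector<int> ids;
    for (int off = 1; (int)ids.size() < n && off < total; ++off) {
        if (target - off >= 0 && (int)ids.size() < n) ids.push_back(target - off);
        if (target + off < total && (int)ids.size() < n) ids.push_back(target + off);
    }
    return ids;
}
```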
And S265, the first computing node performs second depth computation according to the unstable region in the second rough depth map, texture data corresponding to the second rough depth map and texture data corresponding to the third rough depth map, and obtains a corresponding fine depth map.
It should be noted that, the difference between the second depth calculation and the first depth calculation is that the depth map candidate value in the second rough depth map selected by the second depth calculation does not include the depth value of the unstable region, so that the unstable region in the generated depth map can be eliminated, the generated depth map is more accurate, and the quality of the generated multi-angle free view image can be improved.
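The distinction can be sketched as follows (hypothetical helper with illustrative names): when assembling depth candidates for a pixel during the second depth calculation, the rough value is reused only if the pixel lies outside the unstable region.

```cpp
// Assumed candidate-assembly helper for the second depth calculation:
// the rough depth value is kept as a candidate only for stable pixels,
// so unstable regions are re-estimated from freshly sampled hypotheses.
#include <vector>

std::vector<float> depthCandidates(float roughDepth, bool pixelUnstable,
                                   const std::vector<float>& sampledHypotheses) {
    std::vector<float> candidates = sampledHypotheses;
    if (!pixelUnstable) {
        candidates.push_back(roughDepth);  // exclude rough value in unstable regions
    }
    return candidates;
}
```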
Illustrated with one application scenario:
the server S may perform a first round of depth calculation (first depth calculation) based on the allocated texture data of the camera M and the texture data of the camera satisfying a preset first spatial positional relationship with the camera M, to obtain a rough depth map.
After the cross-validation in step S264, refinement of the depth map may be continued on the same server. Specifically, the server S may cross-verify the rough depth map corresponding to the allocated camera M against all the other rough depth maps to obtain the unstable region in the rough depth map corresponding to the camera M. The server S may then perform another round of depth map calculation (the second depth calculation) based on the unstable region in the rough depth map corresponding to the camera M, the texture data collected by the camera M, and the texture data of the N cameras around the camera M, so as to obtain a refined depth map corresponding to the first texture data (the texture data collected by the camera M).
The rough depth map corresponding to the camera M is a rough depth map calculated based on texture data acquired by the camera M and texture data acquired by an acquisition device which meets a preset first spatial position relation with the camera M.
And S266, taking the set of fine depth maps obtained by the computing nodes as the finally generated depth maps.
By adopting this embodiment, a plurality of computing nodes can perform depth map generation in parallel, in batches, on texture data synchronously acquired by the same acquisition array, so that the depth map generation efficiency can be greatly improved.
In addition, by adopting the scheme, through secondary depth calculation, unstable areas in the generated depth map are eliminated, so that the obtained fine depth map is more accurate, and the quality of the generated multi-angle free view angle image can be improved.
In a specific implementation, according to the size of the data volume of the texture data to be processed and the requirement on the depth map generation speed, the configuration of the computing nodes and the number of the computing nodes in the computing node cluster can be selected. For example, the computing node cluster may be a server cluster formed by a plurality of servers, where the plurality of servers may be deployed in a centralized manner or may be located in a distributed manner. In some embodiments of the present invention, some or all of the computing node devices in the computing node cluster may be used as a local server, or may be used as edge node devices, or may be used as cloud computing devices.
As another example, the cluster of computing nodes may also be a computing device formed by multiple CPUs or GPUs. The embodiment of the present invention further provides a computing node adapted to form a computing node cluster with at least one other computing node for generating a depth map. Referring to the schematic structural diagram of the computing node shown in fig. 27, the computing node 270 may include:
an input unit 271 adapted to receive texture data originating from a plurality of acquisition devices in the same acquisition array being acquired simultaneously;
the first depth calculating unit 272 is adapted to perform a first depth calculation according to the first texture data and the second texture data, to obtain a first rough depth map, wherein: the first texture data and the computing node meet a preset first mapping relation; the second texture data are texture data acquired by acquisition equipment which meets the preset first spatial position relation with the acquisition equipment of the first texture data;
a synchronization unit 273, adapted to synchronize the first rough depth map to the rest of the computing nodes in the computing node cluster, to obtain a rough depth map set;
the verifying unit 274 is adapted to verify, for a second rough depth map in the rough depth map set, with a third rough depth map, to obtain an unstable region in the second rough depth map, where: the second rough depth map and the computing node meet a preset second mapping relation; the third rough depth map is a rough depth map corresponding to acquisition equipment, corresponding to the second rough depth map, of which the acquisition equipment meets a preset second spatial position relation;
The second depth calculating unit 275 is adapted to perform a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, where: the depth map candidate value in the second rough depth map selected by the second depth calculation does not contain the depth value of the unstable region;
an output unit 276 adapted to output the fine depth map such that the cluster of computing nodes obtains a fine depth map set as a finally generated depth map.
By adopting the above computing node, the depth map computation is decomposed into a plurality of steps, such as obtaining the rough depth map through the first depth calculation, determining the unstable region in the rough depth map, and then performing the second depth calculation. Since these steps can be carried out by the respective computing nodes, the depth map generation efficiency can be improved.
The embodiment of the invention also provides a computing node cluster, which can comprise a plurality of computing nodes, wherein the computing nodes in the computing node cluster can simultaneously generate depth maps in parallel and in batch mode for texture data synchronously acquired by the same acquisition array. For convenience of description, any computing node in the computing node cluster is referred to as a first computing node.
In some embodiments of the present invention, the first computing node is adapted to perform a first depth calculation according to the first texture data and the second texture data in the received texture data, to obtain a first rough depth map; synchronizing the first rough depth map to other computing nodes in the computing node cluster to obtain a rough depth map set; verifying a second rough depth map in the rough depth map set by adopting a third rough depth map to obtain an unstable region in the second rough depth map; according to the unstable region in the second rough depth map, texture data corresponding to the second rough depth map and texture data corresponding to the third rough depth map, performing second depth calculation to obtain a corresponding fine depth map, and outputting the obtained fine depth map so that the computing node cluster takes the obtained fine depth map set as a finally generated depth map;
wherein the first texture data and the first computing node satisfy a preset first mapping relation; the second texture data are texture data acquired by acquisition equipment which meets the preset first spatial position relation with the acquisition equipment of the first texture data; the second rough depth map and the first computing node meet a preset second mapping relation; the third rough depth map is a rough depth map corresponding to acquisition equipment, corresponding to the second rough depth map, of which the acquisition equipment meets a preset second spatial position relation; and the depth map candidate in the second coarse depth map selected by the second depth calculation does not contain the depth value of the unstable region.
Referring to the schematic diagram of depth map processing by a server cluster shown in fig. 28, texture data collected by the N cameras in the camera array are respectively input to the N servers in the server cluster, and the first depth calculation is performed separately to obtain rough depth maps 1 to N. Each server then copies the rough depth map it has calculated to the other servers in the server cluster and the copies are synchronized, after which each server verifies the rough depth map allocated to it and performs the second depth calculation, obtaining the refined depth map as the depth map generated by the server cluster. In this calculation process, each server in the server cluster performs the first depth calculation on texture data acquired by a plurality of cameras in parallel, verifies its rough depth map in the rough depth map set and performs the second depth calculation, so the whole depth map generation is carried out by a plurality of servers in parallel, which can greatly reduce the time for calculating the depth maps and improve the depth map generation efficiency.
The specific implementation manner and the beneficial effects of the computing node and the computing node cluster in the embodiment of the present invention can be referred to the depth map generation method in the foregoing embodiment of the present invention, and are not described herein again.
The server cluster may further store the generated depth map, or output the depth map to the terminal device according to the request, so as to further generate and display the virtual viewpoint image, which is not described herein.
The embodiment of the present invention further provides a computer readable storage medium, on which computer instructions are stored, where the computer instructions can execute the steps of the depth map generation method described in any one of the foregoing embodiments, and the specific reference may be made to the steps of the foregoing depth map generation method, which are not described herein again.
In addition, currently known virtual viewpoint image generation methods based on Depth-Image-Based Rendering (DIBR) have difficulty meeting the playback requirements of multi-angle free-view applications.
The inventor finds that current DIBR virtual viewpoint image generation methods have low concurrency and are usually processed by a CPU; moreover, since the generation of each virtual viewpoint image involves many steps and each step is relatively complex, such generation methods are difficult to realize with parallel processing.
In order to solve the problems, the embodiment of the invention provides a method for generating virtual viewpoint images through parallel processing, which can greatly accelerate the timeliness of virtual viewpoint image generation of multi-angle free view angles, thereby meeting the requirements of low-delay playing and real-time interaction of multi-angle free view angle videos and improving user experience.
The following describes embodiments of the invention in detail so that the implementation and advantages of the invention will be apparent to those skilled in the art.
Referring to the flowchart of the virtual-viewpoint-image generating method shown in fig. 29, in a specific implementation, a virtual viewpoint image may be generated by:
s291, acquiring an image combination of a multi-angle free view angle, parameter data of the image combination and preset virtual view point path data, wherein the image combination comprises a plurality of angle synchronous texture images and depth images with corresponding relations.
A multi-angle free view may refer to viewing a scene from a virtual viewpoint whose spatial position and view angle can be freely switched. The multi-angle free view angle range can be determined according to the requirements of the application scene.
In specific implementation, an acquisition array formed by a plurality of acquisition devices can be arranged on the scene, each acquisition device in the acquisition array can be arranged at different positions of a scene acquisition area according to a preset multi-angle free view angle range, and each acquisition device can synchronously acquire scene images to obtain a plurality of angle synchronous texture maps. For example, a scene may be acquired by multiple cameras, video cameras, etc. at multiple angles in a synchronized manner.
The images in the image combination of the multi-angle free view angles can be images of a complete free view angle. In a specific implementation, a viewing angle with 6 degrees of freedom (Degrees of Freedom, DoF) is possible, that is, the spatial position of the viewpoint and the viewing angle can be freely switched. As previously described, the spatial position of the viewpoint may be expressed as coordinates (x, y, z), and the viewing angle may be expressed as three rotation directions, and thus may be referred to as 6DoF.
In the virtual viewpoint image generation process, an image combination of multi-angle free view angles and parameter data of the image combination can be acquired first.
In an implementation, texture maps in the image combination are in one-to-one correspondence with depth maps. The texture map may be in any type of two-dimensional image format, for example, any one of BMP, PNG, JPEG, webp format and the like. The depth map may represent the distance of points in the scene relative to the camera, i.e. each pixel value in the depth map represents the distance between a point in the scene and the camera.
Texture maps in image combinations are multiple two-dimensional images that are synchronized. Depth data for each two-dimensional image may be determined based on the plurality of two-dimensional images.
Wherein the depth data may comprise depth values corresponding to pixels of the two-dimensional image. The distance of the acquisition device to the various points in the region to be viewed may be taken as the above-mentioned depth value, which may directly reflect the geometry of the visible surface in the region to be viewed. For example, the depth value may be a distance from each point in the region to be viewed to the optical center along the camera optical axis, and the origin of the camera coordinate system may be the optical center. It will be appreciated by those skilled in the art that the distance may be a relative value, with the same reference being used for the plurality of images.
The depth data may include depth values in one-to-one correspondence with the pixels of the two-dimensional image, or may be a subset of values selected from the set of depth values in one-to-one correspondence with the pixels of the two-dimensional image. It will be appreciated by those skilled in the art that the depth value set may be stored in the form of a depth map. In a specific implementation, the set of depth values in one-to-one correspondence with the pixels of the two-dimensional image (texture map) may be stored as an image, arranged according to the pixel layout of the two-dimensional image (texture map), to form the original depth map, and the depth data may be data obtained by downsampling that original depth map.
In the embodiment of the invention, the image combination of the multi-angle free view angles and the parameter data of the image combination can be obtained through the following steps, and the description is given below through specific application scenes.
As a specific embodiment of the present invention, the method may include the steps of: the first step is acquisition and calculation of a depth map, and comprises three main steps of: video acquisition by multiple cameras (Multi-camera Video Capturing), camera inside and outside parameter computation (Camera Parameter Estimation), and depth map computation (Depth Map Calculation). For multi-camera acquisition, it is desirable that the video acquired by each camera be aligned at the frame level.
Texture images (i.e., the synchronized multiple images) can be obtained through multi-camera video acquisition; the camera parameters (Camera Parameter), namely the parameter data of the image combination, including internal parameter data and external parameter data, can be obtained through the camera internal and external parameter calculation; and the depth maps (Depth Map) can be obtained through the depth map calculation.
Multiple groups of synchronized texture maps and depth maps with corresponding relations in the image combination can be spliced together to form one frame of stitched image. The stitched images can have a variety of stitching structures, and each frame of stitched image may serve as one image combination. In the image combination, the plurality of groups of texture maps and depth maps can be spliced and combined according to a preset relation. Specifically, the image combination can be divided into a texture map region and a depth map region according to the position relation: the texture map region stores the pixel values of each texture map, and the depth map region stores the depth values corresponding to each texture map according to the preset position relation. The texture map region and the depth map region may be continuous or may be spaced apart. The embodiment of the invention does not limit the position relation between the texture maps and the depth maps in the image combination.
In a specific implementation, the parameter data of each image in the image combination may be obtained from the attribute information of the image. The parameter data may include external parameter data and may also include internal parameter data. The external parameter data describe the spatial coordinates, pose and the like of the shooting equipment, and the internal parameter data express attribute information of the shooting equipment such as its optical center and focal length. The internal parameter data may also include distortion parameter data, which include radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs in the process of transforming the shooting equipment coordinate system into the physical coordinate system of the image; tangential distortion arises during manufacture of the camera because the plane of the photosensitive element is not parallel to the lens. Information such as the shooting position and shooting angle of an image can be determined based on the external parameter data. In virtual viewpoint image generation, incorporating internal parameter data including the distortion parameter data makes the determined spatial mapping relation more accurate.
In a specific implementation, the virtual viewpoint path may be preset. For example, for a sports game, such as basketball or soccer, an arcuate path may be pre-planned, for example, in accordance with which a corresponding virtual viewpoint image is generated whenever a highlight appears.
In a particular application, the virtual viewpoint path may be set based on a particular location or perspective in the scene (e.g., under-basket, on-scene, referee perspective, trainer perspective, etc.), or based on a particular object (e.g., player on the scene, host, audience on the scene, and actors in the video image, etc.).
The path data corresponding to the virtual viewpoint path may include position data of a series of virtual viewpoints in the path.
S292, according to the preset virtual viewpoint path data and the parameter data of the image combination, selecting a texture map and a depth map of a corresponding group of each virtual viewpoint in the virtual viewpoint path from the image combination.
In a specific implementation, according to the position data of each virtual viewpoint in the virtual viewpoint path data and the parameter data of the image combination, a corresponding group of texture map and depth map which satisfy a preset position relationship and/or a number relationship with each virtual viewpoint position can be selected from the image combination. For example, in a virtual viewpoint position area where the camera density is high, only texture maps and corresponding depth maps captured by two cameras closest to the virtual viewpoint may be selected, and in a virtual viewpoint position area where the camera density is low, texture maps and corresponding depth maps captured by three or four cameras closest to the virtual viewpoint may be selected.
In an embodiment of the present invention, texture maps and depth maps corresponding to 2 to N acquisition devices closest to each virtual viewpoint position in the virtual viewpoint path may be selected, respectively, where N is the number of all the acquisition devices in the acquisition array. For example, texture maps and depth maps corresponding to two acquisition devices closest to each virtual viewpoint location may be selected by default. In a specific implementation, the user may set the number of the selected acquisition devices closest to the virtual viewpoint position, and the maximum number of the selected acquisition devices does not exceed the number of the acquisition devices corresponding to the image combination.
In this manner, no special requirement is imposed on the spatial distribution of the acquisition devices in the acquisition array (for example, the acquisition devices may be distributed linearly, in an arc array, or in any irregular arrangement). The actual distribution of the acquisition devices is determined from the acquired virtual viewpoint position data and the parameter data corresponding to the image combination, and an adaptive strategy is then used to select the corresponding groups of texture maps and depth maps in the image combination. This provides greater freedom and flexibility of selection while reducing the amount of data computation and ensuring the quality of the generated virtual viewpoint image, and in addition lowers the installation requirements on the acquisition devices in the acquisition array, making it easier to adapt to different site requirements and installation conditions.
In an embodiment of the present invention, according to the virtual viewpoint position data and the parameter data of the image combination, a preset number of texture maps and depth maps of the corresponding group closest to the virtual viewpoint position are selected from the image combination.
It will be appreciated that in a specific implementation, other preset rules may be used to select a corresponding set of texture maps and depth maps from the image combination. The texture map and the depth map of the respective group may also be selected from the image combination, for example, according to the processing power of the virtual viewpoint image generating device, or according to a user's requirement for the generation speed, a definition requirement for the generated image (such as general definition, high definition or super definition, etc.).
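By way of illustration only, the following sketch selects, for one virtual viewpoint, the k nearest acquisition devices by Euclidean distance between the viewpoint position and the camera centers, with a user-configurable k capped at the number of devices; all names and the choice of distance metric are assumptions of this example.

```cpp
// Assumed selection helper: return indices of the k acquisition devices whose
// camera centers are closest to the virtual viewpoint position.
#include <algorithm>
#include <array>
#include <vector>

using Vec3 = std::array<float, 3>;

std::vector<int> selectGroups(const Vec3& viewpoint,
                              const std::vector<Vec3>& cameraCenters,
                              int requested) {
    int total = (int)cameraCenters.size();
    int k = std::min(std::max(requested, 2), total);   // at least 2, at most all devices
    std::vector<int> ids(total);
    for (int i = 0; i < total; ++i) ids[i] = i;

    auto dist2 = [&](int i) {
        float s = 0.0f;
        for (int c = 0; c < 3; ++c) {
            float d = cameraCenters[i][c] - viewpoint[c];
            s += d * d;
        }
        return s;
    };
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int a, int b) { return dist2(a) < dist2(b); });
    ids.resize(k);
    return ids;
}
```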
S293, inputting texture maps and depth maps of corresponding groups of virtual viewpoints into a graphics processor, and respectively carrying out combined rendering on the texture maps and the pixel points in the depth maps of corresponding groups in the selected image combination by a plurality of threads by taking the pixel points as processing units for each virtual viewpoint in a virtual viewpoint path to obtain an image corresponding to the virtual viewpoint.
The graphic processor (Graphics Processing Unit, GPU), also known as a display core, a visual processor, a display chip and the like, is a microprocessor specially used for performing image and graphic related operation, and can be configured in personal computers, workstations, electronic game machines and some mobile terminals (such as tablet computers, smart phones and the like) with image related operation requirements.
For a better understanding and implementation of embodiments of the present invention, a brief description of the architecture of a GPU employed in some embodiments of the present invention is provided below. It should be noted that the GPU architecture is only a specific example, and does not limit the GPU to which the embodiments of the present invention are applied.
In some embodiments of the invention, the GPU may employ the Compute Unified Device Architecture (CUDA) parallel programming architecture to render combinations of pixels in the texture maps and depth maps of the corresponding groups in the selected image combination. CUDA is a hardware and software architecture for distributing and managing computations on the GPU as a data-parallel computing device without mapping them to a graphics application programming interface (API).
With CUDA programming, a GPU may be considered a computing device capable of executing a large number of threads in parallel. It operates as a co-processor to the host CPU; in other words, the data-parallel, computationally intensive portion of an application running on the host is offloaded onto the GPU.
More precisely, a portion of an application that is executed many times, independently, on different data can be isolated into a function that runs on the GPU device as many different threads. To this end, such a function is compiled into the instruction set of the GPU device, and the resulting program, known as a kernel (Kernel), is downloaded onto the GPU. The batch of threads executing a kernel is organized as thread blocks (Thread Blocks).
A thread block is a collection of threads that can cooperate by effectively sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses. In implementations, a synchronization point may be specified in the kernel whose threads in the thread block will be suspended until they all reach the synchronization point.
In implementations, the maximum number of threads a thread block can contain is limited. However, blocks of the same dimension and size executing the same kernel may be batched into one block grid (Grid of Thread Blocks) so that the total number of threads that can be started in a single kernel call is much greater.
From the above, with the CUDA structure, a large number of threads can be simultaneously processed on the GPU in parallel, so that the virtual viewpoint image generation speed can be greatly improved.
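The per-pixel launch pattern implied above can be sketched as follows; the block size and kernel body are placeholders chosen for illustration, not values prescribed by the embodiment.

```cuda
// Illustrative CUDA launch pattern: one thread per pixel, threads grouped
// into blocks, blocks batched into a grid covering the whole image.
#include <cuda_runtime.h>

__global__ void perPixelKernel(float* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // guard partially filled blocks
    img[y * width + x] *= 1.0f;              // placeholder per-pixel work
}

void launchPerPixel(float* d_img, int width, int height) {
    dim3 block(16, 16);                                   // threads per block
    dim3 grid((width + block.x - 1) / block.x,            // blocks per grid
              (height + block.y - 1) / block.y);
    perPixelKernel<<<grid, block>>>(d_img, width, height);
}
```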
For better understanding and implementation by those skilled in the art, the process of processing each step of the combined rendering in pixel units is described in detail below.
In a specific implementation, referring to the flowchart of the method for performing the combined rendering by the GPU shown in fig. 30, step S293 may be implemented by:
and S2931, performing forward mapping on the depth maps of the corresponding groups in parallel, and mapping the depth maps onto the virtual viewpoints.
The forward mapping of the depth map is to map the depth map of the original camera (acquisition device) to the position of the virtual camera through the conversion of the coordinate space position, so as to obtain the depth map of the position of the virtual camera. Specifically, the forward mapping of the depth map is an operation of mapping each pixel of the depth map of the original camera (acquisition device) to a virtual viewpoint according to a preset coordinate mapping relationship.
In a specific implementation, a first Kernel (Kernel) function may be run on the GPU to forward map pixels in the corresponding set of depth maps in parallel onto corresponding virtual viewpoint locations.
The inventor finds that, in the forward mapping process, a foreground-background occlusion problem and a mapping gap effect may exist, which affect the quality of the generated image. First, in the embodiment of the present invention, for the foreground-background occlusion problem, when a plurality of depth values are mapped to the same pixel of the virtual viewpoint, an atomic operation may be adopted and the value with the largest pixel value is kept, so as to obtain the first depth map of the corresponding virtual viewpoint position. Then, in order to reduce the influence of the mapping gap effect, a second depth map of the virtual viewpoint position can be created based on the first depth map of the virtual viewpoint position, each pixel in the second depth map being processed in parallel by taking the maximum value of the pixels in a preset area around the corresponding pixel position in the first depth map.
In the forward mapping process, each pixel can be processed in parallel, so that the forward mapping processing speed can be greatly increased, and the ageing performance of forward mapping is improved.
And S2932, performing post-processing on the depth map after forward mapping in parallel.
After the forward mapping is finished, post-processing can be performed on the virtual viewpoint depth map, specifically, a preset second kernel function can be run on the GPU, and median filtering processing can be performed on each pixel in the second depth map obtained by the forward mapping in a preset area around the pixel position. Because the median filtering processing can be performed on each pixel in the second depth map in parallel, the post-processing speed can be greatly increased, and the aging performance of the post-processing is improved.
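A minimal sketch of such a median-filter kernel (3×3 window, borders clamped) follows; the buffer layout and names are assumptions of this example.

```cuda
// Sketch of the post-processing (second kernel function): a 3x3 median filter
// applied independently to every pixel of the virtual-view depth map.
#include <cuda_runtime.h>

__global__ void medianFilter3x3(const unsigned int* in, unsigned int* out,
                                int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    unsigned int v[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);   // clamp at the border
            int yy = min(max(y + dy, 0), h - 1);
            v[n++] = in[yy * w + xx];
        }
    for (int i = 1; i < 9; ++i) {                  // insertion sort of 9 values
        unsigned int key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
    out[y * w + x] = v[4];                         // the median
}
```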
S2933, reversely mapping the texture maps of the corresponding groups in parallel.
In this step, the coordinates in the original camera texture map are calculated from the values of the depth map of the virtual viewpoint position, and the corresponding values are obtained by sub-pixel interpolation. On the GPU, sub-pixel values can be interpolated directly using bilinear filtering, so the value only needs to be fetched from the original camera texture at the coordinates calculated for each pixel. In a specific implementation, a preset third kernel function may be run on the GPU to perform the interpolation operation on the pixels of the selected corresponding group of texture maps in parallel, so that the corresponding virtual texture maps can be generated.
And through running a third core function on the GPU, interpolation operation is performed on pixels in the texture map of the selected corresponding group in parallel to generate a corresponding virtual texture map, so that the processing speed of reverse mapping can be greatly increased, and the ageing performance of the reverse mapping is improved.
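A hedged sketch of the reverse-mapping kernel follows: it assumes the source-image coordinate of every virtual-view pixel has already been computed from the virtual-view depth map, and fetches the texture value with a manual bilinear blend (hardware texture filtering could be used instead); the buffer layout is an assumption.

```cuda
// Sketch of the reverse-mapping step (third kernel function): fetch the
// source texture at the precomputed sub-pixel coordinate with bilinear
// interpolation. coords[] holds the (u, v) source coordinate of each
// virtual-view pixel; RGB is stored as interleaved floats.
#include <cuda_runtime.h>

__global__ void backwardMapTexture(const float* srcTex,    // w*h*3 source texture
                                   const float2* coords,   // per-pixel source (u, v)
                                   float* virtTex,         // w*h*3 output
                                   int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float2 uv = coords[y * w + x];
    int u0 = (int)floorf(uv.x), v0 = (int)floorf(uv.y);
    if (u0 < 0 || v0 < 0 || u0 + 1 >= w || v0 + 1 >= h) return;
    float a = uv.x - u0, b = uv.y - v0;

    for (int c = 0; c < 3; ++c) {                  // blend the 4 neighbours
        float p00 = srcTex[(v0 * w + u0) * 3 + c];
        float p10 = srcTex[(v0 * w + u0 + 1) * 3 + c];
        float p01 = srcTex[((v0 + 1) * w + u0) * 3 + c];
        float p11 = srcTex[((v0 + 1) * w + u0 + 1) * 3 + c];
        virtTex[(y * w + x) * 3 + c] =
            (1.0f - a) * (1.0f - b) * p00 + a * (1.0f - b) * p10 +
            (1.0f - a) * b * p01 + a * b * p11;
    }
}
```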
S2934, fusing pixels in each virtual texture map generated after reverse mapping in parallel.
In a specific implementation, a fourth kernel function may be run on the GPU, and weighting fusion is performed in parallel on pixels at the same position in each virtual texture map generated after reverse mapping.
And running a fourth core function on the GPU, and carrying out weighted fusion on pixels at the same position in each virtual texture map generated after reverse mapping in parallel, so that the fusion speed of the virtual texture maps can be greatly increased, and the timeliness performance of image fusion is improved.
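An illustrative sketch of the fusion kernel follows, blending the per-camera virtual texture maps with per-pixel confidence weights; the stacked-plane buffer layout and the normalization by the weight sum are assumptions of this example.

```cuda
// Sketch of the fusion step (fourth kernel function), following formula (6):
// the per-camera virtual texture maps are blended with per-pixel confidence
// weights; the result is normalized by the weight sum.
#include <cuda_runtime.h>

__global__ void fuseVirtualTextures(const float* virtTex,  // numCams planes of w*h*3
                                    const float* conf,     // numCams planes of w*h
                                    float* out,            // w*h*3
                                    int w, int h, int numCams) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int pix = y * w + x;
    for (int c = 0; c < 3; ++c) {
        float acc = 0.0f, wsum = 0.0f;
        for (int i = 0; i < numCams; ++i) {
            float wgt = conf[i * w * h + pix];
            acc  += wgt * virtTex[(i * w * h + pix) * 3 + c];
            wsum += wgt;
        }
        out[pix * 3 + c] = (wsum > 0.0f) ? acc / wsum : 0.0f;
    }
}
```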
The following is a detailed description of a specific example.
In step S2931, for forward mapping of the depth map, first, the projection mapping relationship of each pixel point may be calculated by the first Kernel function of the GPU.
Assume a pixel point (u, v) in the image of a real camera. First, the image coordinates (u, v) are converted to coordinates [X, Y, Z]^T in the camera coordinate system through the perspective projection model of the corresponding camera. It will be appreciated that the perspective projection models of different cameras have different conversion methods.

For example, for a perspective projection model:

Z·[u, v, 1]^T = K·[X, Y, Z]^T,  K = [f_x 0 c_x; 0 f_y c_y; 0 0 1]   (1)

wherein [u, v, 1]^T are the homogeneous coordinates of the pixel (u, v), [X, Y, Z]^T are the coordinates of the corresponding real point in the camera coordinate system, f_x and f_y are the focal lengths in the x and y directions, and c_x and c_y are the optical center coordinates in the x and y directions, respectively.

Therefore, for a certain pixel point (u, v) in the image, knowing the depth value Z of the pixel and the physical parameters of the corresponding camera lens (f_x, f_y, c_x, c_y, which can be obtained from the parameter data of the image combination), the coordinates [X, Y, Z]^T of the corresponding point in the camera coordinate system can be obtained by formula (1).
After the conversion from the image coordinate system to the camera coordinate system, the coordinates of the object in the current camera coordinate system can be converted into the coordinate system of the camera where the virtual viewpoint is located according to a coordinate transformation in three-dimensional space. The following transformation formula can be adopted:

[X_1, Y_1, Z_1]^T = R_12·[X, Y, Z]^T + T_12   (2)

wherein R_12 is a 3x3 rotation matrix and T_12 is a translation vector.

With the transformed three-dimensional coordinates [X_1, Y_1, Z_1]^T, applying the inverse of the image-coordinate-to-camera-coordinate conversion described above gives the corresponding position of the transformed virtual-camera three-dimensional coordinates in the virtual camera image coordinates. Thereby, the projection relation of points from the real viewpoint image to the virtual viewpoint image is established. Each pixel point in the real viewpoint is transformed and the coordinates are rounded, so that the projection depth map in the virtual viewpoint image is obtained.
After the point-to-point mapping relation between the original camera depth map and the virtual camera depth map is established, since several positions in the original camera depth map may be mapped to the same position in the virtual camera depth map during the projection, a foreground-background occlusion relation exists in the forward mapping of the depth map. It is handled as shown in formula (3):

Depth(u,v) = min[Depth_1..N(u,v)]   (3)

The point with the smallest depth value is also the point with the largest depth-map pixel value, so taking the value with the largest pixel value among the mapped depth values yields the first depth map of the corresponding virtual viewpoint position.
In a specific implementation, the operation of taking the maximum or minimum value of the plurality of point maps can be provided in the parallel environment of the CUDA, and specifically, the operation can be performed by calling an atomic operation function atomic min or atomic max provided by the CUDA.
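Putting formulas (1) to (3) and the atomic operation together, a forward-mapping kernel might look like the following sketch. The disparity-style depth encoding (larger pixel value = closer), the 10000/(d+1) conversion, the identical source and virtual image sizes, and all names are assumptions of this example; the destination buffer is assumed to be zero-initialized before launch.

```cuda
// Sketch of the forward-mapping kernel (first kernel function). Each thread
// projects one source depth pixel into the virtual view using formulas (1)
// and (2); collisions on the same target pixel are resolved with atomicMax,
// keeping the largest pixel value (the closest surface) as in formula (3).
#include <cuda_runtime.h>

__device__ float pixelToMetric(unsigned int d) {
    return 10000.0f / (float)(d + 1);   // assumed encoding: larger value = closer
}

__global__ void forwardMapDepth(const unsigned int* srcDepth, int w, int h,
                                unsigned int* dstDepth,          // virtual view
                                const float* K,    // fx, fy, cx, cy of source camera
                                const float* Kv,   // fx, fy, cx, cy of virtual camera
                                const float* R,    // 3x3 row-major rotation src->virtual
                                const float* T) {  // translation vector
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= w || v >= h) return;

    unsigned int d = srcDepth[v * w + u];
    float Z = pixelToMetric(d);

    // Formula (1): image coordinates -> source camera coordinates.
    float X = (u - K[2]) * Z / K[0];
    float Y = (v - K[3]) * Z / K[1];

    // Formula (2): source camera coordinates -> virtual camera coordinates.
    float X1 = R[0] * X + R[1] * Y + R[2] * Z + T[0];
    float Y1 = R[3] * X + R[4] * Y + R[5] * Z + T[1];
    float Z1 = R[6] * X + R[7] * Y + R[8] * Z + T[2];
    if (Z1 <= 0.0f) return;                         // behind the virtual camera

    // Inverse of formula (1) for the virtual camera, with coordinate rounding.
    int u1 = (int)roundf(Kv[0] * X1 / Z1 + Kv[2]);
    int v1 = (int)roundf(Kv[1] * Y1 / Z1 + Kv[3]);
    if (u1 < 0 || u1 >= w || v1 < 0 || v1 >= h) return;

    // Re-encode the virtual-view depth and keep the closest surface, formula (3).
    unsigned int d1 = (unsigned int)fmaxf(0.0f, 10000.0f / Z1 - 1.0f);
    atomicMax(&dstDepth[v1 * w + u1], d1);
}
```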
In the process of obtaining the first depth map, a gap effect may occur, that is, some pixel points may not be covered due to the limited mapping accuracy. To address this problem, the embodiment of the invention performs gap-masking processing on the obtained first depth map. In one embodiment of the present invention, a 3×3 gap-masking process is performed on the first depth map. The specific masking process is as follows:
First, a second depth map of the virtual viewpoint position is created; then, for each pixel D(x, y) in the second depth map, the existing pixels D_old(x', y') within the surrounding 3×3 range in the first depth map of the virtual viewpoint position are taken, and their maximum value is used. This can be implemented by the following kernel function operation:
D(x,y) = Max[D_old(x', y')], (x', y') in the 3×3 neighborhood of (x, y)   (4)
It will be appreciated that the size of the surrounding area used in the gap-masking process may take other values, such as 5×5. To obtain a better processing effect, it can be set empirically.
For step S2932, in an implementation, the second depth map of the virtual viewpoint position may be median filtered with a 3×3 or 5×5 window. For example, for 3×3 median filtering, the second kernel function of the GPU may perform the following operation:

D(x,y) = Median[D(x', y')], (x', y') in the 3×3 neighborhood of (x, y)   (5)
In step S2933, a third kernel function running on the GPU calculates the coordinates in the original camera texture map from the values of the depth map; this can be implemented as the inverse of the mapping in step S2931.
In step S2934, for the pixel point f(x, y) at the virtual viewpoint position (x, y), the pixel values at the corresponding positions of the texture maps mapped from all the original cameras may be weighted according to the confidence conf(x, y). The fourth kernel function may be calculated using the following formula:

f(x,y) = Σ_i conf_i(x,y)·f_i(x,y)   (6)

where the sum is taken over the original cameras i.
Through the above steps S2931 to S2934, a virtual viewpoint image can be obtained. In specific implementation, the virtual texture map obtained after weighted fusion can be further processed and optimized. For example, the hole filling may be performed in parallel on each pixel in the weighted texture map, so as to obtain an image corresponding to the virtual viewpoint.
For hole filling of virtual texture maps, in implementations, a separate windowing approach may be employed for each pixel to perform parallel operations. For example, for each hole pixel, a window of size n×m may be opened, and then the value of the hole pixel is weighted according to the non-hole pixel value in the window. By the method, the generation of the virtual viewpoint image can be completely performed on the GPU in parallel, so that the generation process can be greatly accelerated.
As shown in the schematic diagram of the hole filling method in fig. 31, a hole area F exists in the generated virtual viewpoint view G, and rectangular windows a and b are opened for the pixel f1 and the pixel f2 in the hole area F, respectively. Then, for the pixel f1, all of the existing non-hole pixels in the rectangular window are taken (or a subset of them obtained by downsampling), and the value of the pixel f1 in the hole area F is obtained by weighting them according to distance (or by simple averaging). Similarly, the same operation is performed for the pixel f2 to obtain its value. In a specific implementation, a fifth kernel function can be run on the GPU for parallel processing to shorten the hole filling time.
The fifth kernel function may be calculated using the following formula:
P(x,y)=Average[Window(x,y)] (7)
where P(x, y) is the value of a point in the hole, Window(x, y) is the set of values of all existing (non-hole) pixels in the window around (x, y) (or a downsampled subset of them), and Average is the average (or weighted average) of these pixels.
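A hedged sketch of such a hole-filling kernel follows: each hole pixel is set to the average of the existing non-hole pixels in the surrounding window (a distance-weighted average could be used instead); marking holes with a separate validity mask is an assumption of this example.

```cuda
// Sketch of the hole-filling kernel of formula (7): every hole pixel takes
// the average of the existing non-hole pixels inside a (2r+1)x(2r+1) window;
// non-hole pixels are copied unchanged. Holes are marked by valid[pix] == 0.
#include <cuda_runtime.h>

__global__ void fillHoles(const float* tex,            // w*h*3 fused virtual texture
                          const unsigned char* valid,  // w*h, 0 marks a hole
                          float* out, int w, int h, int r) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int pix = y * w + x;
    if (valid[pix]) {                                  // keep non-hole pixels
        for (int c = 0; c < 3; ++c) out[pix * 3 + c] = tex[pix * 3 + c];
        return;
    }
    float acc[3] = {0.0f, 0.0f, 0.0f};
    int cnt = 0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
            int q = yy * w + xx;
            if (!valid[q]) continue;                   // only existing pixels contribute
            for (int c = 0; c < 3; ++c) acc[c] += tex[q * 3 + c];
            ++cnt;
        }
    for (int c = 0; c < 3; ++c)
        out[pix * 3 + c] = cnt ? acc[c] / cnt : 0.0f;
}
```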
In the embodiment of the invention, in addition to generating the virtual viewpoint images of the virtual viewpoint positions in parallel in units of pixels, in order to further increase the efficiency of generating the virtual viewpoint path images, the texture map and the depth map of the corresponding group of each virtual viewpoint in the virtual viewpoint path may be respectively input into the GPUs, and the GPUs may be generated in parallel.
In a specific implementation, to further improve the processing efficiency, the above steps may be performed separately by different block grids.
Referring to the schematic structural diagram of the virtual viewpoint image generating system shown in fig. 32, in an embodiment of the present invention, the virtual viewpoint image generating system 320 may include a CPU321 and a GPU322, wherein:
the CPU321 is adapted to obtain an image combination of a multi-angle free view angle, parameter data of the image combination, and preset virtual view path data, where the image combination includes a plurality of angle synchronous texture maps and depth maps with corresponding relationships; selecting texture maps and depth maps of corresponding groups of virtual viewpoints in the virtual viewpoint path from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination;
And the GPU322 is suitable for calling corresponding core functions for each virtual viewpoint in the virtual viewpoint path, and performing combined rendering on the texture map and the pixel points in the depth map of the corresponding group in the selected image combination in parallel to obtain an image corresponding to the virtual viewpoint.
In particular, the GPU322 is adapted to forward map the corresponding set of depth maps onto the virtual viewpoint in parallel; performing post-processing on the depth map after forward mapping in parallel; reversely mapping the texture maps of the corresponding groups in parallel; and fusing pixels in each virtual texture map generated after the reverse mapping in parallel.
The GPU322 may use steps S2931 to S2934 in the foregoing virtual viewpoint image generating method, and the hole filling step to generate the virtual viewpoint image of each virtual viewpoint, which may be specifically described in the foregoing embodiments, and will not be described herein.
In an implementation, the GPU may be one GPU or multiple GPUs, as shown in fig. 32.
In a specific application, the GPU may be an independent GPU chip, or one GPU core in one GPU chip, or one GPU server, or a GPU chip formed by packaging multiple GPU chips or multiple GPU cores, or a GPU cluster formed by multiple GPU servers.
Accordingly, the texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path may be input to a plurality of GPU chips, GPU cores or GPU servers to generate a plurality of virtual viewpoint images in parallel. For example, if the virtual viewpoint path data corresponding to a certain virtual viewpoint path contains 20 virtual viewpoint position coordinates in total, the data corresponding to the 20 virtual viewpoint position coordinates can be input to a plurality of GPU chips in parallel, for example 10 GPU chips, so that the data are processed in parallel in two batches. Each GPU chip generates the virtual viewpoint image corresponding to its virtual viewpoint position in parallel in units of pixels, so the generation speed of the virtual viewpoint images can be greatly increased and the timeliness of virtual viewpoint image generation improved.
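A host-side sketch of this distribution follows, assuming one viewpoint is assigned to one GPU in round-robin fashion; renderViewpoint() is a hypothetical placeholder standing for the per-pixel kernels described above.

```cpp
// Illustrative multi-GPU distribution of the viewpoints in a virtual
// viewpoint path (e.g. 20 viewpoints over 10 GPUs run in two batches).
#include <cuda_runtime.h>

void renderViewpoint(int viewpointIdx) {
    // Hypothetical placeholder: would launch the forward-mapping, filtering,
    // reverse-mapping, fusion and hole-filling kernels for this viewpoint on
    // the currently selected GPU.
    (void)viewpointIdx;
}

void renderPath(int numViewpoints, int numGpus) {
    for (int v = 0; v < numViewpoints; ++v) {
        cudaSetDevice(v % numGpus);   // round-robin viewpoints over GPUs
        renderViewpoint(v);           // kernels run asynchronously on that GPU
    }
    for (int g = 0; g < numGpus; ++g) {   // wait for every device to finish
        cudaSetDevice(g);
        cudaDeviceSynchronize();
    }
}
```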
The embodiment of the present invention further provides an electronic device, referring to the schematic structural diagram of the electronic device shown in fig. 33, the electronic device 330 may include a memory 331, a CPU332, and a GPU333, where the memory 331 stores computer instructions that can be executed on the CPU332 and the GPU333, and the steps of the virtual viewpoint image generating method according to any one of the foregoing embodiments of the present invention are suitable for being executed when the CPU332 and the GPU333 cooperatively execute the computer instructions, and detailed descriptions of the foregoing embodiments will be omitted herein.
In an implementation, the electronic device may be one server, or may be a server cluster formed by multiple servers.
The above embodiments can be applied to live scenes, and two or more embodiments can be combined as required in the application process. It will be appreciated by those skilled in the art that the above embodiments are not limited to live scenes, and the embodiments of the present invention may also be applicable to playing requirements of non-live scenes, such as recording and broadcasting, rebroadcasting, and other scenes with low latency requirements, for video or image acquisition, data processing of video data streams, and image generation of servers.
The specific implementation manner, working principle, specific action and effect of each device or system in the embodiment of the invention can be referred to in the specific description of the corresponding method embodiment.
Embodiments of the present invention also provide a computer readable storage medium having stored thereon computer instructions which, when executed, perform the steps of the method of any of the embodiments of the present invention described above.
The computer readable storage medium may be any suitable readable storage medium such as an optical disc, a mechanical hard disc, a solid state hard disc, and the like. The method for executing the instructions stored on the computer readable storage medium may refer to the embodiments of the above methods, and will not be described in detail.
The embodiment of the invention also provides a server, which comprises a memory and a processor, wherein the memory stores computer instructions capable of being executed on the processor, and the processor can execute the steps of the method according to any embodiment of the invention when executing the computer instructions. The specific implementation of the method executed by the computer instruction during execution may refer to the steps of the method in the foregoing embodiment, and will not be described in detail.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention should be assessed accordingly to that of the appended claims.

Claims (13)

1. A method of data synchronization, comprising:
respectively sending a stream pulling instruction to each acquisition device in an acquisition array, wherein the stream pulling instruction is used for realizing stream pulling synchronization, each acquisition device in the acquisition array is arranged at different positions of an on-site acquisition area according to a preset multi-angle free view angle range, and each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time;
receiving, in real time, the video data streams respectively transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction, and determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame-level synchronization;
and when the frame-level synchronization is not carried out among the video data streams respectively transmitted by the acquisition devices in the acquisition array, respectively sending a stream pulling instruction to the acquisition devices in the acquisition array again until the frame-level synchronization is carried out among the video data streams respectively transmitted by the acquisition devices in the acquisition array.
2. The data synchronization method according to claim 1, further comprising:
determining one path of video data stream in the video data streams of all the acquisition devices in the acquisition array received in real time as a reference data stream;
determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the other video data streams;
intercepting video frames to be intercepted in each video data stream;
and synchronously uploading the intercepted video frames to a designated target end.
3. The method for synchronizing data according to claim 2, wherein the selecting the video frame of the remaining video data streams synchronized with the video frame to be intercepted in the reference data stream as the video frame to be intercepted of the remaining video data streams includes at least one of:
selecting video frames which are consistent with the characteristic information of the object in all other video data streams as video frames to be intercepted of all other video data streams according to the characteristic information of the object in the video frames to be intercepted in the reference data stream;
and selecting the video frames which are consistent with the time stamp information in the rest video data streams as video frames to be intercepted of the rest video data streams according to the time stamp information of the video frames in the reference data streams.
4. A data synchronization method according to any one of claims 1 to 3, wherein the step in which each acquisition device in the acquisition array synchronously acquires video data streams in real time from a corresponding angle comprises at least one of:
when at least one acquisition device acquires an acquisition starting instruction, the acquisition device acquiring the acquisition starting instruction synchronizes the acquisition starting instruction to other acquisition devices, so that each acquisition device in the acquisition array starts to acquire video data streams from corresponding angles in real time and synchronously based on the acquisition starting instruction;
And each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time based on preset clock synchronous signals.
5. A data processing apparatus, comprising:
an instruction sending unit, adapted to respectively send a stream pulling instruction to each acquisition device in an acquisition array, wherein the stream pulling instruction is used for realizing stream pulling synchronization, each acquisition device in the acquisition array is arranged at different positions of a field acquisition area according to a preset multi-angle free view angle range, and each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time;
a data stream receiving unit, adapted to receive, in real time, the video data streams respectively transmitted by the acquisition devices in the acquisition array based on the stream pulling instruction;
the first synchronization judging unit is suitable for determining whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame level synchronization or not, and re-triggering the instruction sending unit when the video data streams respectively transmitted by the acquisition devices in the acquisition array are not in frame level synchronization until the video data streams respectively transmitted by the acquisition devices in the acquisition array are in frame level synchronization.
6. The data processing apparatus of claim 5, further comprising:
the reference video stream determining unit is suitable for determining one path of video data stream in the video data streams of all the acquisition devices in the acquisition array received in real time as a reference data stream;
the video frame selecting unit is suitable for determining video frames to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frames to be intercepted in the reference data stream as video frames to be intercepted in other video data streams;
the video frame intercepting unit is suitable for intercepting video frames to be intercepted in each video data stream;
and the uploading unit is suitable for synchronously uploading the intercepted video frames to the appointed target end.
7. The data processing apparatus of claim 6, wherein the video frame selection unit comprises at least one of:
the first video frame selection module is suitable for selecting video frames which are consistent with the characteristic information of the object in all other video data streams as video frames to be intercepted of all other video data streams according to the characteristic information of the object in the video frames to be intercepted in the reference data stream;
The second video frame selection module is suitable for selecting video frames which are consistent with the time stamp information in the rest video data streams according to the time stamp information of the video frames in the reference data stream as video frames to be intercepted of the rest video data streams.
8. A data synchronization system, comprising: the system comprises an acquisition array arranged in a field acquisition area and data processing equipment connected with the acquisition array in a link, wherein the acquisition array comprises a plurality of acquisition equipment, and each acquisition equipment in the acquisition array is arranged at different positions of the field acquisition area according to a preset multi-angle free view angle range, wherein:
each acquisition device in the acquisition array is suitable for respectively and synchronously acquiring video data streams from corresponding angles in real time, and transmitting the acquired video data streams to the data processing device in real time based on a stream pulling instruction sent by the data processing device, wherein the stream pulling instruction is used for realizing stream pulling synchronization;
the data processing equipment is suitable for respectively sending a stream pulling instruction to each acquisition equipment in the acquisition array, receiving video data streams respectively transmitted by each acquisition equipment in the acquisition array based on the stream pulling instruction in real time, and when frame-level synchronization is not carried out among the video data streams respectively transmitted by each acquisition equipment in the acquisition array, respectively sending the stream pulling instruction to each acquisition equipment in the acquisition array again until frame-level synchronization is carried out among the video data streams transmitted by each acquisition equipment in the acquisition array.
9. The data synchronization system of claim 8, wherein the data processing device is further adapted to determine one of the video data streams of each of the acquisition devices in the acquisition array received in real time as a reference data stream; determining a video frame to be intercepted in the reference data stream based on the received video frame intercepting instruction, and selecting video frames in other video data streams synchronous with the video frame to be intercepted in the reference data stream as the video frames to be intercepted of the other video data streams; intercepting video frames to be intercepted in each video data stream and synchronously uploading the video frames obtained by interception to a designated target end.
10. The data synchronization system according to claim 8 or 9, wherein,
each acquisition device in the acquisition array is connected through a synchronization line, wherein when at least one acquisition device acquires an acquisition start instruction, the acquisition device acquiring the acquisition start instruction synchronizes the acquisition start instruction to the other acquisition devices through the synchronization line, so that each acquisition device in the acquisition array starts to synchronously acquire video data streams from corresponding angles in real time based on the acquisition start instruction; or alternatively,
And each acquisition device in the acquisition array synchronously acquires video data streams from corresponding angles in real time based on preset clock synchronous signals.
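A minimal sketch of the two start-synchronization options in claim 10, with hypothetical device methods: either the device that receives the acquisition start instruction relays it over the synchronization line, or every device starts on a preset clock signal.

```python
# Illustrative only; SyncCapableDevice and its methods are assumed, not actual APIs.
from typing import Protocol, Sequence

class SyncCapableDevice(Protocol):
    def start_capture(self) -> None: ...
    def start_capture_at(self, clock_time_ms: int) -> None: ...

def start_via_sync_line(devices: Sequence[SyncCapableDevice],
                        receiver: SyncCapableDevice) -> None:
    # Option 1: the device that received the instruction starts capturing and
    # relays the start over the synchronization line to the other devices.
    receiver.start_capture()
    for device in devices:
        if device is not receiver:
            device.start_capture()

def start_via_clock_signal(devices: Sequence[SyncCapableDevice],
                           clock_time_ms: int) -> None:
    # Option 2: every device is armed to start on the same preset clock signal.
    for device in devices:
        device.start_capture_at(clock_time_ms)
```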
11. An acquisition device, comprising: a processor, a photoelectric conversion camera module, an encoder and a transmission component, wherein:
the photoelectric conversion camera module is adapted to acquire images;
the processor is adapted to, when an acquisition start instruction is received, synchronize the acquisition start instruction to other acquisition devices through the transmission component and start processing in real time the images acquired by the photoelectric conversion camera module to obtain an image data sequence, and, when a stream pulling instruction is received, transmit the obtained video data stream to a data processing device through the transmission component, the stream pulling instruction being used to achieve stream pulling synchronization;
the encoder is adapted to encode the image data sequence to obtain the corresponding video data stream.
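A minimal sketch of the capture pipeline inside one such acquisition device, assuming simple Camera, Encoder and Transmitter driver objects: images from the photoelectric conversion camera module are encoded into a video data stream and transmitted only once a stream pulling instruction has been received.

```python
# Illustrative only; the camera, encoder and transmitter objects are assumed drivers.
import threading

class AcquisitionDeviceSketch:
    def __init__(self, camera, encoder, transmitter):
        self.camera = camera
        self.encoder = encoder
        self.transmitter = transmitter
        self.pulling = threading.Event()    # set once a stream pulling instruction arrives
        self.running = threading.Event()

    def on_acquisition_start(self):
        # Relay the start instruction to peer devices and begin capturing.
        self.transmitter.broadcast_start()
        self.running.set()
        threading.Thread(target=self._capture_loop, daemon=True).start()

    def on_pull_instruction(self):
        self.pulling.set()

    def _capture_loop(self):
        while self.running.is_set():
            frame = self.camera.read()              # image from the photoelectric conversion module
            packet = self.encoder.encode(frame)     # encode the image data sequence into the stream
            if self.pulling.is_set():
                self.transmitter.send(packet)       # stream to the data processing device in real time
```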
12. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the method of any one of claims 1 to 4.
13. A server comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 4.
CN201911033624.2A 2019-10-28 2019-10-28 Data synchronization method, device, synchronization system, medium and server Active CN112738009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911033624.2A CN112738009B (en) 2019-10-28 2019-10-28 Data synchronization method, device, synchronization system, medium and server

Publications (2)

Publication Number Publication Date
CN112738009A CN112738009A (en) 2021-04-30
CN112738009B true CN112738009B (en) 2023-05-26

Family

ID=75589302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033624.2A Active CN112738009B (en) 2019-10-28 2019-10-28 Data synchronization method, device, synchronization system, medium and server

Country Status (1)

Country Link
CN (1) CN112738009B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827647B (en) * 2022-04-15 2024-03-19 北京百度网讯科技有限公司 Live broadcast data generation method, device, equipment, medium and program product

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107592452A (en) * 2017-09-05 2018-01-16 深圳市圆周率软件科技有限责任公司 A kind of panorama audio-video acquisition equipment and method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101998116A (en) * 2009-08-31 2011-03-30 中国移动通信集团公司 Method, system and equipment for realizing multi-view video service
CN103957344B (en) * 2014-04-28 2017-05-24 广州杰赛科技股份有限公司 Video synchronization method and system for multiple camera devices
CN104410807B (en) * 2014-11-24 2018-04-20 深圳市华宝电子科技有限公司 A kind of multi-channel video synchronized playback method and device
CN104506826B (en) * 2015-01-13 2017-09-12 中南大学 A kind of real-time splicing apparatus of fixed point orientation video without effective overlapping region
CN105187883B (en) * 2015-09-11 2018-05-29 广东威创视讯科技股份有限公司 A kind of data processing method and client device
US10805615B2 (en) * 2016-12-14 2020-10-13 LogMeln, Inc. Synchronizing video signals using cached key frames
CN108924526A (en) * 2017-03-27 2018-11-30 华为软件技术有限公司 Video broadcasting method, terminal and system
CN110324609B (en) * 2018-03-30 2021-07-09 杭州海康威视数字技术股份有限公司 Method and device for detecting synchronism of output signals, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112738009A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112738010B (en) Data interaction method and system, interaction terminal and readable storage medium
CN112738534B (en) Data processing method and system, server and storage medium
CN112738495B (en) Virtual viewpoint image generation method, system, electronic device and storage medium
US9832450B2 (en) Methods and apparatus for generating and using reduced resolution images and/or communicating such images to a playback or content distribution device
KR102218519B1 (en) Content-based stream segmentation of video data
WO2022022501A1 (en) Video processing method, apparatus, electronic device, and storage medium
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
JP7320146B2 (en) Support for multi-view video motion with disocclusion atlas
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111667438B (en) Video reconstruction method, system, device and computer readable storage medium
TW202029742A (en) Image synthesis
WO2022022348A1 (en) Video compression method and apparatus, video decompression method and apparatus, electronic device, and storage medium
US20220053222A1 (en) Apparatus and method for generating an image data stream
WO2019156819A1 (en) Method and apparatus for processing and distributing live virtual reality content
US20230099605A1 (en) Minimal volumetric 3d on demand for efficient 5g transmission
CN112738009B (en) Data synchronization method, device, synchronization system, medium and server
CN112738646B (en) Data processing method, device, system, readable storage medium and server
CN112734821B (en) Depth map generation method, computing node cluster and storage medium
CN111669569A (en) Video generation method and device, medium and terminal
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669568A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
CN111669603A (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
TWI817273B (en) Real-time multiview video conversion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant