WO2021249414A1 - Data processing method, system, related device and storage medium


Info

Publication number: WO2021249414A1
Application number: PCT/CN2021/099047 (CN2021099047W)
Authority: WO (WIPO (PCT))
Prior art keywords: image, frame, video, data, video frame
Other languages: English (en), French (fr)
Inventor: 盛骁杰
Original Assignee: 阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021249414A1

Classifications

    • H04N21/21805: Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • H04N13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/6587: Control parameters, e.g. trick play commands, viewpoint selection
    • H04N21/81: Monomedia components thereof
    • H04N21/8146: Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics

Definitions

  • the embodiments of this specification relate to the field of data processing technology, and in particular to a data processing method, system, related equipment, and storage medium.
  • 6 Degrees of Freedom (6DoF) technology is a technology to provide a high degree of freedom viewing experience.
  • the user can adjust the viewing angle of the video through interactive means while watching, and watch from the free viewpoint that the user wants, thereby greatly improving the viewing experience.
  • AR: augmented reality.
  • embodiments of this specification provide a data processing method, system, and related equipment and storage medium.
  • the embodiment of this specification provides a data processing method, including:
  • the virtual information image and the corresponding video frame are synthesized and displayed.
  • the multi-angle free-view video is obtained by performing frame image reconstruction on a preset virtual viewpoint path based on the parameter data corresponding to an image combination formed by multiple synchronized video frames intercepted at a specified frame time from multiple synchronized video streams, and the pixel data and depth data of the preset frame images in the image combination, where the multiple synchronized video frames include frame images of different shooting angles.
  • the acquiring the virtual information image generated based on the augmented reality special effect input data of the target object includes:
  • said combining the virtual information image with the corresponding video frame and displaying it includes: sorting according to the frame time and the virtual viewpoint position at the corresponding frame time, and synthesizing and displaying the virtual information image at the corresponding frame time with the video frame at the corresponding frame time.
  • the combining and displaying the virtual information image and the corresponding video frame includes at least one of the following:
  • the virtual information image is superimposed on the corresponding video frame to obtain a superimposed composite video frame, and the superimposed composite video frame is displayed.
  • the displaying the fused video frame includes: inserting the fused video frame into the to-be-played video stream for playing and displaying.
  • the acquiring the target object in the video frame of the multi-angle free-view video includes: in response to a special effect generation interactive control instruction, acquiring the target object in the video frame of the multi-angle free-view video.
  • the acquiring the virtual information image generated based on the augmented reality special effect input data of the target object includes: generating the virtual information image based on the augmented reality special effect input data of the target object according to a preset special effect generation method.
  • the generating the virtual information image corresponding to the target object includes: generating the virtual information image based on the augmented reality special effect input data of the target object according to a preset special effect generation method.
  • the embodiment of this specification also provides another data processing method, including:
  • frame image reconstruction is performed on the preset virtual viewpoint path to obtain the corresponding video frame of the multi-angle free-view video;
  • the target object in the video frame specified by the special effect generation instruction is acquired, the augmented reality special effect input data of the target object is acquired, and a corresponding virtual information image is generated based on the augmented reality special effect input data of the target object;
  • the composite video frame is displayed.
  • the generating a corresponding virtual information image based on the augmented reality special effect input data of the target object includes:
  • a preset first special effect generation method is used to generate a virtual information image matching the target object in the corresponding video frame.
  • acquiring a target object in a video frame specified by the special effect generation instruction, and acquiring augmented reality special effect input data of the target object includes:
  • the historical data of the target object is acquired, and the historical data is processed according to the special effect output type to obtain the augmented reality special effect input data corresponding to the special effect output type.
  • the generating a corresponding virtual information image based on the augmented reality special effect input data of the target object includes at least one of the following:
  • inputting the augmented reality special effect input data of the target object into a preset three-dimensional model, and outputting a virtual information image matching the target object at the position of the target object in the video frame of the multi-angle free-view video obtained based on three-dimensional calibration;
  • inputting the augmented reality special effect input data of the target object into a preset machine learning model, and outputting a virtual information image matching the target object at the position of the target object in the video frame of the multi-angle free-view video obtained based on three-dimensional calibration.
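  • Neither the preset three-dimensional model nor the machine learning model is detailed in this summary. As one hedged illustration of how the position obtained from three-dimensional calibration can be used, the sketch below projects a calibrated 3D anchor point of the target object into the video frame using camera parameter data and pastes a generated virtual information image at that position; the function names, the RGBA layout of the generated image, and the special-effect generator itself are placeholders, not the patented method.

```python
import numpy as np
import cv2

def place_virtual_info(frame, target_xyz, rvec, tvec, K, dist, effect_image):
    """Paste a generated virtual information image at the target object's projected position.

    target_xyz:   3D position of the target object from three-dimensional calibration (assumed known)
    rvec, tvec, K, dist: camera parameter data of the viewpoint being rendered
    effect_image: RGBA virtual information image produced by the (3D or ML) special-effect model
    """
    pts, _ = cv2.projectPoints(np.float32([target_xyz]), rvec, tvec, K, dist)
    u, v = pts[0, 0].astype(int)                       # anchor pixel of the target object in the frame
    h, w = effect_image.shape[:2]
    y0, x0 = max(v - h // 2, 0), max(u - w // 2, 0)    # top-left corner of the overlay, clipped
    y1, x1 = min(y0 + h, frame.shape[0]), min(x0 + w, frame.shape[1])
    patch = effect_image[: y1 - y0, : x1 - x0]
    alpha = patch[..., 3:4] / 255.0                    # per-pixel opacity of the virtual information image
    frame[y0:y1, x0:x1] = (alpha * patch[..., :3] + (1 - alpha) * frame[y0:y1, x0:x1]).astype(np.uint8)
    return frame
```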
  • the combining the virtual information image with the designated video frame to obtain a combined video frame includes:
  • the virtual information image and the designated video frame are fused to obtain a fused video frame.
  • the displaying the composite video frame includes: inserting the composite video frame into the to-be-played video stream of the playback control device to be played through the playback terminal.
  • the method further includes:
  • the spliced image includes a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the image combination;
  • in response to an image reconstruction instruction from the interactive terminal, the interactive frame time information at the interaction time is determined, and the spliced image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination are acquired and sent to the interactive terminal, so that the interactive terminal selects corresponding pixel data, depth data, and parameter data from the spliced image based on the virtual viewpoint position information determined by the interactive operation, and performs combined rendering of the selected pixel data and depth data to reconstruct and play the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
  • the method further includes:
  • in response to a user-side special effect generation interactive instruction from the interactive terminal, the virtual information image corresponding to the spliced image of the preset video frame is acquired;
  • the virtual information image corresponding to the spliced image of the preset video frame is sent to the interactive terminal, so that the interactive terminal synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain and display the synthesized video frame.
  • the method further includes: in response to the user-side special effect exit interaction instruction, stopping acquiring the virtual information image corresponding to the spliced image of the preset video frame.
  • the acquiring, in response to a user-side special effect generation interactive instruction from the interactive terminal, the virtual information image corresponding to the spliced image of the preset video frame includes:
  • the acquiring a virtual information image matching the target object in the preset video frame includes:
  • the sending the virtual information image corresponding to the spliced image of the preset video frame to the interactive terminal, so that the interactive terminal combines the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain a combined video frame, includes:
  • sending the virtual information image corresponding to the spliced image of the preset video frame to the interactive terminal, so that the interactive terminal superimposes the virtual information image on the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time to obtain the superimposed composite video frame.
  • the embodiment of this specification also provides another data processing method, including:
  • in response to an image reconstruction instruction from the interactive terminal, the interactive frame time information at the interaction time is determined, and the spliced image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination are acquired and sent to the interactive terminal, so that the interactive terminal selects corresponding pixel data, depth data, and parameter data from the spliced image based on the virtual viewpoint position information determined by the interactive operation, and performs combined rendering of the selected pixel data and depth data to reconstruct and play the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time;
  • the virtual information image corresponding to the spliced image of the preset video frame is sent to the interactive terminal, so that the interactive terminal synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain a synthesized video frame;
  • the composite video frame is displayed.
  • the stitched image of the preset video frame is generated based on the pixel data and depth data of the image combination at the interactive frame time, and the stitched image includes a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the image combination;
  • the image combination at the interactive frame moment is obtained based on intercepting multiple synchronized video frames at a specified frame moment from multiple synchronized video streams, and the multiple synchronized video frames include frame images with different shooting angles of view.
  • acquiring the virtual information image corresponding to the spliced image of the preset video frame indicated by the special effect generation interactive control instruction includes:
  • the embodiment of this specification also provides another data processing method, including:
  • the virtual information image and the corresponding video frame are synthesized and displayed.
  • the acquiring a virtual information image of a video frame corresponding to a specified frame time of the special effect display identifier in response to a triggering operation on a special effect display identifier in the image of the multi-angle free-view video includes:
  • the combining and displaying the virtual information image and the corresponding video frame includes:
  • the virtual information image is superimposed on the video frame at the designated frame time to obtain and display the superimposed composite video frame.
  • the embodiment of this specification provides a data processing system, including:
  • the target object obtaining unit is adapted to obtain the target object in the video frame of the multi-angle free-view video
  • the virtual information image acquisition unit is adapted to acquire the virtual information image generated based on the augmented reality special effect input data of the target object;
  • the image synthesis unit is adapted to perform synthesis processing on the virtual information image and the corresponding video frame to obtain a synthesized video frame;
  • the display unit is suitable for displaying the obtained composite video frame.
  • the embodiment of this specification provides another data processing system, including: a data processing device, a server, a playback control device, and a playback terminal, where:
  • the data processing device is adapted to obtain multiple synchronized video frames by intercepting, based on a video frame interception instruction, the video frames at a specified frame time from multiple video data streams synchronized in real time from different locations in the field collection area, and to upload the multiple synchronized video frames at the specified frame time to the server;
  • the server is adapted to receive the multiple synchronized video frames uploaded by the data processing device as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, perform frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding video frame of the multi-angle free-view video; and, in response to a special effect generation instruction, obtain the target object in the video frame specified by the special effect generation instruction, obtain the augmented reality special effect input data of the target object, generate the corresponding virtual information image based on the augmented reality special effect input data of the target object, perform synthesis processing on the virtual information image and the designated video frame to obtain a synthesized video frame, and input the synthesized video frame to the playback control device;
  • the playback control device is adapted to insert the composite video frame into the video stream to be played
  • the playback terminal is adapted to receive the to-be-played video stream from the playback control device and perform real-time playback.
  • the system further includes an interactive terminal, wherein:
  • the server is further adapted to generate a spliced image corresponding to the image combination based on the pixel data and depth data of the image combination, the spliced image including a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the image combination; to store the spliced image of the image combination and the parameter data corresponding to the image combination; and, in response to an image reconstruction instruction from the interactive terminal, to determine the interactive frame time information at the interaction time, obtain the spliced image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination, and send them to the interactive terminal;
  • the interactive terminal is adapted to send the image reconstruction instruction to the server based on the interactive operation, select the corresponding pixel data, depth data, and parameter data from the spliced image according to preset rules based on the virtual viewpoint position information determined by the interactive operation, perform combined rendering of the selected pixel data and depth data, and reconstruct and play the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
  • the server is further adapted to, in response to a server-side special effect generation interactive control instruction, generate and store a virtual information image corresponding to the spliced image of the preset video frame indicated by the server-side special effect generation interactive control instruction.
  • the server is further adapted to, in response to a user-side special effect generation interactive instruction from the interactive terminal, obtain the virtual information image corresponding to the spliced image of the preset video frame, and send the virtual information image corresponding to the spliced image of the preset video frame to the interactive terminal;
  • the interactive terminal is adapted to synthesize a video frame of a multi-angle free-view video corresponding to a virtual viewpoint position at the time of the interactive frame and the virtual information image to obtain a synthesized video frame and display it.
  • the embodiment of this specification provides a server, including:
  • the data receiving unit is adapted to receive multiple synchronized video frames at specified frame moments intercepted from multiple synchronized video streams as an image combination, and the multiple synchronized video frames include frame images of different shooting angles;
  • a parameter data calculation unit adapted to determine parameter data corresponding to the image combination
  • a depth data calculation unit adapted to determine the depth data of each frame of the image in the image combination
  • the video data acquisition unit is adapted to perform frame image reconstruction on the preset virtual viewpoint path based on the corresponding parameter data of the image combination, the pixel data and the depth data of the preset frame image in the image combination, to obtain the corresponding multi-angle Video frames of free-view videos;
  • the first virtual information image generation unit is adapted to respond to a special effect generation instruction, obtain a target object in a video frame specified by the special effect generation instruction, obtain augmented reality special effect input data of the target object, and based on the target object Input data of augmented reality special effects to generate corresponding virtual information images;
  • An image synthesizing unit adapted to synthesize the virtual information image and the designated video frame to obtain a synthesized video frame
  • the first data transmission unit is adapted to output the composite video frame to be inserted into the video stream to be played.
  • the first virtual information image generating unit is adapted to take the augmented reality special effect input data of the target object as input and, at the position of the target object in the video frame of the multi-angle free-view video obtained based on three-dimensional calibration, use the preset first special effect generation method to generate a virtual information image matching the target object in the corresponding video frame.
  • the embodiment of this specification provides another server, including:
  • the image reconstruction unit is adapted to determine the interactive frame time information at the interactive time in response to the image reconstruction instruction from the interactive terminal, and obtain the spliced image of the preset frame image in the image combination corresponding to the interactive frame time and the corresponding parameter data of the image combination ;
  • the virtual information image generating unit is adapted to generate a virtual information image corresponding to a spliced image of the image combination of the video frame indicated by the special effect generating interactive control instruction in response to the special effect generating interactive control instruction;
  • the data transmission unit is adapted to perform data interaction with an interactive terminal, including: transmitting the spliced image of the preset video frames in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination to the interactive terminal, so that the interactive terminal selects corresponding pixel data, depth data, and parameter data from the stitched image based on the virtual viewpoint position information determined by the interactive operation and performs combined rendering of the selected pixel data and depth data according to preset rules, whereby the image of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time is reconstructed and played; and transmitting the virtual information image corresponding to the spliced image of the preset frame image indicated by the special effect generation interactive control instruction to the interactive terminal, so that the interactive terminal synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain and play a multi-angle free-view synthesized video frame.
  • the embodiment of this specification also provides an interactive terminal, including:
  • the first display unit is adapted to display images of the multi-angle free-view video in real time, wherein the image of the multi-angle free-view video is reconstructed from the parameter data of an image combination formed by multiple synchronized video frame images at a specified frame time and the pixel data and depth data of the image combination, and the multiple synchronized video frames include frame images of different shooting angles;
  • the special effect data obtaining unit is adapted to obtain, in response to a triggering operation on a special effect display identifier in the multi-angle free-view video image, the virtual information image corresponding to the specified frame time of the special effect display identifier;
  • the second display unit is adapted to superimpose and display the virtual information image on a video frame of the multi-angle free-view video.
  • the embodiment of this specification provides an electronic device including a memory and a processor.
  • the memory stores computer instructions that can run on the processor.
  • when the computer instructions are run, the processor executes the steps of the method described in any of the foregoing embodiments.
  • the embodiments of this specification provide a computer-readable storage medium on which computer instructions are stored, and the computer instructions execute the steps of the method described in any of the foregoing embodiments when the computer instructions are run.
  • by adopting the solutions of the embodiments of this specification, the virtual information image generated based on the augmented reality special effect input data of the target object is obtained, and the virtual information image and the corresponding video frame are synthesized and displayed.
  • since the reconstruction is performed on the preset virtual viewpoint path based on the pixel data and depth data of the image combination at the preset frame time, and there is no need to reconstruct based on all the video frames in the multiple synchronized video streams, the amount of data processing and data transmission can be reduced, thereby reducing the transmission delay of the multi-angle free-view video.
  • a virtual information image matching the position of the target object is obtained, so that the obtained virtual information image better matches the position of the target object in three-dimensional space and the displayed virtual information image is more in line with its real state in three-dimensional space; therefore, the displayed composite video frame is more realistic and vivid, which can enhance the user's visual experience.
  • the virtual information image at the corresponding frame time is synthesized with the video frame at that frame time and displayed, in frame-time order and according to the virtual viewpoint position at the corresponding frame time, so the virtual information image in the resulting synthesized video frame can be synchronized with the target object in the image frames of the multi-angle free-view video; the synthesized video frames are therefore more vivid, which enhances the user's sense of immersion when watching the multi-angle free-view video and further improves the user experience.
  • the parameter data corresponding to the image combination and the depth data of each frame image in the image combination are determined; on the one hand, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, frame image reconstruction is performed on the preset virtual viewpoint path to obtain the corresponding video frame of the multi-angle free-view video; on the other hand, in response to the special effect generation instruction, the target object in the video frame specified by the special effect generation instruction is obtained, the augmented reality special effect input data of the target object is obtained, a corresponding virtual information image is generated based on the augmented reality special effect input data of the target object, and the virtual information image and the designated video frame are synthesized to obtain and display the synthesized video frame.
  • the interception of synchronized video frames, the reconstruction of multi-angle free-view videos, the generation of virtual information images, and the synthesis of multi-angle free-view videos and virtual information images are all completed by different devices.
  • This distributed system architecture can avoid having a single device perform all the heavy data processing, so data processing efficiency can be improved and transmission delay can be reduced.
  • the virtual information image corresponding to the spliced image of the preset video frame indicated by the special effect generation interactive control instruction is acquired and sent to the interactive terminal, so that the interactive terminal synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain and display the synthesized video frame, which can satisfy the user's needs for a rich visual experience and real-time interaction and enhance the user's interactive experience.
  • Figure 1 shows a schematic structural diagram of a data processing system in a specific application scenario in an embodiment of this specification
  • Figure 2 shows a flow chart of a data processing method in an embodiment of this specification
  • Figure 3 shows a schematic structural diagram of a data processing system in an embodiment of this specification
  • FIG. 4 shows a flowchart of another data processing method in an embodiment of this specification
  • FIG. 5 shows a schematic diagram of a video frame image in an embodiment of this specification
  • Figure 6 shows a schematic diagram of a three-dimensional calibration method in an embodiment of this specification
  • FIG. 7 shows a flowchart of another data processing method in an embodiment of this specification.
  • FIG. 13 shows a schematic diagram of an interactive interface of another interactive terminal in an embodiment of this specification
  • FIG. 14 shows a flowchart of another data processing method used in the embodiment of this specification.
  • Figure 15 shows a schematic structural diagram of another data processing system in an embodiment of this specification.
  • Figure 16 shows a schematic structural diagram of another data processing system in an embodiment of this specification.
  • FIG. 17 shows a schematic diagram of a server cluster architecture in an embodiment of this specification.
  • Figures 18 to 20 show schematic diagrams of video effects of a play interface of a play terminal in an embodiment of this specification
  • Figure 21 shows a schematic structural diagram of another interactive terminal in an embodiment of the present invention.
  • Figure 22 shows a schematic structural diagram of another interactive terminal in an embodiment of the present invention.
  • Figures 23 to 26 show schematic diagrams of video effects of a display interface of an interactive terminal in an embodiment of this specification
  • Figure 27 shows a schematic structural diagram of a server in an embodiment of this specification
  • Figure 28 shows a schematic structural diagram of a server in an embodiment of this specification
  • Figure 29 shows a schematic structural diagram of another server in an embodiment of this specification.
  • 6DoF: 6 Degrees of Freedom.
  • Users can adjust the viewing angle of the video through interactive means during the viewing process and watch from the free viewpoint they want, thus greatly enhancing the viewing experience.
  • low-latency playback of multi-angle free-view video can be applied to application scenarios such as live broadcast and relay broadcast, and can also be applied to video playback based on user interaction.
  • FIG. 1 shows the layout of a data processing system for a basketball game.
  • the data processing system 10 includes a collection array 11 composed of multiple collection devices, a data processing device 12, a cloud server cluster 13, a playback control device 14, a playback terminal 15, and an interactive terminal 16.
  • the reconstruction of a multi-angle free-view video can be realized, and the user can watch a low-latency multi-angle free-view video.
  • the basketball hoop on the left is taken as the core viewpoint, the core viewpoint is taken as the center of a circle, and the fan-shaped area in the same plane as the core viewpoint is used as the preset multi-angle free viewing angle range.
  • the collection devices in the collection array 11 can be fan-shaped and placed in different positions in the field collection area according to the preset multi-angle free viewing angle range, and can simultaneously collect video data streams from corresponding angles in real time.
  • the data processing device 12 may send a streaming instruction to each collection device in the collection array 11 through a wireless local area network, and each collection device in the collection array 11 transmits the video data stream obtained based on the streaming instruction sent by the data processing device 12 to the data processing device 12 in real time.
  • When the data processing device 12 receives the video frame interception instruction, it intercepts the video frames at the specified frame time from the received multiple video data streams to obtain multiple synchronized video frames, and then uploads the obtained multiple synchronized video frames at the specified frame time to the server cluster 13 in the cloud.
  • the cloud server cluster 13 uses the received multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding video frame of the multi-angle free-view video.
  • the server cluster 13 in the cloud can store the pixel data and depth data of the image combination in the following manner:
  • a spliced image corresponding to the frame time is generated, where the spliced image includes a first field and a second field: the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the preset frame images in the image combination.
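  • As an illustration only, the following sketch (not the patented format) packs the pixel data of the preset frame images into a first field and their depth data into a second field of one spliced image; the vertical stacking layout, the 8-bit depth quantization, and the assumption that all frames share one resolution are illustrative choices.

```python
import numpy as np

def build_stitched_image(frames, depth_maps):
    """Pack pixel data (first field) and depth data (second field) into one stitched image.

    frames:     list of H x W x 3 uint8 arrays (preset frame images of the image combination)
    depth_maps: list of H x W float arrays, one per frame (depth data of the image combination)
    Returns a single array whose top half is the pixel field and bottom half the depth field.
    """
    pixel_field = np.hstack(frames)                                   # first field: frame images side by side
    depth_u8 = [np.uint8(255 * d / max(d.max(), 1e-6)) for d in depth_maps]
    depth_field = np.hstack([np.dstack([d] * 3) for d in depth_u8])   # second field: quantized depth maps
    return np.vstack([pixel_field, depth_field])                      # stitched image: pixel field over depth field
```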
  • the playback control device 14 can insert the received video frame of the multi-angle free-view video into the data stream to be played, and the playback terminal 15 receives the data stream to be played from the playback control device 14 and plays it in real time.
  • the playback control device 14 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set as a virtual playback control device to control the data source.
  • a broadcast director control device such as a broadcast director station may be used as a broadcast control device in the embodiment of the present invention.
  • When the server cluster 13 in the cloud receives the image reconstruction instruction from the interactive terminal 16, it can extract the spliced image of the preset video frames in the corresponding image combination and the parameter data corresponding to that image combination and transmit them to the interactive terminal 16.
  • the interactive terminal 16 determines the interactive frame time information based on the trigger operation, sends an image reconstruction instruction containing the interactive frame time information to the server cluster 13, receives the spliced image of the preset video frames in the image combination corresponding to the interactive frame time and the corresponding parameter data returned from the server cluster 13 in the cloud, selects the corresponding pixel data, depth data, and parameter data based on the virtual viewpoint position information determined by the interactive operation, and performs combined rendering of the selected pixel data and depth data to reconstruct and play the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
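  • The combined rendering is not spelled out in detail here; as a hedged sketch of one common approach, the code below forward-warps a single reference frame image into a virtual viewpoint using its depth data and pinhole camera parameter data (intrinsics K and extrinsics [R|t]). The single-reference simplification, the absence of a z-buffer and hole filling, and all function names are assumptions, not the patent's algorithm.

```python
import numpy as np

def warp_to_virtual_view(ref_img, ref_depth, K_ref, RT_ref, K_virt, RT_virt):
    """Forward-warp one reference frame into a virtual viewpoint using its depth map.

    ref_img:   H x W x 3 pixel data of a preset frame image
    ref_depth: H x W depth values for that frame
    K_*:       3 x 3 intrinsic matrices; RT_*: 3 x 4 [R|t] extrinsic matrices (parameter data)
    """
    h, w = ref_depth.shape
    out = np.zeros_like(ref_img)
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # homogeneous pixel coordinates
    depth = ref_depth.reshape(-1)

    cam_pts = np.linalg.inv(K_ref) @ pix * depth                # back-project to reference camera space
    R, t = RT_ref[:, :3], RT_ref[:, 3:]
    world = R.T @ (cam_pts - t)                                 # reference camera -> world coordinates
    Rv, tv = RT_virt[:, :3], RT_virt[:, 3:]
    virt = K_virt @ (Rv @ world + tv)                           # world -> virtual camera -> image plane
    u = (virt[0] / virt[2]).round().astype(int)
    v = (virt[1] / virt[2]).round().astype(int)

    ok = (virt[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = ref_img.reshape(-1, 3)[ok]              # splat visible pixels (no z-buffer / hole filling)
    return out
```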
  • the entities in the video are not completely static.
  • the entities collected by the collection array such as athletes, basketballs, and referees, are mostly in motion.
  • the texture data and pixel data in the image combination of the collected video frames also constantly change with time.
  • the user can directly watch the video into which the multi-angle free-view video frames are inserted through the playback terminal 15, such as watching a live basketball game; on the other hand, through interactive operations on the interactive terminal 16, the user can watch the multi-angle free-view video at the interactive frame time.
  • the above data processing system 10 may also include only the playing terminal 15 or only the interactive terminal 16, or the same terminal device may be used as the playing terminal 15 and the interactive terminal 16.
  • the data volume of multi-angle free-view video is relatively large, and the virtual information image data corresponding to AR special effects usually has a large data volume.
  • embedding AR special effects into the reconstructed multi-angle free-view video also involves processing a large amount of data and coordinating multiple devices, which places high demands on data processing and transmission bandwidth resources. Therefore, how to embed AR special effects during the playback of multi-angle free-view videos so as to meet the user's visual experience needs has become a difficult problem to solve.
  • S21 Acquire a target object in a video frame of a multi-angle free-view video.
  • based on the parameter data corresponding to the image combination formed by multiple synchronized video frames at the specified frame time intercepted from the multiple synchronized video streams, and the pixel data and depth data of the preset frame images in the image combination, frame image reconstruction can be performed on the preset virtual viewpoint path to obtain the video frames of the multi-angle free-view video, where the multiple synchronized video frames include frame images of different shooting angles.
  • certain objects in the images of the multi-angle free-view video can be determined as target objects based on certain indication information (for example, special effect display signs).
  • the indication information may be generated based on user interaction, or may be obtained based on a preset trigger condition or a third-party instruction.
  • the target object in the video frame of the multi-angle free-view video may be acquired in response to a special effect generation interactive control instruction; the indication information may be set in the interactive control instruction, and the indication information may specifically be the identification information of the target object.
  • the specific form of the indication information corresponding to the target object may be determined based on the multi-angle free-view video frame structure.
  • the target object may be a video frame in a multi-angle free-view video or a specific entity in a video frame sequence, for example, a specific person, animal, object, light beam and other environmental fields, environmental space, etc.
  • the specific form of the target object is not limited in the embodiments of this specification.
  • the multi-angle free-view video may be a 6DoF video.
  • the implanted AR special effects are presented in the form of virtual information images.
  • the virtual information image may be generated based on the augmented reality special effect input data of the target object. After the target object is determined, the virtual information image generated based on the augmented reality special effect input data of the target object can be acquired.
  • the virtual information image corresponding to the target object may be generated in advance, or may be generated instantly in response to a special effect generation instruction.
  • a virtual information image matching the position of the target object can be obtained, so that the obtained virtual information image better matches the position of the target object in three-dimensional space and the displayed virtual information image is more in line with its real state in three-dimensional space; the displayed composite video frame is therefore more realistic and vivid, enhancing the user's visual experience.
  • the virtual information image corresponding to the target object may be generated according to a preset special effect generation method.
  • S23 Perform synthesis processing and display of the virtual information image and the corresponding video frame.
  • the synthesized video frame obtained after the synthesis processing can be displayed on the terminal side.
  • the obtained composite video frame may be a single frame or multiple frames. If there are multiple frames, the virtual information image at each frame time and the video frame at that frame time can be synthesized and displayed in frame-time order and according to the virtual viewpoint position at the corresponding frame time, so that a synthesized video frame matching the virtual viewpoint position at the corresponding frame time is automatically generated as the virtual viewpoint changes; the augmented reality effects of the resulting synthesized video frames are thus more vivid, which can further enhance the user's visual experience.
  • Example 1 Perform fusion processing on the virtual information image and the corresponding video frame to obtain a fusion video frame, and display the fusion video frame;
  • Example 2 The virtual information image is superimposed on the corresponding video frame to obtain a superimposed composite video frame, and the superimposed composite video frame is displayed.
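  • As a minimal sketch of Example 2 (and of the frame-time-ordered synthesis described above), the code below alpha-blends an RGBA virtual information image onto each corresponding video frame in frame-time order; the RGBA layout of the virtual information image and the function names are assumptions.

```python
import numpy as np

def superimpose(video_frame, virtual_info_rgba):
    """Overlay a virtual information image (RGBA) on a video frame (RGB) to get a composite frame."""
    rgb = virtual_info_rgba[..., :3].astype(np.float32)
    alpha = virtual_info_rgba[..., 3:4].astype(np.float32) / 255.0   # per-pixel opacity of the AR layer
    base = video_frame.astype(np.float32)
    return np.uint8(alpha * rgb + (1.0 - alpha) * base)              # superimposed composite video frame

def composite_sequence(video_frames, virtual_info_images):
    """Synthesize frame by frame in frame-time order so the AR effect stays synchronized."""
    return [superimpose(f, v) for f, v in zip(video_frames, virtual_info_images)]
```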
  • the obtained synthesized video frame can be directly displayed; or the obtained synthesized video frame can be inserted into the video stream to be played for playback and display.
  • the merged video frame may be inserted into the to-be-played video stream for playback and display.
  • the target object in the video frame of the multi-angle free-view video is obtained, and then the virtual information image generated based on the augmented reality special effect input data of the target object is obtained.
  • the virtual information image and the corresponding video frame are synthesized and displayed. Through this process, it is only necessary to synthesize the video frame that needs to be implanted with AR special effects and the virtual information image corresponding to the target object in the video frame during the multi-angle free-view video playback process to obtain a video that incorporates AR special effects.
  • the implantation of virtual information images corresponding to AR special effects in the multi-angle free-view video is suitable for a variety of application scenarios.
  • the two application scenarios, interactive and non-interactive, are described separately below.
  • In the non-interactive application scenario, the user watches the multi-angle free-view video embedded with AR special effects without any user interaction trigger; the timing, position, and content of the implanted AR special effects can be controlled on the server side. As the video stream plays on the terminal side, the user automatically sees the multi-angle free-view video embedded with AR special effects.
  • During live or quasi-live broadcast, by implanting AR special effects into the multi-angle free-view video, multi-angle free-view composite video frames with AR special effects can be generated, which can satisfy the user's needs for low-latency video playback and a rich visual experience.
  • the user can actively trigger the implantation of AR special effects during the multi-angle free-view video watching process.
  • AR special effects can be quickly implanted in the multi-angle free-view video.
  • the solution avoids stalling of video playback caused by a lengthy generation process, so that, based on user interaction, multi-angle free-view composite video with implanted AR special effects can be generated in a timely manner, meeting the user's needs for low-latency video playback and a rich visual experience.
  • the target object in the video frame of the multi-angle free-view video can be obtained in response to a user-side special effect generation interactive control instruction.
  • the virtual information image generated based on the augmented reality special effect input data of the target object can be acquired, and the virtual information image and the corresponding video frame of the multi-angle free-view video can be synthesized and displayed.
  • the virtual information image corresponding to the target object may be generated in advance, or may be generated immediately.
  • In a non-interactive scenario, it can be generated in response to a special effect generation instruction on the server side; in an interactive scenario, it can be generated in advance in response to a special effect generation instruction on the server side, or generated in response to a special effect generation interactive control instruction from the interactive terminal.
  • the target object may be a specific entity in the image, for example, a specific person, animal, object, environmental space, etc.
  • the target object may be obtained according to the target object indication information (for example, a special effect display identifier) in the interactive control instruction, the augmented reality special effect input data of the target object may be obtained, and, based on the augmented reality special effect input data of the target object, the virtual information image corresponding to the target object may be generated according to a preset special effect generation method.
  • all or part of the data for generating the multi-angle free-view video and the augmented reality special effect input data can be pre-downloaded to the interactive terminal. Some or all of the following operations can then be performed on the interactive terminal: reconstruction of the multi-angle free-view video, generation of the virtual information image, and rendering of the video frames of the multi-angle free-view video superimposed with the virtual information image. Alternatively, the multi-angle free-view video and the virtual information images can be generated on a server (such as a cloud server), and only the composite operation of the multi-angle free-view video frame and the corresponding virtual information image is performed on the interactive terminal.
  • the multi-angle free-view video composite video frame can be inserted into the data stream to be played.
  • a multi-angle free-view video containing composite video frames can be used as one of multiple candidate data streams to be played, i.e. as a video stream available for selection for playback.
  • the video stream containing video frames with multiple angles and free viewing angles can be used as an input video stream of a playback control device (such as a director control device) for the playback control device to select and use.
  • the same user may not only want to watch multi-angle free-view videos with AR special effects in non-interactive scenes, but may also want to watch multi-angle free-view videos with AR special effects in interactive scenes.
  • a user may return to watching a replay video for a certain wonderful picture or a video during a certain period of time that he missed.
  • the user interactive needs can be met.
  • In a data processing system adopting a distributed system architecture, for a received image combination formed by multiple synchronized video frames at a specified frame time intercepted from multiple video streams, the parameter data corresponding to the image combination and the depth data of each video frame in the image combination are determined. On the one hand, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset video frames in the image combination, frame image reconstruction is performed on the preset virtual viewpoint path, and the corresponding video frames of the multi-angle free-view video can be obtained; on the other hand, in response to the special effect generation instruction, the target object in the video frame specified by the special effect generation instruction can be obtained, the augmented reality special effect input data of the target object can be obtained, a corresponding virtual information image is generated based on the augmented reality special effect input data of the target object, and the virtual information image is synthesized with the specified video frame to obtain a composite video frame. Reference is made to Figure 3, which shows a schematic structural diagram of such a data processing system.
  • the data processing device 31 can intercept the video frames (including individual frame images) collected by the collection array in the field collection area. By intercepting only the video frames that are used to generate the multi-angle free-view images, a large amount of data transmission and data processing can be avoided.
  • the server 32 generates the video frames of the multi-angle free-view video and, in response to the special effect generation instruction, generates the virtual information image and synthesizes the virtual information image with the video frame of the multi-angle free-view video; after this processing, the multi-angle free-view composite video frame can be obtained.
  • the powerful computing power of the server 32 can be fully utilized to quickly generate the multi-angle free-view video composite video frame, which can be inserted into the to-be-played data stream of the playback control device 33 in time.
  • the playback of multi-angle free-view videos integrated with AR special effects can be realized at a low cost to meet the needs of users for low-latency video playback and rich visual experience.
  • the video data can be processed through the following steps:
  • the data processing device can intercept and upload multiple video frames at a specified frame time from the multiple synchronized video streams according to the received video frame interception instruction, for example, can be uploaded to a cloud server or a service cluster.
  • a collection array composed of multiple collection devices can be deployed in different locations in the field collection area, and the collection array can simultaneously collect multiple video data streams in real time and upload them to the data processing device.
  • When the data processing device receives the video frame interception instruction, it can intercept the video frames at the corresponding frame time from the multiple video data streams according to the designated frame time information contained in the video frame interception instruction.
  • the designated frame time may be in units of frames, with the Nth to Mth frames regarded as the designated frame time, where N and M are both integers not less than 1 and N ≤ M; alternatively, the designated frame time may be in units of time, with the Xth to Yth seconds as the designated frame time, where X and Y are both positive numbers and X ≤ Y. Therefore, the multiple synchronized video frames may include all frame-level synchronized video frames corresponding to a specified frame time, and the pixel data of each video frame forms a corresponding frame image.
  • For example, the data processing device may determine from the received video frame interception instruction that the specified frame time is the 2nd frame of the multiple video data streams; the data processing device then intercepts the 2nd video frame of each video data stream, and the frame-level synchronized 2nd video frames of the intercepted video data streams are used as the obtained multiple synchronized video frames.
  • as another example, the data processing device can determine from the received video frame interception instruction that the designated frame time is the first second of the multi-channel video data streams. At 25 frames per second, the data processing device can intercept the 25 video frames of the first second in each video data stream; the first video frames of the first second in the intercepted video data streams are frame-level synchronized with one another, the second video frames of the first second are likewise frame-level synchronized, and so on up to the 25th video frames of the first second, and these are taken together as the obtained multiple synchronized video frames.
  • as a further example, the data processing device can determine from the received video frame interception instruction that the designated frame time covers the 2nd and 3rd frames of the multiple video data streams; the data processing device can then intercept the 2nd and 3rd video frames in each video data stream, and the frame-level synchronized 2nd-frame video frames and 3rd-frame video frames of the respective video data streams are taken as the multiple synchronized video frames.
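  • For illustration only, the following is a minimal sketch of such an interception step, assuming OpenCV is available and that the streams are already frame-level synchronized; the stream file names and the frame-index range are hypothetical.

```python
# A minimal sketch only: intercept frame-level-synchronized video frames at
# designated frame indices from several already-synchronized streams.
# Assumes OpenCV; the stream file names and index range are hypothetical.
import cv2

def intercept_synced_frames(stream_paths, start_frame, end_frame):
    """Return {frame_index: [frame_from_stream_0, frame_from_stream_1, ...]}."""
    captures = [cv2.VideoCapture(p) for p in stream_paths]
    synced = {}
    for idx in range(start_frame, end_frame + 1):
        group = []
        for cap in captures:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # seek to the designated frame
            ok, frame = cap.read()
            if not ok:
                raise RuntimeError(f"stream ended before frame index {idx}")
            group.append(frame)
        synced[idx] = group                          # one synchronized image combination
    for cap in captures:
        cap.release()
    return synced

# e.g. the 2nd and 3rd frames (0-based indices 1 and 2) of two hypothetical streams:
# frames = intercept_synced_frames(["cam0.mp4", "cam1.mp4"], 1, 2)
```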
  • the multiple video data streams may be video data streams in a compressed format, or video data streams in a non-compressed format.
  • the parameter data corresponding to the image combination can be obtained through a parameter matrix
  • the parameter matrix can include an internal parameter matrix, an external parameter matrix, a rotation matrix and a translation matrix, and the like.
  • a structure from motion (SFM) algorithm can be used to perform feature extraction, feature matching, and global optimization on the acquired image combination based on a parameter matrix, and the obtained parameter estimation values are used as the parameter data corresponding to the image combination.
  • the algorithm used for feature extraction can include any of the following: the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, and the Features from Accelerated Segment Test (FAST) algorithm.
  • Algorithms used for feature matching may include: the Euclidean distance calculation method, the Random Sample Consensus (RANSAC) algorithm, and so on.
  • Algorithms for global optimization may include: Bundle Adjustment (BA) and so on.
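  • As a rough illustration of the feature extraction and matching stage named above, the following sketch uses OpenCV's SIFT detector, Euclidean-distance matching, and a RANSAC-based essential-matrix estimate to recover a rotation matrix and translation vector between two frame images; the intrinsic matrix K is an assumed input, and the global bundle-adjustment optimization of a full SFM pipeline is not implemented here.

```python
# A rough sketch of the feature extraction / matching stage, assuming OpenCV.
# K (the 3x3 intrinsic matrix) is an assumed input; the global optimization
# (bundle adjustment) step of a full SFM pipeline is not implemented here.
import cv2
import numpy as np

def estimate_relative_pose(img_path_a, img_path_b, K):
    """Estimate a rotation matrix R and translation vector t between two frame images."""
    img_a = cv2.imread(img_path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(img_path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()                      # feature extraction (SIFT)
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)          # Euclidean-distance matching
    knn = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # RANSAC-based estimation of the essential matrix, then recover the
    # rotation matrix and translation vector between the two views.
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t
```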
  • S43 Determine the depth data of each frame of image in the image combination.
  • the depth data of each frame image may be determined based on multiple frame images in the image combination.
  • the depth data may include depth values corresponding to pixels of each frame of image in the image combination.
  • the distance from the collection point to each point in the scene can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis.
  • the above-mentioned distance may be a relative value, and multiple frames of images may use the same reference.
  • a binocular stereo vision algorithm may be used to calculate the depth data of each frame of image.
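  • A minimal sketch of the binocular stereo idea follows, assuming OpenCV and a rectified grayscale image pair; the focal length and baseline values are placeholders standing in for the real parameter data of the two collection points.

```python
# A minimal binocular-stereo sketch, assuming OpenCV and a rectified grayscale
# image pair. Focal length (pixels) and baseline (metres) are placeholders for
# the real parameter data of the two collection points.
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px=1000.0, baseline_m=0.2):
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=128,   # must be a multiple of 16
                                    blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan                     # invalid / unmatched pixels
    # depth = focal_length * baseline / disparity: distance along the optical axis
    return focal_px * baseline_m / disparity
```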
  • the depth data can also be indirectly estimated by analyzing the characteristics of the frame image, such as luminosity characteristics, light and dark characteristics.
  • a multi-view stereo (MVS) algorithm may be used to reconstruct the frame image.
  • all pixels can be used for reconstruction, or the pixels can be down-sampled and only part of the pixels can be used for reconstruction.
  • the pixel points of each frame image can be matched, the three-dimensional coordinates of each pixel point can be reconstructed to obtain points with image consistency, and then the depth data of each frame image can be calculated.
  • the pixel points of the selected frame image can be matched, the three-dimensional coordinates of the pixel points of each selected frame image can be reconstructed to obtain points with image consistency, and then the depth data of the corresponding frame image can be calculated.
  • the pixel data of the frame image corresponds to the calculated depth data.
  • the method of selecting the frame images can be set according to the specific situation. For example, the distance between the frame image whose depth data is to be calculated and the other frame images can be evaluated as needed, and a part of the frame images can be selected accordingly.
  • the pixel data of the frame image can be any of YUV data or RGB data, or other data that can express the frame image;
  • the depth data may include depth values in one-to-one correspondence with the pixel data of the frame image, or may be a subset of values selected from the set of depth values corresponding one-to-one to the pixel data of the frame image; the specific selection method depends on the specific scenario;
  • the virtual viewpoint is selected from a range of multi-angle free viewing angles,
  • the multi-angle free viewing angle range is a range that supports switching of viewpoints of the area to be viewed.
  • the preset frame image may be all the frame images in the image combination, or may be a selected partial frame image.
  • the selection method can be set according to the specific situation; for example, according to the positional relationship between the collection points, part of the frame images at corresponding positions in the image combination can be selected, or, according to the desired frame time or frame period, part of the frame images at the corresponding frame times in the image combination can be selected.
  • each virtual viewpoint in the virtual viewpoint path can be associated with a frame time, and the corresponding frame images can be obtained according to the frame time associated with each virtual viewpoint; then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame images corresponding to the frame time of each virtual viewpoint, frame image reconstruction is performed for each virtual viewpoint to obtain the corresponding video frames of the multi-angle free-view video. Therefore, in specific implementation, in addition to realizing a multi-angle free-view image at a certain moment, a continuous or non-continuous multi-angle free-view video can also be realized.
  • based on the pixel data and depth data of the frame images of the a2 synchronized video frames at the second frame time, a second frame image reconstruction is performed on the path composed of b2 virtual viewpoints, and the corresponding video frames of the multi-angle free-view video are finally obtained.
  • the designated frame times and the virtual viewpoints can be divided at a finer granularity, thereby obtaining more synchronized video frames and virtual viewpoints corresponding to different frame moments, realizing free switching of viewpoints over time and increasing the smoothness of viewpoint switching in the multi-angle free-view video.
  • a depth map-based image rendering (DIBR) algorithm may be used to combine the corresponding parameter data and the preset virtual viewpoint path according to the image, and the pixels of the preset frame image The data and depth data are combined and rendered, so as to realize the frame image reconstruction based on the preset virtual viewpoint path, and obtain the corresponding video frames of the multi-angle free-view video.
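  • The following is a simplified numpy sketch of the forward-mapping step of such depth-image-based rendering: pixels of one reference frame image are projected into a virtual viewpoint using the depth data and camera parameters. Hole filling, blending of several reference views, and the reverse-mapping pass mentioned elsewhere in this document are omitted.

```python
# Simplified forward-mapping sketch of depth-image-based rendering (DIBR).
# Pure numpy; hole filling, multi-view blending and reverse mapping omitted.
import numpy as np

def forward_warp(image, depth, K_ref, RT_ref, K_virt, RT_virt):
    """image: HxWx3, depth: HxW (metric), K_*: 3x3 intrinsics,
    RT_*: 3x4 [R|t] world-to-camera extrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3xN

    # back-project reference pixels to camera coordinates, then to world coords
    cam = np.linalg.inv(K_ref) @ pix * depth.reshape(1, -1)
    R, t = RT_ref[:, :3], RT_ref[:, 3:]
    world = R.T @ (cam - t)

    # project the world points into the virtual viewpoint
    Rv, tv = RT_virt[:, :3], RT_virt[:, 3:]
    cam_v = Rv @ world + tv
    proj = K_virt @ cam_v
    z = np.where(np.abs(proj[2:]) < 1e-9, 1e-9, proj[2:])
    uv = (proj[:2] / z).round().astype(int)

    out = np.zeros_like(image)
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h) & (cam_v[2] > 0)
    out[uv[1, ok], uv[0, ok]] = image.reshape(-1, 3)[ok]   # no z-buffering in this sketch
    return out
```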
  • the augmented reality special effect input data of the target object can be used as input, the position of the target object in the video frame of the multi-angle free-view video can be obtained by three-dimensional calibration, and a preset first special effect generation method can be used to generate a virtual information image matching the target object in the corresponding video frame.
  • a preset number of pixels can be selected from it, and the spatial positions of the preset number of pixels can be determined based on the parameter data of the video frame and the real physical space parameters corresponding to the video frame; the accurate position of the target object in the video frame can then be determined.
  • the video frame P50 shown in Figure 5 shows an image during a basketball game.
  • the pixel points A, B, C, and D corresponding to the four vertices of the restricted area of the basketball court are selected and, combined with the real parameters of the basketball court, the court can be calibrated through the parameters of the camera corresponding to the video frame; the three-dimensional position information of the court under the corresponding virtual camera can then be obtained according to the parameters of the virtual camera, so as to achieve accurate calibration of the three-dimensional spatial position relationship in the video frame containing the basketball court.
  • pixels in the video frame can also be selected for three-dimensional calibration to determine the position of the target object corresponding to the special effect generation instruction in the video frame.
  • the pixels corresponding to static objects in the image are preferentially selected for three-dimensional calibration.
  • the selected pixel can be one or multiple.
  • the contour points or vertices of regular objects in the image can be selected first for three-dimensional calibration.
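  • As an illustration of the three-dimensional calibration described above for the restricted-area vertices A, B, C, and D, the sketch below solves a PnP problem with OpenCV; the pixel coordinates, court dimensions, and intrinsic matrix are placeholder values, not data from the embodiment.

```python
# PnP-based calibration sketch for the four restricted-area vertices A, B, C, D.
# Assumes OpenCV; all numeric values below are illustrative placeholders.
import cv2
import numpy as np

# real-world coordinates of the four vertices (metres, court plane z = 0)
object_pts = np.float32([[0, 0, 0], [4.9, 0, 0], [4.9, 5.8, 0], [0, 5.8, 0]])
# pixel positions where A, B, C, D were selected in the video frame (illustrative)
image_pts = np.float32([[412, 630], [880, 615], [955, 840], [350, 870]])
K = np.float32([[1500, 0, 960], [0, 1500, 540], [0, 0, 1]])   # assumed intrinsics

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
# rvec / tvec now describe the court's three-dimensional position relative to the
# camera of this video frame; a virtual information image anchored on the court
# could be projected back with cv2.projectPoints(points_3d, rvec, tvec, K, None).
```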
  • the generated virtual three-dimensional virtual information image can be accurately integrated with the multi-angle free-view video itself describing the real world at any position, any angle of view, and any point of view in the three-dimensional space.
  • the seamless integration of virtual and reality can be realized, and the virtual information image and the video frames of the multi-angle free-view video can be dynamically synchronized and unified during playback; therefore, the multi-angle free-view composite video frame obtained by the synthesis processing is more natural and lifelike, which can greatly enhance the user's visual experience.
  • a server (such as a cloud server) can automatically generate special effect generation instructions, or can generate corresponding server-side special effect generation interactive control instructions in response to server-side user interaction operations.
  • the cloud server can automatically select the image combination to be implanted with the AR special effect as the image combination specified by the special effect generation instruction through a preset AI recognition algorithm, and obtain the virtual information image corresponding to the specified image combination.
  • a server-side user can specify an image combination through an interactive operation; when the server receives a server-side special effect generation interactive control instruction generated based on the server-side special effect generation interactive control operation, it can obtain the specified image combination from that instruction, and can further obtain the virtual information image corresponding to the image combination specified by the special effect generation instruction.
  • the virtual information image corresponding to the image combination specified by the special effect generation instruction can be directly obtained from the preset storage space, or the matching virtual information image can be generated instantly according to the image combination specified by the special effect generation instruction .
  • the target object may be taken as the center, the target object in the video frame may be identified first, and then the augmented reality special effect input data of the target object may be obtained, and then the The augmented reality special effect input data is used as input, and a preset first special effect generation method is used to generate a virtual information image matching the target object in the video frame.
  • the target object in the video frame can be recognized by image recognition technology; for example, the target object in the special effect area may be recognized as a person (such as a basketball player), an object (such as a basketball or a scoreboard), an animal (such as a cat or a lion), etc.
  • in response to a server-side special effect generation interactive control instruction, the augmented reality special effect input data of the target object may be obtained. Specifically, when the server-side user performs an interactive operation, the server-side special effect generation interactive control instruction corresponding to that interactive operation can be generated, and the augmented reality special effect input data of the target object is obtained according to that instruction.
  • the output type of the special effect may first be determined according to the server-side special effect generation interactive control instruction, then the historical data of the target object is obtained, and the historical data is processed according to the special effect data type to obtain the augmented reality special effect input data corresponding to the special effect output type.
  • for example, if it is determined, according to the server-side special effect generation interactive control instruction, that the server-side user wants to obtain the shooting percentage at the position where the target object is located, the distance from the position of the target object to the center of the ground projection position of the net can be calculated, and the historical shooting data of the target object within this distance is acquired as the augmented reality special effect input data of the target object.
  • the server-side user can perform interactive control operations through the corresponding interactive control device, and the corresponding server-side special effect generation interactive control instruction can be obtained based on the server-side user's special effect generation interactive control operation.
  • the server user can select the target object for which special effects are to be generated through interactive operations.
  • the user can also select the augmented reality special effect input data of the target object, such as the data type and data range of the augmented reality special effect input data (which can be selected based on time or geographic space).
  • server-side special effect generation interactive control instructions can also be generated automatically by the server; the server can make autonomous decisions through machine learning, and select the image combination of the video frames in which the special effect is to be implanted, the target object, and the associated augmented reality special effect input data.
  • the augmented reality special effect input data may be input to a preset three-dimensional model for processing, to obtain a virtual information image matching the target object in the video frame.
  • after the augmented reality special effect input data is input into the preset three-dimensional model, three-dimensional graphic elements matching the augmented reality special effect input data can be obtained and combined, and the display metadata in the augmented reality special effect data, together with the three-dimensional graphic element data, is output as a virtual information image matching the target object in the video frame.
  • the three-dimensional model may be a three-dimensional model obtained by performing three-dimensional scanning of an actual item, or a constructed virtual model.
  • the virtual model may include a virtual item model and an avatar model, where the virtual item may be a virtual magic
  • the avatar model can be an imaginary character or animal model, such as the three-dimensional model of the legendary Nezha, and the three-dimensional model of the virtual unicorn and dragon.
  • the augmented reality special effect input data may be used as input data and input to a preset machine learning model for processing to obtain a virtual information image matching the target object in the video frame.
  • the preset machine learning model can be a supervised learning model, an unsupervised learning model, or a semi-supervised learning model (a combination of a supervised learning model and an unsupervised learning model); the specific model used is not limited in the embodiments of this specification.
  • the use of a machine learning model to generate the virtual information image includes two stages: a model training stage and a model application stage.
  • the training sample data can be used as input data, input to a preset machine learning model for training, and the parameters of the machine learning model can be adjusted.
  • the training sample data can include images and videos collected in various real physical spaces, or virtual images or videos generated by manual modeling.
  • after training, the machine learning model can automatically generate corresponding three-dimensional images, three-dimensional videos, and corresponding sound effects, etc., based on the input data.
  • in the model application stage, the augmented reality special effect input data is used as input data and fed into the trained machine learning model, which can automatically generate an augmented reality special effect model matching the input data, that is, an augmented reality special effect model matching the video frame.
  • the form of the virtual information image generated is different according to the three-dimensional model used or the machine learning model used.
  • the generated virtual information image may be a static image, or a dynamic video frame such as an animation, or even a video frame containing audio data.
  • S46 Perform synthesis processing on the virtual information image and the designated video frame to obtain a synthesized video frame.
  • the virtual information image and the designated video frame may be fused to obtain a fused video frame implanted with AR special effects.
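  • A minimal sketch of such a fusion step follows, implemented here as straightforward alpha blending in numpy; the assumption that the virtual information image carries an alpha channel, and the placement by a top-left offset, are illustrative simplifications.

```python
# Minimal fusion sketch: alpha-blend a rendered virtual information image onto
# the designated video frame. Pure numpy; the overlay is assumed to be RGBA.
import numpy as np

def composite(video_frame, virtual_rgba, top, left):
    """video_frame: HxWx3 uint8; virtual_rgba: hxwx4 uint8 overlay placed at (top, left)."""
    out = video_frame.copy()
    h, w = virtual_rgba.shape[:2]
    roi = out[top:top + h, left:left + w].astype(np.float32)
    rgb = virtual_rgba[..., :3].astype(np.float32)
    alpha = virtual_rgba[..., 3:4].astype(np.float32) / 255.0
    out[top:top + h, left:left + w] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
    return out   # the composite video frame with the AR special effect implanted
```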
  • the synthesized composite video frame is inserted into the to-be-played video stream of the playback control device for playback through the playback terminal.
  • the playback control device can take multiple video streams as input, where the video stream can come from each collection device in the collection array or from other collection devices.
  • the playback control device can select one input video stream as the video stream to be played according to needs.
  • the composite video frame of the multi-angle free-view video obtained in step S46 can be inserted into the video stream to be played, or the playback control device can switch from a video stream on another input interface to the input interface carrying the multi-angle free-view video composite video frame; the playback control device then outputs the selected video stream to be played to the playback terminal, which plays it. Therefore, through the playback terminal, the user can not only watch video frames of multiple free viewing angles, but also watch composite video frames of multiple free viewing angles implanted with AR special effects.
  • the playback terminal may be a video playback device such as a TV, a mobile phone, a tablet, a computer, or other types of electronic devices including a display screen or a projection device.
  • the multi-angle free-view video composite video frames inserted into the to-be-played video stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting may include operations such as pausing while watching, rewinding, and fast-forwarding to the current moment.
  • the parameter data corresponding to the image combination and the depth data of each frame image in the image combination are determined.
  • frame image reconstruction is performed on the preset virtual viewpoint path to obtain the corresponding video frames of the multi-angle free-view video; in response to the special effect generation instruction, the target object in the video frame is acquired, the augmented reality special effect input data of the target object is obtained, a corresponding virtual information image is generated based on the augmented reality special effect input data of the target object, and the virtual information image is synthesized with the designated video frame to obtain a composite video frame; the composite video frame is then inserted into the to-be-played video stream of the playback control device for playback through the playback terminal, so that multi-angle free-view viewing can be realized.
  • This distributed system architecture can save a large amount of transmission resources and server processing resources, and under conditions of limited network transmission bandwidth it allows composite video frames with augmented reality special effects to be generated in real time or near real time; it can therefore realize low-latency playback of multi-angle free-view composite video frames implanted with AR special effects, and thus take into account the user's dual needs for a rich visual experience and low latency while watching video.
  • the interception of synchronized video frames from the multiple video streams, the generation of the video frames of the multi-angle free-view video based on the image combination formed by the multiple synchronized video frames, the acquisition of the virtual information image corresponding to the image combination specified by the special effect generation instruction, and the synthesis of the virtual information image with the specified image combination to obtain a composite video frame can be completed by different hardware devices in cooperation, that is, using a distributed processing architecture.
  • according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination, the depth data of the preset video frames in the image combination may be respectively mapped to the corresponding virtual viewpoints; then, according to the pixel data and depth data of the preset video frames mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, frame image reconstruction is performed to obtain the corresponding video frames of the multi-angle free-view video.
  • the virtual parameter data of the virtual viewpoint may include: virtual viewing position data and virtual viewing angle data; the parameter data corresponding to the image combination may include: collecting position data, shooting angle data, and the like.
  • the forward mapping can be used first, and then the reverse mapping method can be used to obtain the reconstructed video frame.
  • the collected position data and shooting angle data may be referred to as external parameter data
  • the parameter data may also include internal parameter data
  • the internal parameter data may include attribute data of the collection device, so that the mapping relationship can be determined more accurately.
  • the internal parameter data may include distortion data. Since distortion factors are taken into consideration, the mapping relationship can be further accurately determined spatially.
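  • For illustration, the sketch below shows one possible way to group the external parameter data (rotation and translation) with the internal parameter data (intrinsic matrix plus radial distortion coefficients) and use them to project a world point into a frame image; the two-coefficient radial model is an assumption, not necessarily the distortion model of the embodiment.

```python
# Illustrative grouping of internal and external parameter data, plus a simple
# projection with a two-coefficient radial distortion model (an assumption).
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParams:
    K: np.ndarray      # 3x3 internal parameter matrix
    R: np.ndarray      # 3x3 rotation matrix (external)
    t: np.ndarray      # 3x1 translation vector (external)
    dist: np.ndarray   # radial distortion coefficients (k1, k2)

def project(params: CameraParams, world_pt: np.ndarray) -> np.ndarray:
    cam = params.R @ world_pt + params.t.ravel()
    x, y = cam[0] / cam[2], cam[1] / cam[2]
    r2 = x * x + y * y
    factor = 1 + params.dist[0] * r2 + params.dist[1] * r2 * r2   # radial distortion
    u = params.K[0, 0] * x * factor + params.K[0, 2]
    v = params.K[1, 1] * y * factor + params.K[1, 2]
    return np.array([u, v])
```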
  • the following steps can be adopted to obtain the multi-angle free-view video composite video frame implanted with AR special effects:
  • S71 Display video frames of the multi-angle free-view video in real time.
  • the video frame of the multi-angle free-view video is reconstructed based on the parameter data of the image combination formed by multiple synchronized video frames at the specified frame time, the pixel data and the depth data of the image combination, and the multiple Synchronized video frames include frame images of different shooting angles.
  • for the reconstruction method of the multi-angle free-view video frame, reference may be made to the introduction in the foregoing embodiment; the description is not repeated here.
  • S73 Perform synthesis processing and display of the virtual information image and the corresponding video frame.
  • the superimposition position of the virtual information image in the video frame of the multi-angle free-view video can be determined based on the special effect display identifier, and then the virtual information image can be placed at the determined superimposition position Perform overlay display.
  • the interactive terminal T80 plays the video in real time, where, as described in step S71, referring to FIG. 8, a video frame P80 is displayed.
  • the video frame P81 displayed on the interactive terminal contains multiple special effect display identifiers, such as the special effect display identifier I1.
  • the special effect display identifier in the video frame is represented by an inverted triangle symbol pointing to the target object, as shown in FIG. 9. It is understandable that the special effect display identifier can also be displayed in other ways.
  • the system automatically obtains the virtual information image corresponding to the special effect display identifier I1 and superimposes the virtual information image on the video frame P81 of the multi-angle free-view video; as shown in FIG. 10, a three-dimensional ring R1 is rendered centered on the position where the athlete Q1 stands on the field.
  • the end user touches and clicks on the special effect display identifier I2 in the video frame P81 of the multi-angle free-view video.
  • the system automatically obtains the virtual information image corresponding to the special effect display identifier I2, and then The virtual information image is superimposed and displayed on the video frame P81 of the multi-angle free-view video, and the multi-angle free-view video superimposed video frame P82 is obtained, in which the hit rate information display board M0 is displayed.
  • the hit rate information display board M0 displays the number, name, and hit rate information of the target object, athlete Q2.
  • the end user can continue to click on other special effect display signs displayed in the video frame to watch a video showing the AR special effects corresponding to each special effect display sign.
  • the special effect display identifier can be displayed not only in the playback screen but also in other places. For example, for a video frame that can display AR special effects, a special effect display identifier can be set at the progress position corresponding to that frame on the playback progress bar, so as to inform the end user.
  • the interactive terminal T130 displays the playback interface Sr131 and the progress bar L131; from the information displayed by the progress bar L131, the position of the currently playing video frame within the entire video can be seen.
  • the progress bar L131 is divided into a played segment L131a and an unplayed segment L131b.
  • the progress bar L131 displays special effect display identifiers D1 to D4, where the special effect display identifier D1 is located in the played segment L131a, the special effect display identifier D2 corresponds to the current video frame and is located at the junction of the played segment L131a and the unplayed segment L131b, and the special effect display identifiers D3 and D4 are located in the unplayed segment L131b. Through the special effect display identifiers on the progress bar L131, the end user can rewind or fast-forward to the corresponding video frame and watch the corresponding picture of the multi-angle free-view composite video frame implanted with AR special effects.
  • S141 In response to the image reconstruction instruction from the interactive terminal, determine the interactive frame time information at the interaction time, acquire the spliced image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination, and send them to the interactive terminal, so that the interactive terminal selects the corresponding pixel data, depth data, and corresponding parameter data in the spliced image based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
  • the stitched image of the preset frame image is generated based on the pixel data and the depth data of the image combination at the moment of the interaction frame, the stitched image includes a first field and a second field, wherein, the The first field includes the pixel data of the preset frame image in the image combination, and the second field includes the depth data of the image combination.
  • the image combination at the interactive frame time is obtained based on intercepting multiple synchronized video frames at a specified frame time from multiple synchronized video streams, and the multiple synchronized video frames include frame images with different shooting angles of view.
  • S142 In response to the special effect generation interactive control instruction, obtain a virtual information image corresponding to the spliced image of the preset video frame indicated by the special effect generation interactive control instruction.
  • the target object in the preset video frame indicated by the special effect generation interactive control instruction may be read, and, based on the target object, the virtual information image generated in advance from the augmented reality special effect input data of the target object may be acquired.
  • Example 1 Taking the augmented reality special effect data of the target object as input data and inputting it into a preset three-dimensional model for processing to obtain a virtual information image matching the target object;
  • Example 2 Taking the augmented reality special effect data of the target object as input data, and inputting it into a preset machine learning model for processing, to obtain a virtual information image matching the target object.
  • the data processing system 150 may include a server 151 and an interactive terminal 152, wherein:
  • the server 151 may, in response to the image reconstruction instruction from the interactive terminal 152, determine the interactive frame time information at the interaction time, obtain the spliced image of the preset video frames in the image combination corresponding to the interactive frame time and the corresponding parameter data of the image combination, and send them to the interactive terminal 152; and, in response to the special effect generation interactive control instruction, generate a virtual information image corresponding to the spliced image of the preset video frame indicated by the special effect generation interactive control instruction;
  • the interactive terminal 152 selects the corresponding pixel data, depth data, and corresponding parameter data in the spliced image based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data according to preset rules, reconstructs and plays the image of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time, and synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain a composite video frame and play it.
  • the server 151 may store the virtual information images corresponding to the spliced images of the preset frame images, or the augmented reality special effect input data on which those spliced images are based; it may also obtain the virtual information image corresponding to the spliced image of the preset frame image from a third party, or generate the virtual information image corresponding to the spliced image of the preset frame image instantly.
  • the image combination at the interactive frame time is obtained based on intercepting multiple synchronized video frames at a specified frame time from multiple synchronized video streams, and the multiple synchronized video frames include frame images with different shooting angles of view.
  • the data processing system may also include a data processing device 153.
  • the data processing device 153 can perform video frame interception on the video frames collected by the collection array in the on-site collection area. By intercepting only the video frames from which the multi-angle free-view video is to be generated, a large amount of data transmission and data processing can be avoided.
  • the collection device in the field collection array can synchronously collect frame images of different shooting angles, and the data processing device can intercept multiple synchronized video frames at a specified frame time from multiple synchronized video streams.
  • the data processing device 153 may upload the captured frame image to the server 151.
  • the server 151 may store the spliced image of the image combination of the preset video frame and the parameter data of the image combination.
  • the data processing system suitable for non-interactive scenes and the data processing system suitable for interactive scenes can be merged.
  • the server 32 can obtain the video frames of the multi-angle free-view video and the virtual information image; in addition, the image combination formed by the multiple synchronized video frames at a specified frame time can be stored in a form that makes the data easy to obtain later.
  • the server 32 can generate a spliced image corresponding to the image combination based on the pixel data and depth data of the image combination, and the spliced image can include a first field and a second field.
  • the first field includes the pixel data of the image combination
  • the second field includes the depth data of the image combination.
  • a spliced image corresponding to the preset video frames in the image combination may be generated based on the pixel data and depth data of the preset video frames in the image combination; the spliced image corresponding to the preset video frames may include a first field and a second field, where the first field includes the pixel data of the preset video frames and the second field includes the depth data of the preset video frames. In that case, it is sufficient to store only the spliced image corresponding to the preset video frames and the corresponding parameter data.
  • the stitched image can be divided into an image area and a depth map area; the pixel fields of the image area store the pixel data of the multiple frame images, and the pixel fields of the depth map area store the depth data of the multiple frame images. The pixel fields of the image area storing the pixel data of the frame images serve as the first field, and the pixel fields of the depth map area storing the depth data of the frame images serve as the second field. The stitched image of the acquired image combination and the corresponding parameter data of the image combination can be stored in a data file.
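  • The following numpy sketch illustrates one possible layout of such a stitched image: the first field (image area) concatenates the pixel data of the frame images, and the second field (depth map area) concatenates their depth data, quantized to 8 bits here purely for illustration.

```python
# Illustrative stitched-image layout: image area on top (first field), depth map
# area below (second field). The 8-bit depth quantization is an assumption.
import numpy as np

def build_stitched_image(frames, depths, max_depth=50.0):
    """frames: list of HxWx3 uint8 images; depths: list of HxW float arrays."""
    image_area = np.concatenate(frames, axis=1)                    # first field
    depth_maps = [np.clip(d / max_depth * 255, 0, 255).astype(np.uint8)
                  for d in depths]
    depth_area = np.concatenate([np.repeat(d[..., None], 3, axis=2)
                                 for d in depth_maps], axis=1)     # second field
    return np.concatenate([image_area, depth_area], axis=0)        # one stitched image
```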
  • the header file of the data file may contain the storage address, so that the stored data can be read from the corresponding storage space.
  • the storage format of the image combination may be a video format
  • the number of image combinations may be multiple
  • each image combination may be a combination of images corresponding to different frame moments after the video is decapsulated and decoded.
  • in addition to watching a multi-angle free-view video through the playback terminal, in order to further improve the interactive experience, the user can also actively choose, through interactive operations during the video watching process, to play a multi-angle free-view video.
  • the following methods are adopted for implementation:
  • in response to the image reconstruction instruction from the interactive terminal, determine the interactive frame time information at the interaction time, obtain the spliced image of the preset video frames in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination, and send them to the interactive terminal, so that the interactive terminal selects the corresponding pixel data, depth data, and corresponding parameter data in the spliced image based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
  • the preset rules can be set according to the specific scenario. For example, based on the position information of the virtual viewpoint determined by the interactive operation, the position information of the W preset virtual viewpoints closest to the virtual viewpoint at the interaction moment may be selected in order of distance, and the pixel data and depth data corresponding to these W+1 virtual viewpoints in total (including the virtual viewpoint at the interaction moment) that satisfy the interactive frame time information may be obtained from the stitched image.
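  • A minimal sketch of such a preset rule follows, selecting the W preset virtual viewpoints closest to the interactively chosen viewpoint by Euclidean distance; the viewpoint coordinates are illustrative.

```python
# Select the W preset virtual viewpoints closest to the interactive viewpoint.
# Viewpoint coordinates are illustrative 3D positions.
import numpy as np

def nearest_viewpoints(interactive_vp, preset_vps, W=3):
    """Return the indices of the W preset viewpoints nearest to interactive_vp."""
    preset_vps = np.asarray(preset_vps, dtype=np.float32)
    dists = np.linalg.norm(preset_vps - np.asarray(interactive_vp, np.float32), axis=1)
    order = np.argsort(dists)[:W]          # the W closest preset viewpoints, by distance
    return order.tolist()

# e.g. nearest_viewpoints([0.0, 1.6, 5.0], [[1, 1.6, 5], [3, 1.6, 5], [-1, 1.6, 5]], W=2)
```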
  • the interactive frame time information is determined based on a trigger operation from the interactive terminal, and the trigger operation may be a trigger operation input by a user of the interactive terminal, or a trigger operation automatically generated by the interactive terminal.
  • the interactive terminal can automatically initiate a trigger operation when it detects the presence of the identifier of the multi-angle free viewpoint data frame.
  • when the user triggers manually, the interactive frame time may be the moment at which the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or it may be historical moment information carried by the user operation that the interactive terminal receives to trigger the interaction, where the historical moment information may be moment information earlier than the current playback moment.
  • based on the acquired spliced image of the preset video frames in the image combination at the interactive frame time and the corresponding parameter data, the interactive frame time information, and the virtual viewpoint position information at the interactive frame time, the interactive terminal 35 may use the same method as the above step S44 to perform combined rendering on the pixel data and depth data of the spliced image of the preset video frames in the image combination at the acquired interactive frame time, obtain the video frame of the multi-angle free-view video corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at the interactive virtual viewpoint position.
  • the video frame of the multi-angle free-view video corresponding to the interactive virtual viewpoint position can be instantly generated based on the image reconstruction instruction from the interactive terminal, which can further enhance the user interaction experience.
  • the interactive terminal and the playback terminal may be the same terminal device.
  • in response to a server-side special effect generation interactive control instruction, a virtual information image corresponding to the spliced image of the preset frame image indicated by the server-side special effect generation interactive control instruction may be generated and stored.
  • the virtual information image may be superimposed and rendered on the spliced image of the preset frame image to obtain a superimposed video frame of the multi-angle free-view video with AR special effects implanted; this can be applied in scenes such as multi-angle free-view video recording or on-demand playback.
  • the implantation of the virtual information image can be triggered according to a preset setting, or according to a user interaction operation.
  • AR special effects can be implanted in the multi-angle free-view videos.
  • the following methods can be used for implementation:
  • after the image reconstruction instruction is received, it is also possible, in response to a user-side special effect generation interactive instruction from the interactive terminal, to obtain the virtual information image corresponding to the spliced image of the preset video frame and to send the virtual information image corresponding to the spliced image of the preset video frame to the interactive terminal, so that the interactive terminal superimposes and renders the virtual information image on the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time, and plays the resulting multi-angle free-view superimposed video frame implanted with AR special effects.
  • the preset video frame may be a video frame indicated by the user's second interactive operation, for example, may be a frame image clicked by the user, or a frame sequence corresponding to the user's sliding operation.
  • the acquisition of the virtual information image corresponding to the spliced image of the preset frame image can be stopped; accordingly, there is no need to superimpose the virtual information image during the rendering process of the interactive terminal, and only the multi-angle free-view video is played.
  • if, during the playback of the multi-angle free-view superimposed video frames implanted with the AR special effect data, a user-side special effect exit interactive instruction corresponding to a third interactive operation of the user is received, the acquisition and rendering display of the virtual information images corresponding to the spliced images of subsequent video frames is stopped.
  • part of the video stream may contain multi-angle free-view video data.
  • one or more of the sequences may correspond to virtual information images; when the user-side special effect exit interactive instruction is detected, the implantation of all subsequent AR special effects in the video stream can be exited, or only the display of subsequent AR special effects in one multi-angle free-view video sequence can be exited.
  • the virtual information image can be generated based on the special effect generation instruction of a server (such as a cloud server). To generate the virtual information image, the spliced image of the preset frame corresponding to the virtual information image is determined first, and secondly a virtual information image matching the spliced image of that preset frame is generated.
  • the cloud server may automatically select the stitched image of the preset video frame through the preset AI recognition algorithm as the stitched image of the AR special effect data to be implanted.
  • the server user can specify the spliced image of the preset video frame through interactive operation.
  • when the server receives the server-side special effect generation interactive control instruction based on the server-side special effect generation interactive control operation, it can acquire the spliced image of the specified preset video frame according to that instruction, and then generate the virtual information image corresponding to the spliced image of the preset video frame specified by the special effect generation instruction.
  • through image recognition technology, the object in the video frame can be recognized as a target object matching the AR special effect to be implanted; for example, the target object may be recognized as a person (such as a basketball player), an object (such as a basketball or a scoreboard), an animal (such as a cat or a lion), etc.
  • in response to a server-side special effect generation interactive control instruction, the augmented reality special effect input data of the target object may be obtained. Specifically, when the server-side user performs an interactive operation, the server-side special effect generation interactive control instruction corresponding to that interactive operation can be generated, and the augmented reality special effect input data of the target object is obtained according to that instruction.
  • athlete data, goal data, and the like can be obtained, where the athlete data can include basic data associated with the players, such as names and position designations in a basketball game (specific numbers, or position names such as center, forward, or guard), and the goal data can include the shooting percentage and so on; such data can be used as the augmented reality special effect input data.
  • the output type of the special effect may first be determined according to the server-side special effect generation interactive control instruction, then the historical data of the target object is obtained, and the historical data is processed according to the special effect data type to obtain the augmented reality special effect input data corresponding to the special effect output type.
  • for example, if it is determined, according to the server-side special effect generation interactive control instruction, that the server-side user wants to obtain the shooting percentage of the target object in the special effect area, the distance from the position of the target object to the ground projection position of the center of the net can be calculated, and the historical shooting data of the target object within this distance is acquired as the augmented reality special effect input data of the target object.
  • the special effect generation method of the virtual information image can be selected and set as required.
  • the augmented reality special effect input data can be used as input data and input to a preset three-dimensional model for processing to obtain a virtual object matching the target object in the stitched image of the preset video frame. Information image.
  • after the augmented reality special effect input data is input into the preset three-dimensional model, three-dimensional graphic elements matching the input data can be obtained and combined, and the display metadata in the input data, together with the three-dimensional graphic element data, is output as a virtual information image matching the target object in the video frame.
  • the augmented reality special effect input data may be used as input data and input to a preset machine learning model for processing to obtain a virtual information image matching the target object in the video frame.
  • the preset machine learning model can be a supervised learning model, an unsupervised learning model, or a semi-supervised learning model (a combination of a supervised learning model and an unsupervised learning model); the specific model used is not limited in the embodiments of this specification.
  • the generated virtual information image can be a static image, a dynamic image, or a dynamic image containing audio special effects, where the dynamic image, or the dynamic image containing audio special effects, can match the target object and one or more video frames.
  • the server may also directly save the virtual information image obtained during the live broadcast or quasi-live broadcast process, and use it as the virtual information image obtained through the interactive terminal during the user interaction process.
  • the composite video frame displayed on the playback terminal is not essentially different from the composite video frame displayed on the interactive terminal.
  • the two can use the same virtual information image, or different virtual information images.
  • the corresponding special effect generation methods can be the same or different.
  • the three-dimensional model or machine learning model used in the special effect generation process can be the same model, a model of the same type, or a completely different model.
  • the playback terminal and the interactive terminal may also be the same terminal device; that is, through the terminal device the user can watch the live or quasi-live multi-angle free-view video, in which multi-angle free-view composite video frames with AR special effects are played automatically, and the user can also interact through the terminal device, play multi-angle free-view video data based on the interactive operations, and play multi-angle free-view composite video frames implanted with AR special effects. Through interaction, the user can independently choose which target AR special effects, that is, which virtual information images, to watch in recorded, rebroadcast, and on-demand videos.
  • the data processing method of the above embodiment can realize the low-latency playback of the multi-angle free-view video with AR special effects implanted.
  • the following is a corresponding introduction to the systems and key equipment that can implement the above method.
  • the data processing system 160 may include: a target object acquisition unit 161, a virtual information image acquisition unit 162, an image synthesis unit 163, and a display unit 164, wherein:
  • the target object obtaining unit 161 is adapted to obtain a target object in a video frame of a multi-angle free-view video
  • the virtual information image acquiring unit 162 is adapted to acquire a virtual information image generated based on the augmented reality special effect input data of the target object;
  • the image synthesis unit 163 is adapted to perform synthesis processing on the virtual information image and the corresponding video frame to obtain a synthesized video frame;
  • the display unit 164 is adapted to display the obtained composite video frame.
  • each unit may be distributed in different devices, or some units may be located in the same device. Based on different specific application scenarios, the implemented solutions are different.
  • each unit can be implemented by corresponding hardware or software, or by a combination of software and hardware. For example, a processor (specifically, a CPU or an FPGA, etc.) can implement the target object acquisition unit 161, the virtual information image acquisition unit 162, the image synthesis unit 163, and so on, and a display can serve as the display unit 164.
  • the data processing system 30 may include: a data processing device 31, a server 32, a playback control device 33, and a playback terminal 34, wherein:
  • the data processing device 31 is adapted to obtain, based on a video frame interception instruction, multiple synchronized video frames from multiple video data streams collected synchronously in real time from different locations in the field collection area, and to upload the multiple synchronized video frames at the designated frame time to the server 32;
  • the server 32 is adapted to receive the multiple synchronized video frames uploaded by the data processing device 31 as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, perform frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding video frames of the multi-angle free-view video; and, in response to the special effect generation instruction, acquire the target object in the video frame specified by the special effect generation instruction, acquire the augmented reality special effect input data of the target object, generate a corresponding virtual information image based on the augmented reality special effect input data of the target object, synthesize the virtual information image with the designated video frame to obtain a composite video frame, and input the composite video frame to the playback control device 33;
  • the playback control device 33 is adapted to insert the synthesized video frame data into the video stream to be played;
  • the playback terminal 34 is adapted to receive the to-be-played video stream from the playback control device 33 and perform real-time playback.
  • the playback control device 33 may output the to-be-played video stream based on a control instruction.
  • the playback control device 33 may select one of the multiple data streams as the to-be-played video stream, or continuously switch the selection among the multiple-channels of video streams to continuously output the to-be-played video stream.
  • a broadcast directing control device can be used as the playback control device in the embodiment of the present invention. The directing control device can be a manual or semi-manual directing control device that performs playback control based on externally input control instructions, or it can be a virtual directing control device that automatically performs directing control based on artificial intelligence, big data learning, or a preset algorithm.
  • this distributed system architecture can save a lot of transmission resources and server processing resources, and under the condition of limited network transmission bandwidth, it can realize multi-angle free-view composite video with augmented reality special effects Frames can be generated in real time, so low-latency playback of videos with multi-angle and free-view augmented reality special effects can be realized, and thus the dual needs of users for rich visual experience and low-latency during video viewing can be taken into account.
  • the data processing device 31 performs the interception of synchronized video frames, and the server performs multi-angle free-view video reconstruction, virtual information image acquisition, and multi-angle free-view video and virtual information image synthesis processing (such as fusion processing),
  • the playback control device selects the video stream to be played, and the playback device plays it.
  • This distributed system architecture can avoid a large amount of data processing performed by the same device, thereby improving data processing efficiency and reducing transmission delay.
  • the server 32 may be implemented by a server cluster composed of multiple servers, where the cluster may include multiple homogeneous or heterogeneous single server devices or server sub-clusters. If a heterogeneous server cluster is used, each server device in the heterogeneous server cluster can be configured according to the characteristics of the different data to be processed.
  • the heterogeneous server cluster 170 used is composed of a three-dimensional deep reconstruction service cluster 171 and a cloud augmented reality special effect generation and rendering server cluster 172, in which:
  • the three-dimensional depth reconstruction service cluster 171 is adapted to reconstruct a corresponding multi-angle free-view video based on multiple synchronized video frames intercepted from multiple synchronized video streams;
  • the cloud augmented reality special effect generation and rendering server cluster 172 is adapted to obtain, in response to the special effect generation instruction, a virtual information image corresponding to the image combination specified by the special effect generation instruction, and to fuse the specified image combination with the virtual information image to obtain a multi-angle free-view fusion video frame.
  • the three-dimensional depth reconstruction service cluster 171 and the cloud augmented reality special effect generation and rendering server cluster 172 may each include multiple server sub-clusters or server groups; the different server sub-clusters or server groups perform different functions and work together to complete the reconstruction of multi-angle free-view video frames.
  • the heterogeneous server cluster 170 may further include an augmented reality special effect input data storage database 173, which is suitable for storing the augmented reality special effect input data that matches the target object in the specified image combination.
  • a cloud service system composed of a cloud server cluster obtains the first multi-angle free-view fusion video frame based on a plurality of uploaded synchronized video frames, and the cloud service system adopts a heterogeneous server cluster.
  • the following description still takes the specific application scenario shown in FIG. 1 as an example.
  • the data processing system 10 includes: a collection array 11 composed of a plurality of collection devices, a data processing device 12, a server cluster 13 in the cloud, a playback control device 14, and a playback terminal 15.
  • the basketball hoop on the left is taken as the core point of view, the core point of view is taken as the center of a circle, and the fan-shaped area on the same plane as the core point of view is used as the preset multi-angle free viewing angle range.
  • the collection devices in the collection array 11 can be fan-shaped and placed in different positions in the field collection area according to the preset multi-angle free viewing angle range, and can collect video streams synchronously in real time from corresponding angles respectively.
  • the collection equipment in the collection array 11 can also be set on the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of collection devices, the characteristics of the collection devices, and the imaging effect requirements.
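As an illustration of the fan-shaped arrangement described above, the following minimal Python sketch computes candidate positions for collection devices on an arc centred on the core point of view. The radius, angular span, and device count are assumptions chosen for illustration, not values taken from the embodiment.

```python
import math

def fan_positions(center, radius, span_deg, num_devices):
    """Place collection devices evenly on a fan-shaped arc around a core point of view.

    center: (x, y) ground-plane coordinates of the core point of view (e.g. the hoop).
    radius: distance from the core point of view to each device.
    span_deg: total angular coverage of the fan, in degrees.
    num_devices: number of collection devices in the array.
    """
    cx, cy = center
    positions = []
    for i in range(num_devices):
        # Spread devices evenly across the angular span, centred on 0 degrees.
        angle = math.radians(-span_deg / 2 + i * span_deg / (num_devices - 1))
        positions.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return positions

# Example: 40 devices on a 180-degree fan of radius 15 m around a hoop at the origin.
print(fan_positions((0.0, 0.0), 15.0, 180.0, 40)[:3])
```

The same helper could be swapped for a straight-line, circular, or irregular layout routine when a different arrangement better suits the site.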
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device 12 can be placed in a non-collection area on site, which can be regarded as a site server.
  • the data processing device 12 may send a streaming instruction to each collection device in the collection array 11 through a wireless local area network, and each collection device in the collection array 11, based on the streaming instruction sent by the data processing device 12, transmits the obtained video data stream to the data processing device 12 in real time.
  • each collection device in the collection array 11 can transmit the obtained video stream to the data processing device 12 through the switch 17 in real time.
  • when the data processing device 12 receives a video frame interception instruction, it intercepts the video frames at the specified frame time from the received multiple video data streams to obtain multiple synchronized video frames, and then uploads the obtained multiple synchronized video frames at the specified frame time to the server cluster 13 in the cloud.
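The interception step can be pictured with the hedged Python sketch below: given multiple frame-level synchronized streams, it cuts out the frames at the specified frame times and groups them into image combinations for upload. The in-memory stream representation is a placeholder assumption rather than an interface defined by the embodiment.

```python
def intercept_synchronized_frames(streams, frame_indices):
    """Cut the frames at the specified frame times from every synchronized stream.

    streams: mapping camera_id -> list of frames (each frame is any image object);
             all streams are assumed to be frame-level synchronized.
    frame_indices: iterable of frame indices (the "specified frame time"), e.g. [2] or range(0, 25).
    Returns a list of image combinations, one per specified frame index.
    """
    combinations = []
    for idx in frame_indices:
        # One image combination = the synchronized frames of all cameras at this frame time.
        combinations.append({cam_id: frames[idx] for cam_id, frames in streams.items()})
    return combinations

# Example: two toy "streams" of three frames each; intercept frame index 2 from both.
streams = {"cam_0": ["f0", "f1", "f2"], "cam_1": ["g0", "g1", "g2"]}
print(intercept_synchronized_frames(streams, [2]))
```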
  • the cloud server cluster 13 uses the received multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, reconstructs frame images along the preset virtual viewpoint path to obtain the image data of the corresponding multi-angle free-view video; in response to a special effect generation instruction, it also acquires a virtual information image corresponding to the image combination specified by the special effect generation instruction, and performs fusion processing on the specified image combination and the virtual information image to obtain a multi-angle free-view fusion video frame.
  • the servers can be placed in the cloud, and in order to process data in parallel more quickly, the cloud server cluster 13 can be composed of multiple different servers or server groups according to the different data to be processed.
  • the cloud server cluster 13 may include: a first cloud server 131, a second cloud server 132, a third cloud server 133, a fourth cloud server 134, and a fifth cloud server 135.
  • the first cloud server 131 can be used to determine the parameter data corresponding to the image combination; the second cloud server 132 can be used to determine the depth data of each frame image in the image combination; the third cloud server 133 can use a depth image based rendering (DIBR) algorithm to reconstruct frame images along the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the image combination;
  • the fourth cloud server 134 can be used to generate a multi-angle free-view video;
  • the fifth cloud server 135 can be used to respond to a special effect generation instruction to obtain a virtual information image corresponding to the image combination specified by the special effect generation instruction, and to fuse the specified image combination with the virtual information image to obtain a multi-angle free-view fusion video frame.
  • the first cloud server 131, the second cloud server 132, the third cloud server 133, the fourth cloud server 134, and the fifth cloud server 135 may also each be a server group composed of a server array or server sub-clusters, which is not limited in the embodiments of the present invention.
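To make the reconstruction performed by the third cloud server 133 more concrete, the sketch below shows the core idea of depth image based rendering in a heavily simplified form: each pixel of a reference view is back-projected using its depth value and re-projected into a virtual viewpoint. A production implementation would use the calibrated parameter data of the image combination and handle occlusions and hole filling; the pinhole model, identity rotation, and purely horizontal baseline used here are simplifying assumptions.

```python
import numpy as np

def dibr_warp(ref_image, ref_depth, fx, fy, cx, cy, baseline_x):
    """Forward-warp a reference view to a virtual viewpoint shifted by baseline_x along x.

    ref_image: (H, W, 3) uint8 pixel data of the reference frame image.
    ref_depth: (H, W) float depth values for the same frame image.
    fx, fy, cx, cy: pinhole intrinsics (assumed; in practice taken from the parameter data).
    baseline_x: horizontal translation of the virtual viewpoint relative to the reference camera.
    """
    h, w = ref_depth.shape
    out = np.zeros_like(ref_image)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = np.maximum(ref_depth, 1e-6)
    # Back-project to camera space, shift the viewpoint, then re-project.
    x = (us - cx) * z / fx - baseline_x
    u_virt = np.round(x * fx / z + cx).astype(int)
    valid = (u_virt >= 0) & (u_virt < w)
    out[vs[valid], u_virt[valid]] = ref_image[vs[valid], us[valid]]
    return out

# Toy example: a 4x4 gray image with constant depth, virtual view shifted slightly to the right.
img = np.full((4, 4, 3), 128, dtype=np.uint8)
depth = np.full((4, 4), 5.0)
print(dibr_warp(img, depth, fx=4.0, fy=4.0, cx=2.0, cy=2.0, baseline_x=1.0).shape)
```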
  • each cloud server or cloud server group can use devices with different hardware configurations. For example, devices that need to process a large number of images, such as the fourth cloud server 134 and the fifth cloud server 135, may use a device including a Graphics Processing Unit (GPU) or a GPU group.
  • the GPU may adopt the Compute Unified Device Architecture (CUDA) parallel programming architecture to perform combined rendering on the pixels in the corresponding groups of texture maps and depth maps in the selected image combination.
  • CUDA is a hardware and software architecture used to allocate and manage computations on the GPU as a data-parallel computing device without mapping them to a graphics application programming interface (API).
  • when programming with CUDA, the GPU can be regarded as a computing device capable of executing a large number of threads in parallel. It operates as a co-processor of the host central processing unit (CPU): the data-parallel and computationally intensive parts of the application running on the host are delegated to the GPU.
  • the server cluster 13 in the cloud can store the pixel data and depth data of the image combination in the following manner:
  • a stitched image corresponding to the frame time is generated, where the stitched image includes a first field and a second field, the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the preset frame images in the image combination; the stitched image of the image combination and the parameter data corresponding to the image combination are then stored.
  • the acquired stitched image and corresponding parameter data can be stored in a data file. When the stitched image or parameter data needs to be acquired, it can be read from the corresponding storage space according to the corresponding storage address in the header file of the data file.
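A minimal sketch of the stitched-image idea follows: the pixel data of the preset frame images is packed into a first region and the corresponding depth data into a second region of a single image, while a small header records where each region starts so the fields can be read back later. The concrete layout (side-by-side packing, 8-bit quantized depth, JSON header) is an assumption made for illustration and is not the format mandated by the embodiment.

```python
import json
import numpy as np

def build_stitched_image(frame_images, depth_maps):
    """Pack pixel data (first field) and depth data (second field) into one stitched image.

    frame_images: list of (H, W, 3) uint8 arrays, the preset frame images of the image combination.
    depth_maps:   list of (H, W) float arrays, the matching depth data.
    Returns (stitched, header) where the header records the offsets needed to read the fields back.
    """
    textures = np.concatenate(frame_images, axis=1)                      # first field: pixel data
    depth_u8 = [np.clip(d / d.max() * 255, 0, 255).astype(np.uint8) for d in depth_maps]
    depths = np.concatenate([np.repeat(d[:, :, None], 3, axis=2) for d in depth_u8], axis=1)
    stitched = np.concatenate([textures, depths], axis=0)                # second field below the first
    header = {
        "frame_width": frame_images[0].shape[1],
        "num_frames": len(frame_images),
        "depth_field_row_offset": textures.shape[0],
    }
    return stitched, json.dumps(header)

# Example with two tiny 2x2 frames and matching depth maps.
imgs = [np.zeros((2, 2, 3), np.uint8), np.ones((2, 2, 3), np.uint8)]
deps = [np.full((2, 2), 3.0), np.full((2, 2), 6.0)]
stitched, header = build_stitched_image(imgs, deps)
print(stitched.shape, header)
```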
  • the playback control device 14 may insert the received data of the multi-angle free-view fusion video frame into the to-be-played video stream, and the playback terminal 15 receives the to-be-played video stream from the playback control device 14 and performs real-time playback.
  • the playback control device 14 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set as a virtual playback control device to control the data source.
  • a broadcast director control device such as a broadcast director station may be used as a broadcast control device in the embodiment of the present invention.
  • the data processing device 12 can be placed in the on-site non-collection area or in the cloud according to the specific scenario, and the server (cluster) and playback control device can be placed in the on-site non-collection area, in the cloud, or on the terminal access side according to the specific scenario.
  • the foregoing embodiments are not used to limit the specific implementation and protection scope of the present invention.
  • the data processing system used in the embodiments of this specification can not only realize the playback of multi-angle free-view videos in low-latency scenes such as live broadcast and quasi-live broadcast, but can also realize multi-angle free-view video playback based on user interaction operations in scenes such as recording and rebroadcasting.
  • the data processing system 30 may also include an interactive terminal 35. The server 32 may, in response to an image reconstruction instruction from the interactive terminal 35, determine the interactive frame time information at the interaction moment, and send the stitched image of the preset frame images in the image combination corresponding to the interactive frame time, together with the parameter data corresponding to that image combination, to the interactive terminal 35.
  • the interactive terminal 35 sends the image reconstruction instruction to the server based on an interactive operation and, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data, depth data, and corresponding parameter data in the stitched image according to preset rules.
  • the preset rule can be set according to a specific scenario, and for details, refer to the introduction in the foregoing method embodiment.
  • the interactive frame time information may be determined based on a trigger operation from the interactive terminal 35.
  • the trigger operation may be a trigger operation input by the user or a trigger operation automatically generated by the interactive terminal.
  • when the interactive terminal detects the presence of a multi-angle free-viewpoint data frame, the trigger operation can be initiated automatically.
  • when the user triggers manually, the interactive frame time may be the moment at which the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or it may be historical moment information carried by the user operation received by the interactive terminal, where the historical moment information may be a moment earlier than the current playback moment.
  • based on the acquired stitched image of the preset frame images in the image combination at the interactive frame time and the corresponding parameter data, together with the interactive frame time information and the virtual viewpoint position information at the interactive frame time, the interactive terminal 35 can perform combined rendering on the pixel data and depth data of the stitched image using the same method as the above-mentioned step S44, obtain the video frames of the multi-angle free-view video corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at the interactive virtual viewpoint position.
  • the multi-angle free-view video corresponding to the interactive virtual viewpoint position can be instantly generated based on the image reconstruction instruction from the interactive terminal, which can further enhance the user's interactive experience.
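The "preset rule" used when selecting pixel data and depth data from the stitched image can be as simple as picking the acquisition cameras closest to the requested virtual viewpoint. The hedged sketch below shows such a nearest-camera rule; the camera-position metadata and the use of Euclidean distance are assumptions, and a real embodiment may also weigh viewing direction or other factors.

```python
def select_reference_cameras(virtual_viewpoint, camera_positions, num_selected=2):
    """Pick the cameras whose positions are closest to the virtual viewpoint.

    virtual_viewpoint: (x, y, z) position determined by the interactive operation.
    camera_positions:  mapping camera_id -> (x, y, z), assumed to come from the parameter data.
    num_selected:      how many groups of pixel/depth data to pull from the stitched image.
    """
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    ranked = sorted(camera_positions, key=lambda cam: dist(camera_positions[cam], virtual_viewpoint))
    return ranked[:num_selected]

# Example: the virtual viewpoint sits between cam_1 and cam_2, so those two are selected.
cams = {"cam_0": (0, 0, 0), "cam_1": (5, 0, 0), "cam_2": (10, 0, 0), "cam_3": (15, 0, 0)}
print(select_reference_cameras((7.0, 0.0, 0.0), cams))
```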
  • the server 32 may also, according to a server-side special effect generation interactive control instruction, generate and store the virtual information image corresponding to the stitched image of the preset video frame indicated by that instruction. Through the above solution, the virtual information image corresponding to the stitched image of the preset frame image is generated in advance; when playback is needed, rendering and playback can be performed directly, which reduces latency, further enhances the user's interactive experience, and improves the user's visual experience.
  • the data processing system can be used in live and quasi-live scenes to achieve low-latency multi-angle free-view video playback with AR special effects, and can also realize multi-angle free-view video playback with AR special effects based on user interaction in video playback scenes such as recording and rebroadcasting.
  • the user can interact with the server through the interactive terminal to obtain the virtual information image corresponding to the stitched image of the preset video frame and render it on the interactive terminal, so as to realize the playback of multi-angle free-view composite video frames with AR special effects.
  • the server 32 is further adapted to respond to a user-side special effect generation interactive instruction from the interactive terminal, obtain the virtual information image corresponding to the stitched image of the preset video frame, and send the virtual information image corresponding to the stitched image of the preset video frame to the interactive terminal 35.
  • the interactive terminal 35 is adapted to synthesize the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image, and to obtain and play the resulting composite video frame.
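Superimposing the virtual information image on a reconstructed video frame can be expressed as ordinary alpha compositing, as in the hedged sketch below; treating the virtual information image as an RGBA layer aligned with the video frame is an assumption made for illustration.

```python
import numpy as np

def overlay_virtual_info(video_frame, virtual_info_rgba):
    """Superimpose a virtual information image (RGBA) on a video frame (RGB) by alpha blending.

    video_frame:       (H, W, 3) uint8 frame of the multi-angle free-view video.
    virtual_info_rgba: (H, W, 4) uint8 virtual information image; alpha 0 keeps the real image.
    """
    alpha = virtual_info_rgba[:, :, 3:4].astype(np.float32) / 255.0
    blended = virtual_info_rgba[:, :, :3].astype(np.float32) * alpha \
        + video_frame.astype(np.float32) * (1.0 - alpha)
    return blended.astype(np.uint8)

# Example: a fully transparent overlay leaves the frame unchanged.
frame = np.full((2, 2, 3), 100, dtype=np.uint8)
overlay = np.zeros((2, 2, 4), dtype=np.uint8)
print(overlay_virtual_info(frame, overlay))
```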
  • the playback interface Sr1 of the playback terminal T1 shown in Figure 18 displays the T-1th video frame; from the perspective on the right side of the athlete, it can be seen that the athlete is sprinting to the finish line.
  • the data processing device intercepts multiple synchronized video frames from the Tth frame to the T+1th frame in the video stream and uploads them to the server.
  • the server uses the received synchronized video frames from the T to T+1th frame as the image combination.
  • the server performs frame image reconstruction along the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, and obtains the video frames of the corresponding multi-angle free-view video; in response to a special effect generation instruction, a virtual information image corresponding to the image combination specified by the special effect generation instruction is acquired.
  • the virtual information image is superimposed and rendered on the specified image combination, and the multi-angle free-view fusion video frames corresponding to frames T to T+1 are displayed on the playback terminal T1, as shown in Figure 19 and Figure 20.
  • the playback interface Sr2 in Figure 19 shows the effect diagram of the T-th video frame.
  • the viewing angle is switched to the front of the athlete, and it can be seen from the screen that an AR special effect image is embedded on top of the actual image: the real picture of the athlete sprinting to the finish line together with the implanted AR special effect images, including the athlete's basic information board M1 and two virtually generated footprints M2 matching the athlete's footsteps. In order to distinguish the virtual information image corresponding to the AR special effect from the real image corresponding to the multi-angle free-view video frame, the solid lines represent the real image and the dotted lines represent the virtual information image corresponding to the AR special effect. The athlete's name, nationality, competition number, historical best results, and other information can be seen from the basic information board M1.
  • Figure 20 shows the effect diagram of the T+1th video frame.
  • the viewing angle is further switched to the left side of the athlete. From the screen displayed on the playback interface Sr3, it can be seen that the athlete has crossed the finish line.
  • the specific information contained in the basic information board M1 can be updated in real time as time goes by. It can be seen from Figure 20 that the athlete's current result has been added, the position and shape of the footprints M2 follow the athlete's footsteps, and a pattern mark M3 indicating that the athlete won first place has been added.
  • the playback terminal in the embodiments of this specification may specifically be any one or more types of terminal devices, such as a TV, a computer, a mobile phone, a vehicle-mounted device, and a projection device.
  • the interactive terminal 210 may include a first display unit 211, a virtual information image acquisition unit 212, and a second display unit 213, wherein:
  • the first display unit 211 is adapted to display images of a multi-angle free-view video in real time, wherein the images of the multi-angle free-view video are reconstructed from the parameter data of an image combination formed by a plurality of synchronized video frame images at a specified frame time, together with the pixel data and depth data of the image combination, and the multiple synchronized video frames include frame images of different shooting angles;
  • the virtual information image acquisition unit 212 is adapted to obtain a virtual information image corresponding to a specified frame time of the special effect display identifier in response to a triggering operation on the special effect display identifier in the multi-angle free-view video image;
  • the second display unit 213 is adapted to superimpose and display the virtual information image on a video frame of the multi-angle free-view video.
  • through interaction, terminal users can watch multi-angle free-view video images embedded with AR special effects, which enriches the user's visual experience.
  • the interactive terminal 220 may include:
  • the video stream obtaining unit 221 is adapted to obtain a to-be-played video stream from a playback control device in real time, the to-be-played data stream includes video data and an interactive identifier, and the interactive identifier is associated with a specified frame moment of the to-be-played data stream;
  • the play and display unit 222 is adapted to play and display the video and interactive identification of the to-be-played video stream in real time;
  • the interactive data obtaining unit 223 is adapted to obtain, in response to a trigger operation on the interactive identifier, interactive data corresponding to the designated frame moment, the interactive data including the multi-angle free-view video frame data of the preset video frame;
  • the interactive display unit 224 is adapted to display, based on the interactive data, a composite video frame of a multi-angle free view at the specified frame time;
  • the switching unit 225 is adapted to trigger, when an interaction end signal is detected, switching back to the to-be-played video stream acquired in real time by the video stream acquiring unit 221 from the playback control device, with the playback and display unit 222 performing real-time playback and display.
  • the interactive data may be generated by the server and transmitted to the interactive terminal, or may be generated by the interactive terminal.
  • the interactive terminal can obtain the data stream to be played from the playing control device in real time, and can display the corresponding interactive identifier at the corresponding frame time.
  • the interactive identifier can be displayed on the progress bar, or, for example, the interactive identifier can be displayed directly on the display screen.
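The association between an interactive identifier and a specified frame time of the to-be-played stream can be modelled as simple metadata carried alongside the video data. The sketch below shows one possible representation; the class and field names are illustrative assumptions rather than structures defined by the embodiment.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InteractiveMarker:
    """One interactive identifier tied to a specified frame range of the to-be-played stream."""
    frame_start: int      # first frame of the interactive range, e.g. Ti
    frame_end: int        # last frame of the interactive range, e.g. Ti+2
    label: str            # identifier shown on the progress bar or on screen

@dataclass
class PlayableStream:
    """To-be-played data stream: video data plus the interactive identifiers associated with it."""
    video_frames: List[object]
    markers: List[InteractiveMarker] = field(default_factory=list)

    def markers_at(self, frame_index):
        # Return the interactive identifiers whose range covers the current playback frame.
        return [m for m in self.markers if m.frame_start <= frame_index <= m.frame_end]

# Example: an identifier covering frames 120..122 is visible while frame 121 is playing.
stream = PlayableStream(video_frames=list(range(200)), markers=[InteractiveMarker(120, 122, "V1")])
print([m.label for m in stream.markers_at(121)])
```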
  • the display interface Sr20 of the interactive terminal T2 displays the interactive identifier V1.
  • the interactive terminal T2 can continue to read subsequent video data.
  • when the user slides to select and trigger in the direction indicated by the arrow of the interactive identifier V1, the interactive terminal T2, after receiving the feedback, generates an image reconstruction instruction for the specified frame time corresponding to the interactive identifier and sends it to the server 32.
  • the interactive terminal T2 receives the feedback, generates an image reconstruction instruction corresponding to the designated frame time Ti~Ti+2 of the interactive identifier V1, and sends it to the server 32.
  • the server 32 may send multiple frame images corresponding to the designated frame time Ti~Ti+1.
  • the display interface Sr20 displays the interactive logo Ir.
  • the interactive terminal T2 can obtain the corresponding virtual information image from the server.
  • the multi-angle free-view fusion images corresponding to frames Ti+1 to Ti+2 can be displayed on the interactive terminal T2, as shown in FIG. 25 and FIG. 26.
  • the image of frame Ti+1 shown in Figure 25 shows the real picture of the athlete sprinting to the finish line, as well as the virtual information image, including the athlete's basic information board M4 and the footprints M5 matching the athlete's footsteps.
  • in Figure 25 and Figure 26, the real image is marked with solid lines and the dotted lines represent the virtual information image. From the basic information board M4, the athlete's name, nationality, competition number, historical best score, and other information can be seen.
  • Figure 26 shows the effect of the Ti+2 video frame. The viewing angle is further switched to the left side of the athlete. It can be seen from the screen that the athlete has crossed the finish line.
  • the specific information contained in the basic information board M4 can be updated in real time as time goes by. As can be seen from Figure 26, the athlete's current result has been added, the position and shape of the footprints M5 follow the athlete's footsteps, and a pattern mark M6 indicating that the athlete won first place has been added.
  • the interactive terminal T2 may generate interactive data for interaction based on the multiple video frames, may use an image reconstruction algorithm to perform image processing on the multi-angle free-view data in the interactive data, and may obtain virtual information images from the server; it then plays the multi-angle free-view video at the designated frame time and the multi-angle free-view composite video frames with AR special effects implanted at the designated frames.
  • the interactive terminal in the embodiment of the present invention may be any one or more types of devices such as an electronic device with a touch screen function, a head-mounted virtual reality (VR) terminal, an edge node device connected to a display, and an Internet of Things (IoT) device.
  • the target object corresponding to the stitched image of the preset video frame can be identified, and the augmented reality special effect input data of the target object can be obtained.
  • the interactive data may also include the augmented reality special effect input data of the target object, and the augmented reality special effect input data may include at least one of the following: on-site analysis data, collected information data of the target object, collected information data of equipment related to the target object, information data of items deployed on site, and information data of logos displayed on site.
  • the virtual information image can then be generated, and the multi-angle free-view composite video frame can be generated accordingly, so that the implanted AR special effects are richer and more targeted.
  • the end user can thus gain a more in-depth, comprehensive, and professional understanding of the content being watched, which further enhances the user's visual experience.
  • the server 270 may include: an image reconstruction unit 271, a virtual information image generation unit 272, and a data transmission unit 273, wherein:
  • the image reconstruction unit 271 is adapted to determine the interactive frame time information at the interactive time in response to the image reconstruction instruction from the interactive terminal, and obtain the spliced image of the preset frame image in the image combination corresponding to the interactive frame time and the corresponding image combination Parameter data;
  • the virtual information image generating unit 272 is adapted to generate a virtual information image corresponding to a spliced image of a video frame indicated by the special effect generation interactive control instruction in response to a special effect generation interactive control instruction;
  • the data transmission unit 273 is adapted to perform data interaction with the interactive terminal, including: transmitting the stitched image of the preset video frames in the image combination at the corresponding interactive frame time and the parameter data corresponding to the image combination to the interactive terminal, so that the interactive terminal selects the corresponding pixel data, depth data, and corresponding parameter data in the stitched image based on the virtual viewpoint position information determined by the interactive operation, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the multi-angle free-view video images corresponding to the virtual viewpoint position at the interactive frame time; and transmitting the virtual information image corresponding to the stitched image of the preset frame images indicated by the special effect generation interactive control instruction to the interactive terminal, so that the interactive terminal synthesizes the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain a multi-angle free-view synthesized video frame and plays it.
  • the server 280 may include:
  • the data receiving unit 281 is adapted to receive multiple synchronized video frames at a specified frame time intercepted from multiple synchronized video streams as an image combination, and the multiple synchronized video frames include frame images of different shooting angles;
  • the parameter data calculation unit 282 is adapted to determine the parameter data corresponding to the image combination;
  • the depth data calculation unit 283 is adapted to determine the depth data of each frame image in the image combination;
  • the video data acquisition unit 284 is adapted to perform frame image reconstruction on the preset virtual viewpoint path based on the corresponding parameter data of the image combination, the pixel data and the depth data of the preset frame image in the image combination, to obtain the corresponding multiple The video frame of the angle free view video;
  • the first virtual information image generation unit 285 is adapted to obtain, in response to a special effect generation instruction, the target object in the video frame specified by the special effect generation instruction, obtain the augmented reality special effect input data of the target object, and generate a corresponding virtual information image based on the augmented reality special effect input data of the target object; correspondingly, the first virtual information image generating unit 285 may include a special effect area determining subunit 2851 and a special effect data generating subunit 2852.
  • the image synthesis unit 286 is adapted to perform synthesis processing on the virtual information image and the designated video frame to obtain a synthesized video frame;
  • the first data transmission unit 287 is adapted to output the synthesized video frame to be inserted into the video stream to be played.
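Read end to end, the units of server 280 form a linear pipeline. The following hedged sketch strings placeholder callables together in that order purely to show the data flow; none of the function bodies stand for real implementations.

```python
def server_pipeline(synchronized_frames, special_effect_instruction,
                    estimate_parameters, estimate_depth, reconstruct_views,
                    generate_virtual_info, synthesize):
    """Data flow of server 280: parameters -> depth -> reconstruction -> AR synthesis -> output.

    The five callables are injected placeholders standing in for the corresponding units;
    their signatures are assumptions made for this sketch.
    """
    image_combination = synchronized_frames                                  # data receiving unit 281
    params = estimate_parameters(image_combination)                          # parameter data calculation unit 282
    depth = estimate_depth(image_combination)                                # depth data calculation unit 283
    video_frames = reconstruct_views(image_combination, params, depth)       # video data acquisition unit 284
    virtual_info = generate_virtual_info(video_frames, special_effect_instruction)  # generation unit 285
    return [synthesize(f, virtual_info) for f in video_frames]               # synthesis unit 286 -> output 287

# Example run with trivial stand-in functions.
result = server_pipeline(
    ["frame_a", "frame_b"], "highlight player",
    estimate_parameters=lambda frames: "params",
    estimate_depth=lambda frames: "depth",
    reconstruct_views=lambda frames, p, d: ["view_1", "view_2"],
    generate_virtual_info=lambda views, instr: "info_overlay",
    synthesize=lambda view, info: (view, info),
)
print(result)
```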
  • the embodiment of this specification also provides another server.
  • the difference between the server 290 and the server 280 is that the server 290 may also include a stitched image generating unit 291 and a first data storage unit 292, wherein:
  • the stitched image generating unit 291 is adapted to generate a stitched image corresponding to the image combination based on the pixel data and depth data of the image combination, the stitched image including a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the image combination;
  • the first data storage unit 292 is adapted to store the stitched image of the image combination and the parameter data corresponding to the image combination.
  • the server 290 may further include: a data extraction unit 293 and a second data transmission unit 294, wherein:
  • the data extraction unit 293 is adapted to determine the interactive frame time information at the interaction moment in response to the image reconstruction instruction from the interactive terminal, and to obtain the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination;
  • the second data transmission unit 294 is adapted to send the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the parameter data corresponding to the image combination to the interactive terminal, so that the interactive terminal selects the corresponding pixel data, depth data, and corresponding parameter data in the stitched image according to preset rules based on the interactive operation, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the video frames of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time.
  • the server 290 may further include: a second virtual information image generating unit 295 and a second data storage unit 296, where:
  • the second virtual information image generating unit 295 is adapted to generate a virtual information image corresponding to a stitched image of a preset frame image indicated by the server special effect generation interactive control instruction in response to the server special effect generation interactive control instruction;
  • the second data storage unit 296 is adapted to store the virtual information image corresponding to the stitched image of the preset frame image.
  • the server 290 may further include: a second virtual information image acquisition unit 297 and a third data transmission unit 298, where:
  • the second virtual information image acquisition unit 297 is adapted to, after the image reconstruction instruction is received, acquire the virtual information image corresponding to the stitched image of the preset frame images in response to a user-side special effect generation interactive instruction from the interactive terminal;
  • the third data transmission unit 298 is adapted to send the virtual information image corresponding to the stitched image of the preset frame images to the interactive terminal, so that the interactive terminal combines the video frame of the multi-angle free-view video corresponding to the virtual viewpoint position at the interactive frame time with the virtual information image to obtain a multi-angle free-view composite video frame for playback.
  • the augmented reality special effect input data in the embodiments of this specification can be the player special effect data and the goal special effect data in the basketball game scene mentioned above. It is understandable that the augmented reality special effect input data in the embodiments of this specification is not limited to the above example types; for basketball game scenes, corresponding augmented reality special effect input data can also be generated based on various target objects contained in the collected scene images, such as coaches and advertising logos.
  • corresponding virtual information images can be generated based on one or more factors such as the specific application scenario, the characteristics of the target object, objects related to the target object, and the specific special effect generation model (such as a preset three-dimensional model or a preset machine learning model).
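One way to read this is as a simple dispatch over special effect generation models: depending on the application scenario and the target object, the augmented reality special effect input data is routed either to a preset three-dimensional model or to a preset machine learning model. The sketch below only illustrates the dispatch; the generator callables and the "category" key are assumed placeholders.

```python
def generate_virtual_info_image(ar_input_data, target_object, generators):
    """Choose a special effect generation model for the target object and produce a virtual info image.

    ar_input_data: augmented reality special effect input data of the target object.
    target_object: descriptor of the object, assumed to carry a 'category' key.
    generators:    mapping category -> callable standing in for either a preset 3D model
                   or a preset machine learning model (both are placeholders here).
    """
    category = target_object.get("category", "default")
    generator = generators.get(category, generators["default"])
    return generator(ar_input_data, target_object)

# Example with two stand-in generators.
generators = {
    "player": lambda data, obj: f"3D info board for {obj['name']}: {data}",
    "default": lambda data, obj: f"generic overlay: {data}",
}
print(generate_virtual_info_image({"shooting_pct": 0.47}, {"category": "player", "name": "No. 23"}, generators))
```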
  • each electronic device in the embodiments of this specification can be implemented by corresponding circuits.
  • the data acquisition unit involved in each of the foregoing embodiments can be implemented by a processor, a CPU, an input interface, etc.
  • the data storage unit involved in each of the foregoing embodiments can be implemented by various storage devices such as disks, EPROMs, and ROMs.
  • the data transmission units involved in the foregoing embodiments can be implemented by communication interfaces, communication lines (wired/wireless), etc., and will not be listed here.
  • the embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the method described in any of the foregoing embodiments can be executed. For the specific steps, please refer to the introduction of the foregoing embodiments, which will not be repeated here.
  • the computer-readable storage medium may include, for example, any suitable type of memory unit, storage device, storage item, storage medium, and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writable or rewritable media, digital or analog media, hard disk, floppy disk, compact disc read-only memory (CD-ROM), recordable optical disc (CD-R), rewritable disc (CD-RW), optical disc, magnetic medium, magneto-optical medium, removable memory card or magnetic disk, various types of digital versatile disc (DVD), magnetic tape, cassette tape, and so on.
  • computer instructions may include any suitable type of code implemented using any suitable high-level, low-level, object-oriented, visual, compiled, and/or interpreted programming language, for example, source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Data processing methods, systems, related devices, and storage media. One of the data processing methods includes: acquiring a target object in a video frame of a multi-angle free-view video; acquiring a virtual information image generated based on augmented reality special effect input data of the target object; and synthesizing the virtual information image with the corresponding video frame and displaying the result. The solutions of the embodiments of this specification can take into account users' needs for both a rich visual experience and low latency during video viewing.

Description

数据处理方法、系统、相关设备和存储介质
本申请要求2020年06月10日递交的申请号为202010522454.0、发明名称为“数据处理方法、系统、相关设备和存储介质”中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本说明书实施例涉及数据处理技术领域,尤其涉及一种数据处理方法、系统及相关设备和存储介质。
背景技术
随着互联技术的不断发展,越来越多的视频平台不断通过提供更高清晰度或者观看流畅度更高的视频,来提高用户的观看体验。然而,针对现场体验感比较强的视频,例如一场体育比赛的视频,用户在观看过程中往往只能通过一个视点位置观看比赛,无法自己自由切换视点位置,来观看不同视角位置处的比赛画面或比赛过程,也就无法体验在现场一边移动视点一边看比赛的感觉。
6自由度(6Degree of Freedom,6DoF)技术就是为了提供高自由度观看体验的一种技术,用户可以在观看中通过交互手段,来调整视频观看的视角,从用户想观看的自由视点角度进行观看,从而大幅度的提升观看体验。
为进一步增强6DoF视频的观看体验,目前存在基于多角度自由视角技术的增强现实(Augmented Reality,AR)特效植入方案,然而现有将AR特效植入多角度自由视角视频的方案难以实现低时延播放,因此无法兼顾用户视频观看过程中对丰富视觉体验和低时延的需求。
发明内容
为满足用户视频观看过程中对丰富视觉体验的需求,本说明书实施例提供了一种数据处理方法、系统及相关设备和存储介质。
本说明书实施例提供了一种数据处理方法,包括:
获取多角度自由视角视频的视频帧中的目标对象;
获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像;
将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
可选地,所述多角度自由视角视频基于从多路同步视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建得到,其中,所述多个同步视频帧包含不同拍摄视角的帧图像。
可选地,所述获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息 图像,包括:
基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,得到与所述目标对象位置匹配的虚拟信息图像。
可选地,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括:按照帧时刻排序以及相应帧时刻的虚拟视点位置,将相应帧时刻的虚拟信息图像与对应帧时刻的视频帧进行合成处理并展示。
可选地,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括如下至少一种:
将所述虚拟信息图像与对应的视频帧进行融合处理,得到融合视频帧,对所述融合视频帧进行展示;
将所述虚拟信息图像叠加在对应的视频帧之上,得到叠加合成视频帧,对所述叠加合成视频帧进行展示。
可选地,所述对所述融合视频帧进行展示,包括:将所述融合视频帧插入待播放视频流进行播放展示。
可选地,所述获取多角度自由视角视频的视频帧中的目标对象,包括:响应于特效生成交互控制指令,获取所述多角度自由视角视频的视频帧中的目标对象。
可选地,所述获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,包括:基于所述目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。
本说明书实施例还提供了另一种数据处理方法,包括:
接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像;
确定所述图像组合相应的参数数据;
确定所述图像组合中各帧图像的深度数据;
基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;
响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像;
将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧;
将所述合成视频帧进行展示。
可选地,所述基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息 图像,包括:
将所述目标对象的增强现实特效输入数据作为输入,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,采用预设的第一特效生成方式,生成对应视频帧中与所述目标对象匹配的虚拟信息图像。
可选地,所述响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,并获取所述目标对象的增强现实特效输入数据,包括:
根据服务端特效生成交互控制指令,确定特效输出类型;
获取所述目标对象的历史数据,根据所述特效输出类型对所述历史数据进行处理,得到与所述特效输出类型对应的增强现实特效输入数据。
可选地,所述基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,包括以下至少一种:
将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,输出与所述目标对象匹配的虚拟信息图像;
将所述目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,输出与所述目标对象匹配的虚拟信息图像。
可选地,所述将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,包括:
基于三维标定得到的所述目标对象在所述指定的视频帧中的位置,将所述虚拟信息图像与所述指定的视频帧进行融合处理,得到融合视频帧。
可选地,所述将所述合成视频帧进行展示,包括:将所述合成视频帧插入至播放控制设备的待播放视频流以通过播放终端进行播放。
可选地,所述方法还包括:
基于所述图像组合的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;
存储所述图像组合的拼接图像及所述图像组合相应的参数数据;
响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点 位置对应的多角度自由视角视频的视频帧并进行播放。
可选地,所述方法还包括:
响应于服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像;
存储所述预设视频帧的拼接图像对应的虚拟信息图像。
可选地,在接收到所述图像重建指令后,还包括:
响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像;
将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并展示。
可选地,所述方法还包括:响应于用户端特效退出交互指令,停止获取所述预设视频帧的拼接图像对应的虚拟信息图像。
可选地,所述响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,包括:
基于所述用户端特效生成交互指令,确定所述预设视频帧的拼接图像中对应的目标对象;
获取与所述预设视频帧中的目标对象匹配的虚拟信息图像。
可选地,所述获取与所述预设视频帧中的目标对象匹配的虚拟信息图像,包括:
获取预先基于三维标定得到的所述目标对象在所述预设视频帧中的位置所生成的与所述目标对象匹配的虚拟信息图像。
可选地,所述将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧,包括:
将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧之上叠加所述虚拟信息图像,得到叠加合成视频帧。
本说明书实施例还提供了另一种数据处理方法,包括:
响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点 位置对应的多角度自由视角视频的视频帧并进行播放;
响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像;
将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧;
将所述合成视频帧进行展示。
可选地,所述预设视频帧的拼接图像基于所述交互帧时刻的图像组合的像素数据和深度数据生成,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中所述预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;
所述交互帧时刻的图像组合基于从多路同步视频流中截取指定帧时刻的多个同步视频帧得到,所述多个同步视频帧包含不同拍摄视角的帧图像。
可选地,所述响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像,包括:
响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的视频帧中的目标对象;
获取预先基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像。
本说明书实施例还提供了另一种数据处理方法,包括:
实时进行多角度自由视角视频的视频帧的展示;
响应于对所述多角度自由视角视频的视频帧中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的视频帧的虚拟信息图像;
将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
可选地,所述响应于对所述多角度自由视角视频的图像中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的视频帧的虚拟信息图像,包括:
获取与所述特效展示标识对应的指定帧时刻的视频帧中目标对象的虚拟信息图像。
可选地,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括:
基于三维标定确定的所述目标对象在所述指定帧时刻的视频帧中的位置,将所述虚拟信息图像叠加在所述指定帧时刻的视频帧之上,得到叠加合成视频帧并展示。
本说明书实施例提供来一种数据处理系统,包括:
目标对象获取单元,适于获取多角度自由视角视频的视频帧中目标对象;
虚拟信息图像获取单元,适于获取基于所述目标对象的增强现实特效输入数据所 生成的虚拟信息图像;
图像合成单元,适于将将所述虚拟信息图像与对应的视频帧进行合成处理,得到合成视频帧;
展示单元,适于展示得到的合成视频帧。
本说明书实施例提供了另一种数据处理系统,包括:数据处理设备、服务器、播放控制设备以及播放终端,其中:
所述数据处理设备,适于基于视频帧截取指令,从现场采集区域不同位置实时同步采集的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,将获得的所述指定帧时刻的多个同步视频帧上传至所述服务器;
所述服务器,适于接收所述数据处理设备上传的多个同步视频帧作为图像组合,确定所述图像组合相应的参数数据以及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;以及响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,并将所述合成视频帧输入至播放控制设备;
所述播放控制设备,适于将所述合成视频帧插入至待播放视频流;
所述播放终端,适于接收来自所述播放控制设备的待播放视频流并进行实时播放。
可选地,所述系统还包括交互终端;其中:
所述服务器,还适于基于所述图像组合的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;以及存储所述图像组合的拼接图像及所述图像组合相应的参数数据;以及响应于来自所述交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端;
所述交互终端,适于基于交互操作,向所述服务器发送所述图像重建指令,并基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据以及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
可选地,所述服务器,还适于根据服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像并存储。
可选地,所述服务器,还适于响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端;
所述交互终端,适于将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并进行播放展示。
本说明书实施例提供了一种服务器,包括:
数据接收单元,适于接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像;
参数数据计算单元,适于确定所述图像组合相应的参数数据;
深度数据计算单元,适于确定所述图像组合中各帧图像的深度数据;
视频数据获取单元,适于基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;
第一虚拟信息图像生成单元,适于响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应虚拟信息图像;
图像合成单元,适于将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧;
第一数据传输单元,适于将所述合成视频帧输出以插入待播放视频流。
可选地,所述第一虚拟信息图像生成单元,适于将所述目标对象的增强现实特效输入数据作为输入,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,采用预设的第一特效生成方式,生成对应视频帧中与所述目标对象匹配的虚拟信息图像。
本说明书实施例提供了另一种服务器,包括:
图像重建单元,适于响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据;
虚拟信息图像生成单元,适于响应于特效生成交互控制指令,生成所述特效生成交互控制指令指示的视频帧的图像组合的拼接图像对应的虚拟信息图像;
数据传输单元,适于与交互终端进行数据交互,包括:将所述对应交互帧时刻的图像组合中预设视频帧的拼接图像及所述图像组合相应的参数数据传输至所述交互 终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的图像并进行播放;以及将所述特效生成交互控制指令指示的预设帧图像的拼接图像对应的虚拟信息图像传输至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到多角度自由视角合成视频帧并进行播放。
本说明书实施例还提供了一种交互终端,包括:
第一展示单元,适于实时进行多角度自由视角视频的图像的展示,其中,所述多角度自由视角视频的图像是通过指定帧时刻的多个同步视频帧图像形成的图像组合的参数数据、所述图像组合的像素数据和深度数据重建得到,所述多个同步视频帧包括不同拍摄视角的帧图像;
特效数据获取单元,适于响应于对所述多角度自由视角视频图像中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的虚拟信息图像;
第二展示单元,适于将所述虚拟信息图像叠加展示在所述多角度自由视角视频的视频帧上。
本说明书实施例提供了一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,所述处理器运行所述计算机指令时执行前述任一实施例所述方法的步骤。
本说明书实施例提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时执行前述任一实施例所述方法的步骤。
与现有技术相比,本说明书实施例的技术方案具有以下有益效果:
采用本说明书一些实施例中数据处理方案,在多角度自由视角视频的实时播放过程中,通过获取所述多角度自由视角视频的视频帧中的目标对象,进而获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,并将所述虚拟信息图像与对应的视频帧进行合成处理并展示。通过这一过程,只需要在多角度自由视角视频播放过程中,对于需要植入AR特效的视频帧与所述视频帧中目标对象所对应的虚拟信息图像合成即可得到融合了AR特效的视频帧,无须先为一个多角度自由视角视频预先生成所有的多角度自由视角视频融合AR特效的视频帧后再去播放,因此可以实现在多角度自由视角视频中AR特效的精准而迅速地植入,可以满足用户观看低时延视频和视觉体验丰富性的需求。
进一步地,由于所述多角度自由视角视频是基于从多路同步视频流中截取的指定帧时刻的不同拍摄视角的多个同步视频帧所形成的图像组合相应的参数数据、所述图 像组合中预设帧时刻的像素数据和深度数据,对预设的虚拟视点路径进行重建得到,不需要基于所述多路同步视频流中所有的视频帧进行重建,因此可以减小数据处理量和数据传输量,降低多角度自由视角视频的传输时延。
进一步地,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,得到与所述目标对象位置匹配的虚拟信息图像,可以使得到的虚拟信息图像与所述目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成视频帧更加真实生动,故可以增强用户的视觉体验。
进一步地,随着虚拟视点的变化,所述目标对象在所述多角度自由视角视频中的动态变化,因此,按照帧时刻排序以及相应帧时刻的虚拟视点位置,将相应帧时刻的虚拟信息图像与对象帧时刻的视频帧进行合成处理并展示,则所得到的合成视频帧中虚拟信息图像可以与所述多角度自由视角视频的图像帧中的目标对象同步变化,从而使得合成的视频帧更加逼真生动,增强用户观看所述多角度自由视角视频的沉浸感,进一步提高用户体验。
采用本说明书一些实施例中的数据处理方案,对于接收到从多路视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合,通过确定所述图像组合相应的参数数据和所述图像组合中各帧图像的深度数据,一方面,基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的图像的视频帧;另一方面,响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,并将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧并展示。在这一数据处理过程中,由于仅从多路同步视频流中截取指定帧时刻的同步视频帧进行多角度自由视角视频的重建,以及生成与特效生成指令指定的视频帧中的目标对象对应的虚拟信息图像,因此无需巨量的同步视频流数据的上传,这一分布式系统架构可以节省大量的传输资源及服务器处理资源,且在网络传输带宽有限的条件下,可以实现具有增强现实特效的合成视频帧的实时生成,故能够实现多角度自由视角增强现实特效的视频的低时延播放,因而可以兼顾用户视频观看过程中对丰富视觉体验和低时延的双重需求。
此外,同步视频帧的截取、多角度自由视角视频的重建,虚拟信息图像的生成,以及多角度自由视角视频和虚拟信息图像的合成等均由不同的设备完成,这一分布式系统架构可以避免同一设备进行大量的数据处理,因此可以提高数据处理效率,减小传输时延。
采用本说明书实施例中的一些数据处理方案,响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像并发送至所述交互终端,使得所述交互终端将在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并展示,可以满足用户对视觉体验丰富性的需求和实时互动需求,提升用户互动体验。
附图说明
图1示出了本说明书实施例中一种具体应用场景中的数据处理系统的结构示意图;
图2示出了本说明书实施例中一种数据处理方法的流程图;
图3示出了本说明书实施例中一种数据处理系统的结构示意图图;
图4示出了本说明书实施例中另一种数据处理方法的流程图;
图5示出了本说明书实施例中一视频帧图像示意图;
图6示出了本说明书实施例中一种三维标定方式示意图;
图7示出了本说明书实施例中另一种数据处理方法的流程图;
图8至图12示出了本说明书实施例中一种交互终端的交互界面示意图;
图13示出了本说明书实施例中另一种交互终端的交互界面示意图;
图14示出了本说明书实施例中用另一种数据处理方法的流程图;
图15示出了本说明书实施例中另一种数据处理系统的结构示意图;
图16示出了本说明书实施例中另一种数据处理系统的结构示意图;
图17示出了本说明书实施例中一种服务器集群架构示意图;
图18至图20示出了本说明书实施例中一种播放终端的播放界面的视频效果示意图;
图21示出了本发明实施例中另一种交互终端的结构示意图;
图22示出了本发明实施例中另一种交互终端的结构示意图;
图23至图26示出了本说明书实施例中一种交互终端的显示界面的视频效果示意图;
图27示出了本说明书实施例中一种服务器的结构示意图;
图28示出了本说明书实施例中一种服务器的结构示意图;
图29示出了本说明书实施例中另一种服务器的结构示意图。
具体实施方式
在传统的直播、转播和录播等播放场景中,用户在观看过程中往往只能通过一个视点位置观看比赛,无法自己自由切换视点位置,来观看不同视角位置处的比赛画面或比赛过程,也就无法体验在现场一边移动视点一边看比赛的感觉。
采用6自由度(6Degree of Freedom,6DoF)技术可以提供高自由度观看体验,用户可以在观看过程中通过交互手段,来调整视频观看的视角,从想观看的自由视点角度进行观看,从而大幅度的提升观看体验。
伴随着用户对丰富视觉体验的需求,出现了在视频中植入AR特效的需求。目前,有在二维或者三维视频中植入AR特效的方案,然而,由于多角度自由视角的视频及AR特效数据均会涉及大量的图像处理、渲染操作以及巨量视频数据的传输,由于人们在视频观看体验中对时延的高敏感度,如直播或准直播场景,需要实现低时延的视频播放,因此难以兼顾用户对视频低时延播放和丰富视觉体验的需求。
为使本领域技术人员更好的理解低时延的多角度自由视角视频的播放场景,以下介绍一种能够实现多角度自由视角视频播放的数据处理系统。采用所述数据处理系统,可以多角度自由视角视频的低时延播放,可以应用于直播、转播等应用场景,也可以应用于基于用户交互的视频播放。
参见图1所示的一种具体应用场景中的数据处理系统的结构示意图,其中示出了一场篮球赛的数据处理系统的布置场景,数据处理系统10包括由多个采集设备组成的采集阵列11、数据处理设备12、云端的服务器集群13、播放控制设备14,播放终端15和交互终端16。采用数据处理系统10,可以实现多角度自由视角视频的重建,用户可以观看低时延的多角度自由视角视频。
具体而言,参照图1,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列11中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
所述数据处理设备12可以通过无线局域网向所述采集阵列11中各采集设备分别发送拉流指令,所述采集阵列11中各采集设备基于所述数据处理设备12发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备12。
当所述数据处理设备12接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群13。
相应地,云端的服务器集群13将接收的多个同步视频帧作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧。
在具体实施中,云端的服务器集群13可以采用如下方式存储所述图像组合的像素数据及深度数据:
基于所述图像组合的像素数据及深度数据,生成对应帧时刻的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据的第二字段。获取的拼接图像和相应的参数数据可以存入数据文件中,当需要获取拼接图像或参数数据时,可以根据数据文件的头文件中相应的存储地址,从相应的存储空间中读取。
然后,播放控制设备14可以将接收到的所述多角度自由视角视频的视频帧插入待播放数据流中,播放终端15接收来自所述播放控制设备14的待播放数据流并进行实时播放。其中,播放控制设备14可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备。
当云端的服务器集群13收到的来自交互终端16的图像重建指令时,可以提取所述相应图像组合中预设视频帧的拼接图像及相应图像组合相应的参数数据并传输至所述交互终端16。
交互终端16基于触发操作,确定交互帧时刻信息,向服务器集群13发送包含交互帧时刻信息的图像重建指令,接收从云端的服务器集群13返回的对应交互帧时刻的图像组合中预设视频帧的拼接图像及对应的参数数据,并基于交互操作确定虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
通常而言,视频中的实体不会是完全静止的,例如采用上述数据处理系统,在篮球比赛过程中,采集阵列采集到的实体如运动员、篮球、裁判员等大都处于运动状态。相应地,采集到的视频帧的图像组合中的纹理数据和像素数据均也随着时间变化而不断地变动。
采用上述数据处理系统,一方面,用户通过播放终端15可以直接观看插入了多角度自由视角视频帧的视频,例如观看篮球赛直播;另一方面,用户通过交互终端16观看视频过程中,通过交互操作,可以观看到交互帧时刻的多角度自由视角视频。可以理解的是,以上数据处理系统10中也可以仅包含播放终端15或仅包含交互终端16,或者通过同一终端设备作为所述播放终端15和交互终端16。
本领域技术人员可以理解,多角度自由视角视频的数据量相对很大,AR特效对应的虚拟信息图像数据通常数据量也较大,此外,由上述数据处理系统的工作机制可知,若要在实现多角度自由视角视频的重建的同时,对重建的多角度自由视角视频植 入AR特效,则更会涉及到大量数据的处理,以及多个设备的协同配合,复杂度以及数据处理量对于网络中数据处理及传输带宽资源而言更是难以实现,因此在多角度自由视角视频的播放过程中,如何植入AR特效以满足用户的视觉体验需求成为一个难以解决的问题。
有鉴于此,本说明书实施例提供一种方案,参照图2所示的数据处理方法的流程图,具体可以包括如下步骤:
S21,获取多角度自由视角视频的视频帧中的目标对象。
在具体实施中,可以基于从多路同步视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建得到所述多角度自由视角视频的视频帧,其中,所述多个同步视频帧包含不同拍摄视角的帧图像。
在具体实施中,可以基于某些指示信息(例如特效展示标识)确定多角度自由视角视频的图像中的某些对象作为目标对象,所述指示信息可以基于用户交互生成,也可以基于某些预设触发条件或第三方指令得到。例如,可以响应于特效生成交互控制指令,获取所述多角度自由视角视频的视频帧中的目标对象,可以在所述交互控制指令中设置所述指示信息,所述指示信息具体可以为目标对象的标识信息。作为具体示例,可以基于多角度自由视角视频框架结构,确定所述目标对象对应的指示信息的具体形式。
在具体实施中,目标对象可以是多角度自由视角视频中的视频帧或视频帧序列中的特定实体,例如,特定的人物、动物、物体、光束等环境场、环境空间等。本说明书实施例中并不限定目标对象的具体形态。
在本说明书一些实施例中,所述多角度自由视角视频可以为6DoF视频。
S22,获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像。
在本说明书实施例中,所植入的AR特效以虚拟信息图像的形式呈现。所述虚拟信息图像可以基于所述目标对象的增强现实特效输入数据生成。在确定目标对象后,可以获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像。
在本说明书实施例中,所述目标对象对应的虚拟信息图像可以预先生成,也可以响应于特效生成指令即时生成。
在具体实施中,可以基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,得到与所述目标对象位置匹配的虚拟信息图像,从而可以使得到的虚拟信息图像与所述目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成视频帧更加真实生动,增强用户的视觉体验。
在具体实施中,可以基于目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。
S23,将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
在具体实施中,可以在终端侧展示合成处理后得到的合成视频帧。
其中,基于所述虚拟信息图像对应的视频帧,所得到的合成视频帧可以为单帧,也可以为多帧。若为多帧,则可以按照帧时刻排序以及相应帧时刻的虚拟视点位置,将相应帧时刻的虚拟视点图像与对应帧时刻的视频帧进行合成处理并展示。
由于可以根据相应帧时刻的虚拟视点位置,生成与所述虚拟视点位置匹配的虚拟信息图像,进而按照帧时刻排序以及相应帧时刻的虚拟视点位置,将相应帧时刻的虚拟信息图像与相应帧时刻的视频帧进行合成处理,从而可以随着虚拟视点的变化而自动地生成与所述相应帧时刻的虚拟视点位置匹配的合成视频帧,从而使得所得到的合成视频帧的增强现实特效更加逼真生动,故可以进一步增强用户的视觉体验。
在具体实施中,可以有多种方式将所述虚拟信息图像与对应的视频帧进行合成处理并展示,以下给出两种具体可实现示例:
示例一:将所述虚拟信息图像与对应的视频帧进行融合处理,得到融合视频帧,对所述融合视频帧进行展示;
示例二:将所述虚拟信息图像叠加在对应的视频帧之上,得到叠加合成视频帧,对所述叠加合成视频帧进行展示。
在具体实施中,可以将得到的合成视频帧直接展示;也可以将得到的合成视频帧插入待播放的视频流进行播放展示。例如,可以将所述融合视频帧插入待播放视频流进行播放展示。
采用本说明书实施例,在多角度自由视角视频的实时播放过程中,通过获取所述多角度自由视角视频的视频帧中的目标对象,进而获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,并将所述虚拟信息图像与对应的视频帧进行合成处理并展示。通过这一过程,只需要在多角度自由视角视频播放过程中,对于需要植入AR特效的视频帧与所述视频帧中目标对象所对应的虚拟信息图像合成即可得到融合了AR特效的视频帧,无须先为一个多角度自由视角视频预先生成所有的多角度自由视角视频融合AR特效的视频帧后再去播放,因此可以实现在多角度自由视角视频中AR特效的精准而迅速地植入,可以满足用户观看低时延视频和视觉体验丰富性的需求。
如前所述,在多角度自由视角视频中植入AR特效对应的虚拟信息图像适用于多种应用场景,为使本领域技术人员更好地理解和实施本说明书实施例,以下通过交互式和非交互式两种应用场景分别展开进行阐述。
其中,非交互式应用场景,在此应用场景中,用户观看植入AR特效的多角度自由视角视频无须用户交互触发,可以在服务端控制植入AR特效的时机、位置、植入内容等,用户在终端侧随着视频流的播放即可看到植入AR特效的多角度自由视角视频的自动展示。例如,在直播或准直播过程中,通过在多角度自由视角视频中植入AR特效,可以生成植入AR特效的多角度自由视角视频合成视频帧,满足用户对视频低时延播放和丰富视觉体验的需求。
而交互式应用场景,用户可以在多角度自由视角视频观看过程中,主动触发AR特效的植入,由于采用本说明书实施例中的方案,可以在多角度自由视角视频中快速地植入AR的方案,避免由于生成过程持续较长而出现视频播放过程卡顿等现象,从而可以实现基于用户交互,生成植入AR特效的多角度自由视角视频合成视频真烦,满足用户对视频低时延播放和丰富视觉体验的需求。
在具体实施中,对应于交互式场景,可以响应于用户端的特效生成交互控制指令,获取所述多角度自由视角视频的视频帧中的目标对象。之后,可以获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,并将所述虚拟信息图像与对应的多角度自由视角视频的视频帧进行合成入力并展示。
其中,所述目标对象对应的虚拟信息图像可以预先生成,也可以即时生成。例如,在非交互式场景,可以响应于服务端特效生成指令生成;对于交互场景,可以响应于服务端特效生成指令预先生成,或者响应于交互终端的特效生成交互控制指令,即时生成。
在本说明书一些实施例中,所述目标对象可以为图像中的特定实体,例如,特定的人物、动物、物体、环境空间等,则可以根据所述特效生成交互控制指令中的目标对象指示信息(例如特效展示标识)所指示的目标对象,获取所述目标对象的增强现实特效输入数据,基于所述目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。具体的特效生成方式可以参见后续实施例中的一些示例,此处不作详细描述。
在具体实施中,为将数据处理将多角度自由视角视频的视频帧与所述视频帧中目标图像对应的虚拟信息图像合成处理,可以预先将生成多角度自由视角视频的数据以及增强现实特效输入数据等所有或部分数据预先下载至交互终端,在交互终端可以执行如下部分或全部操作:多角度自由视角视频的重建,生成虚拟信息图像,以及多角度自由视角视频的视频帧的渲染和虚拟信息图像的叠加渲染,也可以在服务端(如云端服务器)生成多角度自由视角视频、虚拟信息图像,仅在交互终端执行多角度自由视角视频的视频帧和对应的虚拟信息图像的合成操作。
此外,在非交互式场景中,可以将所述多角度自由视角视频合成视频帧插入至待 播放数据流中。具体而言,对于包含合成视频帧的多角度自由视角视频,可以作为多个待播放数据流中的其中一个视频流,作为待选择播放的视频流。例如,可以将所述包含多角度自由视角视频帧的视频流,作为播放控制设备(如:导播控制设备)的一个输入视频流,供所述播放控制设备选择使用。
需要说明的是,在某些情况下,同一用户可能既存在非交互场景中观看植入AR特效的多角度自由视角视频的需求,也存在交互场景中观看植入AR特效的多角度自由视角视频的需求,例如用户在观看直播过程中,对于某个精彩画面,或者漏看的某一时间段内的视频,可能退回观看回放视频,在此过程中,可以满足用户的互动需求。相应地,则会有非互动场景下得到的植入了AR特效的多角度自由视角视频合成视频帧和互动场景下得到的植入了AR特效的多角度自由视角视频合成视频帧。
为使本领域技术人员更加清楚地了解及实施本说明书实施例,下面将结合本说明书实施例中的附图,对本说明书实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本说明书的一部分实施例,而不是全部实施例。基于本说明书中的实施例,本领域普通技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都应属于本说明书保护的范围。
以下首先参照附图,通过具体实施例对本说明书实施例中非交互式应用场景的方案进行详细阐述。
本说明书一些实施例中,采用分布式系统架构的数据处理系统,对于接收到的从多路视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合,通过确定所述图像组合相应的参数数据和所述图像组合中各视频帧的深度数据,一方面,基于所述图像组合相应的参数数据、所述图像组合中预设视频帧的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,可以获得相应的多角度自由视角视频的视频帧;另一方面,响应于特效生成指令,可以获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,并将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧并展示参照图3所示的一种应用场景的数据处理系统的结构示意图,数据处理系统30包括:数据处理设备31、服务器32、播放控制设备33和播放终端34。
其中,数据处理设备31可以对现场采集区域中采集阵列采集到的视频帧(包括单个的帧图像)进行视频帧的截取。通过对待生成多角度自由视角图像的视频帧进行截取,可以避免大量的数据传输及数据处理。之后,由服务器32进行多角度自由视角视频的视频帧的生成,以及响应于特效生成指令,进行虚拟信息图像的生成,以及所述虚拟信息图像和所述多角度自由视角视频的视频帧的合成处理,得到多角度自由 视角视频合成视频帧,可以充分利用服务器32强大的计算能力,即可快速地生成多角度自由视角视频合成视频帧,从而可以及时地插入播放控制设备33的待播放数据流中,以低廉的成本实现融入AR特效的多角度自由视角视频的播放,满足用户对视频低时延播放和丰富视觉体验的需求。
参照图4所示的数据处理方法的流程图,为满足用户对视频低时延播放和丰富视觉体验的需求,具体可以通过如下步骤对视频数据进行处理:
S41,接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像。
在具体实施中,可以由数据处理设备根据接收到的视频帧截取指令,从多路同步视频流中截取指定帧时刻的多个视频帧并上传,例如可以上传至云端服务器或者服务集群。
作为一具体场景示例:现场采集区域可以在不同位置部署多个采集设备组成的采集阵列,所述采集阵列可以实时同步采集多路视频数据流并上传至所述数据处理设备,数据处理设备在接收到视频帧截取指令时,可以根据所述视频帧截取指令中包含的指定帧时刻的信息,从所述多路视频数据流中截取相应帧时刻的视频帧。其中,所述指定帧时刻可以以帧为单位,将第N至M帧作为指定帧时刻,N和M均为不小于1的整数,且N≤M;或者,所述指定帧时刻也可以以时间为单位,将第X至Y秒作为指定帧时刻,X和Y均为正数,且X≤Y。因此,多个同步视频帧可以包括指定帧时刻对应的所有帧级同步的视频帧,各视频帧的像素数据形成对应的帧图像。
例如,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第2帧,则数据处理设备分别截取各路视频数据流中第2帧的视频帧,且截取的各路视频数据流的第2帧的视频帧之间帧级同步,作为获取得到的多个同步视频帧。
又例如,假设采集帧率设置为25fps,即1秒采集25帧,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第1秒内的视频帧,则数据处理设备可以分别截取各路视频数据流中第1秒内的25个视频帧,且截取的各路视频数据流中第1秒内的第1个视频帧之间帧级同步,截取的各路视频数据流中第1秒内的第2个视频帧之间帧级同步,直至取的各路视频数据流中第1秒内的第25个视频帧之间帧级同步,作为获取得到的多个同步视频帧。
还例如,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第2帧和第3帧,则数据处理设备可以分别截取各路视频数据流中第2帧的视频帧和第3帧的视频帧,且截取的各路视频数据流的第2帧的视频帧之间和第3帧的视频帧之间分别帧级同步,作为多个同步视频帧。
在具体实施中,所述多路视频数据流可以是采用压缩格式的视频数据流,也可以是采用非压缩格式的视频数据流。
S42,确定所述图像组合相应的参数数据。
在具体实施中,可以通过参数矩阵来获得所述图像组合相应的参数数据,所述参数矩阵可以包括内参矩阵,外参矩阵、旋转矩阵和平移矩阵等。由此,可以确定空间物体表面指定点的三维几何位置与其在图像组合中对应点之间的相互关系。
在本发明的实施例中,可以采用运动重构(Structure From Motion,SFM)算法,基于参数矩阵,对获取到的图像组合进行特征提取、特征匹配和全局优化,获得的参数估计值作为图像组合相应的参数数据。其中,特征提取采用的算法可以包括以下任意一种:尺度不变特征变换(Scale-Invariant Feature Transform,SIFT)算法、加速稳健特征(Speeded-Up Robust Features,SURF)算法、加速段测试的特征(Features from Accelerated Segment Test,FAST)算法。特征匹配采用的算法可以包括:欧式距离计算方法、随机样本一致性(Random Sample Consensus,RANSC)算法等。全局优化的算法可以包括:光束法平差(Bundle Adjustment,BA)等。
S43,确定所述图像组合中各帧图像的深度数据。
在具体实施中,可以基于所述图像组合中的多个帧图像,确定各帧图像的深度数据。其中,深度数据可以包括与图像组合中各帧图像的像素对应的深度值。采集点到现场中各个点的距离可以作为上述深度值,深度值可以直接反映待观看区域中可见表面的几何形状。例如,以拍摄坐标系的原点作为光心,深度值可以是现场中各个点沿着拍摄光轴到光心的距离。本领域技术人员可以理解的是,上述距离可以是相对数值,多个帧图像可以采用相同的基准。
在本发明一实施例中,可以采用双目立体视觉的算法,计算各帧图像的深度数据。除此之外,深度数据还可以通过对帧图像的光度特征、明暗特征等特征进行分析间接估算得到。
在本发明另一实施例中,可以采用多视点三维重建(Mult-View Stereo,MVS)算法进行帧图像重建。重建过程中可以采用所有像素进行重建,也可以对像素进行降采样仅用部分像素重建。具体而言,可以对每个帧图像的像素点都进行匹配,重建每个像素点的三维坐标,获得具有图像一致性的点,然后计算各个帧图像的深度数据。或者,可以对选取的帧图像的像素点进行匹配,重建各选取的帧图像的像素点的三维坐标,获得具有图像一致性的点,然后计算相应帧图像的深度数据。其中,帧图像的像素数据与计算得到的深度数据对应,选取帧图像的方式可以根据具体情景来设定,比如,可以根据需要计算深度数据的帧图像与其他帧图像之间的距离,选择部分帧图像。
S44,基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧。
在具体实施中,帧图像的像素数据可以为YUV数据或RGB数据中任意一种,或者也可以是其它能够对帧图像进行表达的数据;深度数据可以包括与帧图像的像素数据一一对应的深度值,或者,可以是对与帧图像的像素数据一一对应的深度值集合中选取的部分数值,具体的选取方式根据具体的情景而定;所述虚拟视点选自多角度自由视角范围,所述多角度自由视角范围为支持对待观看区域进行视点的切换观看的范围。
在具体实施中,预设帧图像可以是图像组合中所有的帧图像,也可以是选择的部分帧图像。其中,选取的方式可以根据具体情景来设定,例如,可以根据采集点之间的位置关系,选择图像组合中相应位置的部分帧图像;又例如,可以根据想要获取的帧时刻或帧时段,选择图像组合中相应帧时刻的部分帧图像。
由于所述预设的帧图像可以对应不同的帧时刻,因此,可以对虚拟视点路径中各虚拟视点与各帧时刻进行对应,根据各虚拟视点相对应的帧时刻获取相应的帧图像,然后基于所述图像组合相应的参数数据、各虚拟视点的帧时刻对应的帧图像的深度数据和像素数据,对各虚拟视点进行帧图像重建,获得相应的多角度自由视角视频的视频帧。因此,在具体实施中,除了可以实现某一个时刻的多角度自由视角图像,还可以实现时序上连续的或非连续的多角度自由视角视频。
在本发明一实施例中,所述图像组合包括A个同步视频帧,其中,a1个同步视频帧对应第一帧时刻,a2个同步视频帧对应第二帧时刻,a1+a2=A;并且,预设有B个虚拟视点组成的虚拟视点路径,其中b1个虚拟视点与第一帧时刻相对应,b2个虚拟视点与第二帧时刻相对应,b1+b2≤2B,则基于所述图像组合相应的参数数据、第一帧时刻的a1个同步视频帧的帧图像的像素数据和深度数据,对b1个虚拟视点组成的路径进行第一帧图像重建,基于所述图像组合相应的参数数据、第二帧时刻的a2个同步视频帧的帧图像的像素数据和深度数据,对b2个虚拟视点组成的路径进行第二帧图像重建,最终获得相应的多角度自由视角视频的视频帧。
可以理解的是,可以将指定帧时刻和虚拟视点进行更细的划分,由此得到更多的不同帧时刻对应的同步视频帧和虚拟视点,实现随着时间自由转换视点,并可提升多角度自由视角视频视点切换的平滑性。
可以理解的是,上述实施例仅为举例说明,并非对具体实施方式的限定。
在本说明书实施例中,可以采用基于深度图的图像绘制(Depth Image Based Rendering,DIBR)算法,根据所述图像组合相应的参数数据和预设的虚拟视点路径, 对预设的帧图像的像素数据和深度数据进行组合渲染,从而实现基于预设的虚拟视点路径的帧图像重建,获得相应的多角度自由视角视频的视频帧。
S45,响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像。
在具体实施中,可以响应于所述特效生成指令,将所述目标对象的增强现实特效输入数据作为输入,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,并采用预设的第一特效生成方式,生成对应视频帧中与所述目标对象匹配的虚拟信息图像。
为准确地位特效生成指令对应的目标对象的位置,在具体实施中,对于待植入AR特效的视频帧,可以从中选取预设数量的像素点,根据所述视频帧的参数数据和所述视频帧对应的真实物理空间参数,确定所述预设数量像素点的空间位置,进而可以确定所述目标对象在所述视频帧中的准确位置。
参照图5和图6,图5所示的视频帧P50展示了一场篮球赛进行过程中的图像,篮球场地中有多名篮球运动员,其中一名篮球运动员正在做出投篮动作。为确定所述视频帧中的目标对象的在所述视频帧中的位置,如图6所示,选取篮球场地的限制区域的四个顶点对应的像素点A、B、C、D,结合真实篮球场地参数,通过一个所述视频帧对应的相机的参数可以完成标定,之后,可以根据虚拟相机的参数,得到相应虚拟相机中的球场三维位置信息,从而可以实现包含所述篮球场地的视频帧的三维空间位置关系的准确标定。
可以理解的是,也可以选取所述视频帧中的其他像素点进行三维标定,以确定所述视频帧中所述特效生成指令对应的目标对象的位置。在具体实施中,为保证图像中具体对象的三维空间位置关系更加精准,优先选取图像中静止事物对应的像素点用以三维标定。选取的像素点可以为一个,也可以为多个。为减小数据运算量,可以优先选择图像中规则物体的轮廓点或顶点用于三维标定。
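作为利用上述像素点进行三维标定的一个示意性示例(假设使用OpenCV的solvePnP函数,其中限制区尺寸、像素坐标与内参矩阵均为示例数值,实际实施需按真实场地参数与标定结果填写):

    import cv2
    import numpy as np

    # 真实物理空间中限制区四个顶点的三维坐标(单位:米,设场地平面Z=0,数值仅为示例)
    object_pts = np.float32([[0, 0, 0], [4.9, 0, 0], [4.9, 5.8, 0], [0, 5.8, 0]])
    # 视频帧中选取的像素点A、B、C、D的坐标(示例数值)
    image_pts = np.float32([[812, 623], [1210, 640], [1180, 890], [790, 870]])
    K = np.float32([[1800, 0, 960], [0, 1800, 540], [0, 0, 1]])   # 示例内参矩阵
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
    # rvec、tvec即该视频帧对应相机的外参,结合虚拟相机参数即可换算虚拟相机中的球场三维位置信息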
通过三维标定,可以实现所生成的虚拟的三维的虚拟信息图像,与描述真实世界的多角度自由视角视频本身,在三维空间内的任意位置、任意视角、任意视点在空间位置上的准确融合,从而可以实现虚拟与现实的无缝融合,实现虚拟信息图像和多角度自由视角视频的视频帧在播放过程中的动态同步、和谐统一,因此可以使合成处理所得到的多角度自由视角合成视频帧更加自然逼真,故可以极大地增强用户的视觉体验。
在具体实施中,服务器(如云端服务器)可以自动生成特效生成指令,也可以响应于服务端用户交互操作,生成对应的服务端特效生成交互控制指令。例如,云端服务器可以通过预设的AI识别算法自动选择待植入AR特效的图像组合作为所述特效生成指令指定的图像组合,并获取所指定的图像组合对应的虚拟信息图像。又如,服务端用户可以通过交互操作指定图像组合,服务器在接收到基于服务端特效生成交互控制操作所触发的服务端特效生成交互控制指令时,可以从所述服务端特效生成交互指令中获取指定的图像组合,进而可以获取与所述特效生成指令指定的图像组合对应的虚拟信息图像。
在具体实施中,可以直接从预设的存储空间获取与所述特效生成指令指定的图像组合对应的虚拟信息图像,也可以根据所述特效生成指令指定的图像组合,即时生成匹配的虚拟信息图像。
为生成所述虚拟信息图像,在具体实施中,可以以所述目标对象为中心,先识别所述视频帧中的目标对象,之后获取所述目标对象的增强现实特效输入数据,然后,将所述增强现实特效输入数据作为输入,采用预设的第一特效生成方式,生成与所述视频帧中所述目标对象匹配的虚拟信息图像。
在本说明书一些实施例中,可以通过图像识别技术识别出所述视频帧中的目标对象,例如识别出特效区域中的目标对象为一个人物(如篮球运动员)、一个物体(如篮球、记分牌)、一个动物(例如猫或狮子)等等。
在具体实施中,可以响应于服务端特效生成交互控制指令,获取所述目标对象的增强现实特效输入数据。例如,服务端用户通过交互操作,选中某一篮球赛直播视频中的球员,则可以相应生成与所述交互操作对应的服务端特效生成交互控制指令,根据所述服务端特效生成交互控制指令,可以获取所述球员关联的增强现实特效输入数据,例如,姓名、篮球比赛中的位置名称(可以为具体号位或者位置类型:如中锋、前锋、后卫等)和投篮命中率等增强现实特效输入数据。
在具体实施中,可以先根据所述服务端特效生成交互控制指令,确定特效输出类型,之后,获取所述目标对象的历史数据,根据所述特效输出类型对所述历史数据进行处理,得到与所述特效输出类型对应的增强现实特效输入数据。例如,对于一场篮球赛直播,根据所述服务端特效生成交互控制指令,获取到服务端用户欲获取所述目标对象所在位置的投篮命中率,则可以计算所述目标对象所在位置距离篮网中心的地面投影位置的距离,获取所述目标对象在此距离之内的历史投篮数据作为所述目标对象的增强现实特效输入数据。
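作为上述按距离筛选历史投篮数据并计算命中率的一个简化示意(其中shots的字段与tolerance容差均为假设的示例):

    import math

    def hit_rate_within_distance(player_pos, hoop_ground_pos, shots, tolerance=0.5):
        """player_pos、hoop_ground_pos为地面坐标(x, y);shots为[{'dist': 距离, 'made': 是否命中}, ...]。"""
        dist = math.hypot(player_pos[0] - hoop_ground_pos[0],
                          player_pos[1] - hoop_ground_pos[1])
        near = [s for s in shots if abs(s["dist"] - dist) <= tolerance]
        if not near:
            return None
        return sum(s["made"] for s in near) / len(near)   # 该距离范围内的历史投篮命中率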
在具体实施中,服务端用户可以通过相应的交互控制设备进行交互控制操作,基于服务端用户的特效生成交互控制操作,可以得到相应的服务端特效生成交互控制指令。在具体实施中,服务端用户可以通过交互操作,选择待生成特效的目标对象。进一步地,用户还可以选择目标对象的增强现实特效输入数据,例如增强现实特效输入数据的数据类型、数据范围(可以基于时间或地理空间进行选择)等。
可以理解的是,所述服务端特效生成交互控制指令也可以为服务端自动产生,服务端可以通过机器学习实现自主决策,选择待植入特效的视频帧的图像组合、目标对象,以及目标对象的增强现实特效输入数据等。
以下通过一些具体实施方式说明如何采用预设的第一特效生成方式,生成与所述视频帧中所述目标对象匹配的虚拟信息图像。
在本说明书一具体实现中,可以将所述增强现实特效输入数据输入至预设的三维模型进行处理,得到与所述视频帧中所述目标对象匹配的虚拟信息图像。
例如,将所述增强现实特效输入数据输入至预设三维模型后,可以获取与所述增强现实特效输入数据匹配的三维图形元素并进行组合,并将所述增强现实特效数据中的显示元数据和所述三维图形元素数据作为与所述视频帧中与所述目标对象匹配的虚拟信息图像进行输出。
其中,所述三维模型可以为对实际物品进行三维扫描得到的三维模型,也可以为构建的虚拟模型,所述虚拟模型可以包括虚拟物品模型和虚拟形象模型,其中,虚拟物品可以是虚拟的魔法棒等现实世界中不存在的物品,虚拟形象造型可以是想象中的人物或动物造型,例如传说中的哪吒的三维模型,虚拟的独角兽、龙等造型的三维模型。
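作为虚拟信息图像生成过程的一个极简示意(此处以二维信息板代替三维模型的输出,仅用于说明“以增强现实特效输入数据为输入、输出与目标对象匹配的图像”这一过程;示例字段为假设,且OpenCV自带字体仅支持ASCII字符,中文文本需另行渲染):

    import cv2
    import numpy as np

    def make_info_board(ar_input, size=(360, 200)):
        """ar_input例如{'name': 'Q2', 'position': 'forward', 'fg_pct': '56%'},字段仅作示意。"""
        w, h = size
        rgb = np.full((h, w, 3), 32, dtype=np.uint8)            # 深色底板
        y = 40
        for key, value in ar_input.items():
            cv2.putText(rgb, f"{key}: {value}", (20, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
            y += 50
        alpha = np.full((h, w, 1), 180, dtype=np.uint8)         # 半透明alpha通道,便于后续叠加
        return np.concatenate([rgb, alpha], axis=2)             # HxWx4的虚拟信息图像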
在本说明书另一具体实现中,可以将所述增强现实特效输入数据作为输入数据,输入至预设的机器学习模型进行处理,得到与所述视频帧中所述目标对象匹配的虚拟信息图像。
在具体实施中,所述预设的机器学习模型可以为有监督的学习模型,也可以为无监督的学习模型,或者是半监督学习模型(有监督学习模型和无监督学习模型的结合模型),本说明书实施例中并不限定所采用的具体模型。
采用机器学习模型生成所述虚拟信息图像,包括两个阶段:模型训练阶段和模型应用阶段。
在模型训练阶段,首先可以采用训练样本数据作为输入数据,输入至预设的机器学习模型进行训练,调整所述机器学习模型的参数,在所述机器学习模型训练完成后,可以作为所述预设的机器学习模型。训练样本数据可以包含各种现实物理空间采集到的图像、视频,或者人工建模生成的虚拟的图像或视频等,完成训练后的机器学习模型可以基于输入数据,自动生成相应的三维图像、三维视频、以及对应的音效等。
在模型应用阶段:将所述增强现实特效输入数据作为输入数据,输入至训练完成的机器学习模型,可以自动生成与所述输入数据匹配的增强现实特效模型,也即与所述视频帧中所述目标对象匹配的虚拟信息图像。
在本说明书实施例中,根据所采用的三维模型,或者根据所采用的机器学习模型,生成的虚拟信息图像的形式有所不同。具体地,所生成的虚拟信息图像可以为静态图像,也可以为动画等动态的视频帧,甚至可以为包含音频数据的视频帧。
S46,将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧。
在具体实施中,可以将所述虚拟信息图像和所述指定的视频帧进行融合处理,得到植入了AR特效的融合视频帧。
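作为融合处理的一个简化示意(按alpha通道将虚拟信息图像叠加到指定视频帧的给定位置,x、y可由三维标定得到的目标对象位置换算而来,示例未做边界检查):

    import numpy as np

    def composite(frame, overlay_rgba, x, y):
        """将HxWx4的虚拟信息图像按alpha通道叠加到视频帧frame的(x, y)处,返回合成视频帧。"""
        h, w = overlay_rgba.shape[:2]
        roi = frame[y:y + h, x:x + w].astype(np.float32)
        rgb = overlay_rgba[:, :, :3].astype(np.float32)
        alpha = overlay_rgba[:, :, 3:4].astype(np.float32) / 255.0
        frame[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
        return frame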
S47,将所述合成视频帧进行展示。
将所述合成后的合成视频帧插入至播放控制设备的待播放视频流以用于通过播放终端进行播放。
在具体实施中,播放控制设备可以将多路视频流作为输入,其中,视频流可以来自采集阵列中各采集设备,也可以来自其他采集设备。播放控制设备可以根据需要选择一路输入的视频流作为待播放视频流,其中,可以选择前述步骤S46获得的多角度自由视角视频的合成视频帧插入待播放视频流,或者由其他输入接口的视频流切换至包含所述多角度自由视角视频合成视频帧的输入接口,播放控制设备将选择的待播放视频流输出至播放终端,即可通过播放终端进行播放,因此用户除了可以通过播放终端观看到多角度自由视角的视频帧,还可以通过播放终端观看到植入了AR特效的多角度自由视角的合成视频帧。
其中,播放终端可以是电视、手机、平板、电脑等视频播放设备或包含显示屏或投影设备的其他类型的电子设备。
在具体实施中,插入播放控制设备的待播放视频流的多角度自由视角视频合成视频帧可以保留在播放终端中,以便于用户进行时移观看,其中,时移可以是用户观看时进行的暂停、后退、快进到当前时刻等操作。
由上述步骤可知,对于接收到从多路视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合,通过确定所述图像组合相应的参数数据和所述图像组合中各帧图像的深度数据,一方面,基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;另一方面,响应于特效生成指令,获取所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,并将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,之后,将所述合成视频帧插入至播放控制设备的待播放视频流以用于通过播放终端进行播放,可以实现具有AR特效的多角度自由视角的视频。
采用上述数据处理方法,仅从多路同步视频流中截取指定帧时刻的多个同步视频帧进行多角度自由视角视频的重建,以及生成与特效生成指令指定的视频帧中的目标对象对应的虚拟信息图像,因此无需巨量的同步视频流数据的上传,这一分布式系统架构可以节省大量的传输资源及服务器处理资源,且在网络传输带宽有限的条件下,可以实现具有增强现实特效的合成视频帧实时或近乎实时的生成,故能够实现植入AR特效的多角度自由视角合成视频帧的低时延播放,因而可以兼顾用户视频观看过程中对丰富视觉体验和低时延的双重需求。
在具体实施中,上述各步骤中,多路视频流中同步视频帧的截取,以及基于多个同步视频帧所形成的图像组合生成多角度自由视角视频的视频帧,获取与所述特效生成指令指定的图像组合对应的虚拟信息图像,以及将所述虚拟信息图像和所述指定的图像组合进行合成处理得到合成视频帧等步骤均可以由不同的硬件设备协同完成,也即采用分布式处理架构。
继续参照图4,在步骤S44中,可以根据所述预设的虚拟视点路径中各虚拟视点的虚拟参数数据以及所述图像组合相应的参数数据之间的关系,将所述图像组合中预设的视频帧的深度数据分别映射至相应的虚拟视点;根据分别映射至相应的虚拟视点的预设视频帧的像素数据和深度数据,以及预设的虚拟视点路径,进行帧图像重建,获得相应的多角度自由视角视频的视频帧。
其中,所述虚拟视点的虚拟参数数据可以包括:虚拟观看位置数据和虚拟观看角度数据;所述图像组合相应的参数数据可以包括:采集位置数据和拍摄角度数据等。可以先采用前向映射,进而进行反向映射的方法,得到重建后的视频帧。
在具体实施中,采集位置数据和拍摄角度数据可以称作外部参数数据,参数数据还可以包括内部参数数据,所述内部参数数据可以包括采集设备的属性数据,从而可以更加准确地确定映射关系。例如,内部参数数据可以包括畸变数据,由于考虑到畸变因素,可以从空间上进一步准确地确定映射关系。
接下来,参照附图,通过具体实施例对本说明书实施例中交互式应用场景的方案进行详细阐述。
如图7所示的数据处理方法的流程图,在本说明书一些实施例中,在交互终端,基于用户交互操作,可以采用如下步骤,获得植入AR特效的多角度自由视角视频合成视频帧:
S71,实时进行多角度自由视角视频的视频帧的展示。
在具体实施中,所述多角度自由视角视频的视频帧基于指定帧时刻的多个同步视频帧形成的图像组合的参数数据、所述图像组合的像素数据和深度数据重建得到,所述多个同步视频帧包括不同拍摄视角的帧图像。所述多角度自由视角视频帧的重建方式可以参见前述实施例的介绍,此处不再展开描述。
S72,响应于对所述多角度自由视角视频的视频帧中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的视频帧的虚拟信息图像。
S73,将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
在具体实施中,可以基于所述特效展示标识,确定所述虚拟信息图像在所述多角度自由视角视频的视频帧中的叠加位置,之后,可以将所述虚拟信息图像在所确定的叠加位置进行叠加展示。
为使本领域技术人员更好地理解和实施,以下通过一交互终端的图像展示过程进行详细说明。参照图8至图12所示的交互终端的视频播放画面示意图,交互终端T80实时地进行视频的播放,其中,如步骤S71所述,参照图8,展示视频帧P80,接下来,交互终端所展示的视频帧P81中包含特效展示标识I1等多个特效展示标识,视频帧P80中通过指向目标对象的倒三角符号表示,如图9所示。可以理解的是,也可以采用其他的方式展示所述特效展示标识。终端用户触摸点击所述特效展示标识I1,则系统自动获取对应于所述特效展示标识I1的虚拟信息图像,将所述虚拟信息图像叠加展示在多角度自由视角视频的视频帧P81中,如图10所示,以运动员Q1站立的场地位置为中心,渲染出一个立体圆环R1。接下来,如图11及图12所示,终端用户触摸点击多角度自由视角视频的视频帧P81中的特效展示标识I2,系统自动获取对应于所述特效展示标识I2的虚拟信息图像,将所述虚拟信息图像叠加展示在多角度自由视角视频的视频帧P81上,得到多角度自由视角视频叠加视频帧P82,其中展示了命中率信息展示板M0。命中率信息展示板M0上展示了目标对象即运动员Q2的号位、姓名及命中率信息。
如图8至图12所示,终端用户可以继续点击视频帧中展示的其他特效展示标识,观看展示各特效展示标识相应的AR特效的视频。
可以理解的是,可以通过不同类型的特效展示标识区分不同类型的植入特效。
在具体实施中,特效展示标识除了可以在播放画面中进行展示外,还可以在其他地方进行展示,例如对于可展示AR特效的视频帧,可以在播放进度条上相应帧所对应的进度位置设置特效展示标识用于告知终端用户。如图13所示的交互终端的交互界面示意图,交互终端T130展示了播放界面Sr131,以及当前播放的视频帧在整个进度条L131中的位置,由所述进度条L131展示的信息可知,根据当前播放视频帧在整个视频中的位置,进度条L131划分为已播放段L131a和未播放段L131b,此外,在进度条L131上展示有特效展示标识D1~D4,其中,特效展示标识D1位于已播放段L131a,特效展示标识D2即当前视频帧,位于已播放段L131a和未播放段L131b的交界点,特效展示标识D3、D4位于未播放段L131b,终端用户可以通过所述进度条L131上的特效展示标识,即可回退或快进至相应的视频帧,观看植入了AR特效 的多角度自由视角合成视频帧对应的画面。
参照图14所示的数据处理方法的流程图,在本说明书实施例一种交互场景中,为实现在交互终端植入AR特效的多角度自由视角视频合成视频帧的展示,具体可以采用如下步骤进行数据处理:
S141,响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
在具体实施中,所述预设帧图像的拼接图像基于所述交互帧时刻的图像组合的像素数据和深度数据生成,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中所述预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据。
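作为拼接图像组织方式的一个示意(像素数据作为第一字段置于图像区域,深度数据量化为8位灰度后作为第二字段置于深度图区域;左右拼接与8位量化均仅为示例选择):

    import numpy as np

    def pack_stitched(color, depth):
        """color: HxWx3像素数据(第一字段);depth: HxW深度数据,量化为8位灰度后作为第二字段。"""
        d8 = np.clip(depth / max(depth.max(), 1e-6) * 255, 0, 255).astype(np.uint8)
        depth_rgb = np.repeat(d8[:, :, None], 3, axis=2)
        return np.concatenate([color, depth_rgb], axis=1)       # 左侧为图像区域,右侧为深度图区域

    def unpack_stitched(stitched):
        w = stitched.shape[1] // 2
        return stitched[:, :w], stitched[:, w:, 0]              # 分别取回第一字段与第二字段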
在具体实施中,所述交互帧时刻的图像组合基于从多路同步视频流中截取指定帧时刻的多个同步视频帧得到,所述多个同步视频帧包含不同拍摄视角的帧图像。
S142,响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像。
在本说明书一些实施例中,可以响应于特效生成交互控制指令,读取所述特效生成交互控制指令指示的预设视频帧中的目标对象;基于所述目标对象,获取预先基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像。
在具体实施中,可以采用多种方式生成与所述目标对象匹配的虚拟信息图像,以下给出两种可实现示例:
示例一,将所述目标对象的增强现实特效数据作为输入数据,输入至预设的三维模型进行处理,得到与所述目标对象匹配的虚拟信息图像;
示例二,将所述目标对象的增强现实特效数据作为输入数据,输入至预设的机器学习模型进行处理,得到与所述目标对象匹配的虚拟信息图像。
上述两种示例的具体实现示例可以参见前述实施例。
S143,将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并展示。
为使本领域技术人员更好地理解及实施本说明书实施例,以下提供一种适用于交互场景的数据处理系统。
参照图15,在本说明书一些实施例中,数据处理系统150可以包括服务器151和交互终端152,其中:
所述服务器151可以响应于来自交互终端152的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设视频帧的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端152,以及响应于特效生成交互控制指令,生成所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像;
所述交互终端152,基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的图像并进行播放;以及将在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并进行播放。
在具体实施中,所述服务器151可以存储与所述预设帧图像的拼接图像对应的虚拟信息图像,或者基于所述预设帧图像的拼接图像的增强现实特效输入数据,从第三方获取所述预设帧图像的拼接图像对应的虚拟信息图像,或者即时生成所述预设帧图像的拼接图像对应的虚拟信息图像。
在具体实施中,所述交互帧时刻的图像组合基于从多路同步视频流中截取指定帧时刻的多个同步视频帧得到,所述多个同步视频帧包含不同拍摄视角的帧图像。
所述数据处理系统还可以包括数据处理设备153。如前实施例所述,数据处理设备153可以对现场采集区域中采集阵列采集到的视频帧进行视频帧截取。通过对待生成多角度自由视角视频的视频帧进行截取,可以避免大量的数据传输及数据处理。现场采集阵列中的采集设备可以同步采集不同拍摄视角的帧图像,所述数据处理设备可以从多路同步视频流中截取指定帧时刻的多个同步视频帧。
之后,所述数据处理设备153可以将截取得到的帧图像上传至所述服务器151。所述服务器151可以存储预设视频帧的图像组合的拼接图像和所述图像组合的参数数据。
在具体实施中,所述适用于非交互场景中的数据处理系统和适用于交互场景的数据处理系统可以融合。
继续参照图3,作为一具体示例,所述服务器32除了可以得到多角度自由视角视频的视频帧和所述虚拟信息图像外,对于指定帧时刻的多个同步视频帧所形成的图像组合,为了后续能够方便获取数据,所述服务器32可以基于所述图像组合的像素数据及深度数据,生成所述图像组合相应的拼接图像,所述拼接图像可以包括第一字段和第二字段,其中,所述第一字段包括所述图像组合的像素数据,所述第二字段包括所述图像组合的深度数据,然后,存储所述图像组合相应的拼接图像及所述图像组合相应的参数数据。
为了节约存储空间,可以基于所述图像组合中预设视频帧的像素数据及深度数据,生成所述图像组合中预设视频帧相应的拼接图像,所述预设视频帧相应的拼接图像可以包括第一字段和第二字段,其中,所述第一字段包括所述预设视频帧的像素数据,所述第二字段包括所述预设视频帧的深度数据,然后,仅存储所述预设视频帧相应的拼接图像及相应的参数数据即可。
其中,所述第一字段与所述第二字段相对应,所述拼接图像可以分为图像区域以及深度图区域,图像区域的像素字段存储所述多个帧图像的像素数据,深度图区域的像素字段存储所述多个帧图像的深度数据;所述图像区域中存储帧图像的像素数据的像素字段作为所述第一字段,所述深度图区域中存储帧图像的深度数据的像素字段作为所述第二字段;获取的图像组合的拼接图像和所述图像组合相应的参数数据可以存入数据文件中,当需要获取拼接图像或相应的参数数据时,可以根据数据文件的头文件中包含的存储地址,从相应的存储空间中读取。
此外,图像组合的存储格式可以为视频格式,图像组合的数量可以是多个,每个图像组合可以是对视频进行解封装和解码后,对应不同帧时刻的图像组合。
在具体实施中,用户除了可以通过播放终端观看多角度自由视角视频,为进一步提高交互体验,还可以在观看视频过程中通过交互操作,主动选择播放多角度自由视角视频。在本说明书一些实施例中,采用如下方式实施:
响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设视频帧的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
其中,所述预设规则可以根据具体情景来设定,比如,可以基于交互操作确定的虚拟视点位置信息,选择按距离排序最靠近交互时刻的虚拟视点的W个临近的虚拟视点的位置信息,并在拼接图像中获取包括交互时刻的虚拟视点的上述共W+1个虚拟视点对应的满足交互帧时刻信息的像素数据和深度数据。
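作为上述预设规则的一个示意性实现(按与交互时刻虚拟视点的欧氏距离排序,选取最邻近的W个采集视点,W与各位置坐标均为示例输入):

    import numpy as np

    def select_nearest_viewpoints(virtual_pos, camera_positions, W):
        """virtual_pos为交互虚拟视点位置(3,);camera_positions为N x 3的各采集视点位置。"""
        d = np.linalg.norm(np.asarray(camera_positions) - np.asarray(virtual_pos), axis=1)
        return np.argsort(d)[:W]        # 返回按距离排序最邻近的W个视点的索引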
其中,所述交互帧时刻信息基于来自交互终端的触发操作确定,所述触发操作可以是交互终端用户输入的触发操作,也可以是交互终端自动生成的触发操作。例如,交互终端在检测到存在多角度自由视点数据帧的标识时可以自动发起触发操作。在用 户手动触发时,可以是交互终端显示交互提示信息后用户选择触发交互的时刻信息,也可以是交互终端接收到用户操作触发交互的历史时刻信息,所述历史时刻信息可以为位于当前播放时刻之前的时刻信息。
在具体实施中,所述交互终端35可以基于获取的交互帧时刻的图像组合中预设视频帧的拼接图像及对应的参数数据,交互帧时刻信息以及交互帧时刻的虚拟视点位置信息,采用与上述步骤S44相同的方法对获取的交互帧时刻的图像组合中预设视频帧的拼接图像的像素数据和深度数据进行组合渲染,获得所述交互的虚拟视点位置对应的多角度自由视角视频的视频帧,并在所述交互的虚拟视点位置开始播放多角度自由视角视频。
采用上述方案,可以基于来自交互终端的图像重建指令即时生成交互的虚拟视点位置对应的多角度自由视角视频的视频帧,可以进一步提升用户互动体验。
在具体实施中,交互终端与播放终端可以为同一终端设备。
在具体实施中,为方便后续获取数据,可以响应于服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设帧图像的拼接图像对应的虚拟信息图像并存储。
之后,在所述预设帧图像的拼接图像对应的多角度自由视角视频播放过程中,可以在所述预设帧图像的拼接图像叠加渲染所述虚拟信息图像,得到植入了AR特效的多角度自由视角视频叠加视频帧,具体可以在所述多角度自由视角视频录播或者点播等场景中实现,可以根据预先设置触发所述虚拟信息图像的植入或者根据用户交互操作触发所述虚拟信息图像的植入。
以用户交互场景为例,在用户观看多角度自由视角视频过程中,为进一步提升用户视觉体验的丰富性,可以在多角度自由视角视频中植入AR特效。在本说明书一些实施例中,可以采用如下方式实施:
在接收到所述图像重建指令后,还可以响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,并将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧上叠加渲染所述虚拟信息图像,得到植入AR特效的多角度自由视角叠加视频帧并进行播放。
作为一具体示例,用户在视频观看过程中,若用户的第一交互操作触发了多角度自由视角视频的播放,在播放过程中,基于用户的第二交互操作对应的用户端特效生成交互指令,可以获取所述预设帧图像的拼接图像对应的虚拟信息图像,也即待植入所述预设视频帧的多角度自由视角视频的AR特效图像。其中,所述预设视频帧可以为用户的第二交互操作所指示的视频帧,例如可以为用户所点击的帧图像,或者为用 户滑动操作所对应的帧序列。
在具体实施中,响应于用户端特效退出交互指令,可以停止获取所述预设帧图像的拼接图像对应的虚拟信息图像,相应地,在交互终端渲染过程中无须叠加所述虚拟信息图像,仅播放多角度自由视角视频。
继续以上示例,若在植入了AR特效数据的多角度自由视角叠加视频帧播放过程中,基于用户的第三交互操作对应的用户端特效退出交互指令,停止后续视频帧的拼接图像对应的虚拟信息图像的获取及渲染展示。
在具体实施中,作为连续的视频流,可能是部分视频流包含多角度自由视角视频数据,在其中一个或多个多角度自由视角视频序列中,其中一个或多个序列对应有所述虚拟信息图像,则可以在检测到所述用户端特效退出交互指令时,退出所述视频流中后续所有AR特效的植入,也可以仅退出一个多角度自由视频序列中后续AR特效的展示。
与前述虚拟信息图像的生成方式类似,可以基于服务端的特效生成指令,生成虚拟信息图像。在具体实施中,可以由服务器(如云端服务器)自动生成特效生成指令,也可以响应于服务端用户交互操作,生成对应的服务端特效生成交互控制指令。
同样地,为生成所述虚拟信息图像,首先确定所述虚拟信息图像对应的预设帧的拼接图像,其次,生成所述预设帧的拼接图像匹配的虚拟信息图像。
对于如何确定所述虚拟信息图像对应的预设视频帧的拼接图像,在具体实施中可以有多种方式。例如,云端服务器可以通过预设的AI识别算法自动选择的预设视频帧的拼接图像作为待植入AR特效数据的拼接图像。又如,服务端用户可以通过交互操作指定预设视频帧的拼接图像,服务器在接收到基于服务端特效生成交互控制操作所触发的服务端特效生成交互控制指令时,可以从所述服务端特效生成交互指令中获取指定的预设视频帧的拼接图像,进而可以生成与所述特效生成指令指定的预设视频帧的拼接图像对应的虚拟信息图像。
在本说明书一些实施例中,可以通过图像识别技术识别出所述视频帧中的对象作为与待植入AR特效匹配的目标对象,例如识别出目标对象为一个人物(如篮球运动员)、一个物体(如篮球、记分牌)、一个动物(例如猫或狮子)等等。
在具体实施中,可以响应于服务端特效生成交互控制指令,获取所述目标对象的增强现实特效输入数据。例如,服务端用户通过交互操作,选中某一篮球赛直播视频中的球员,则可以相应生成与所述交互操作对应的服务端特效生成交互控制指令,根据所述服务端特效生成交互控制指令,可以获取所述球员的运动员数据和进球数据等,其中运动员数据可以包括球员关联的基本数据,例如姓名、篮球比赛中的位置名称(具体号位,或者为中锋、前锋、后卫等位置名称)等,进球数据可以包括投篮命中率等,上述数据均可以作为增强现实特效输入数据。
在具体实施中,可以先根据所述服务端特效生成交互控制指令,确定特效输出类型,之后,获取所述目标对象的历史数据,根据所述特效输出类型对所述历史数据进行处理,得到与所述特效输出类型对应的增强现实特效输入数据。
例如,对于一场篮球赛直播,根据所述服务端特效生成交互控制指令,获取到服务端用户欲获取所述目标对象在所述特效区域内所在位置的投篮命中率,则可以计算所述目标对象所在位置距离篮网中心的地面投影位置的距离,获取在所述目标对象在此距离之内的历史投篮数据作为所述目标对象的增强现实特效输入数据。
对于虚拟信息图像的特效生成方式,可以根据需要进行选择和设置。在本说明书一具体实现中,可以将所述增强现实特效输入数据作为输入数据,输入至预设的三维模型进行处理,得到与所述预设视频帧的拼接图像中所述目标对象匹配的虚拟信息图像。
例如,将所述增强现实特效输入数据作为输入数据,输入至预设三维模型后,可以获取与所述输入数据匹配的三维图形元素并进行组合,并将输入数据中的显示元数据和所述三维图形元素数据作为与所述视频帧中所述目标对象匹配的虚拟信息图像进行输出。所述三维模型的具体实现可以参见前述实施例。
在本说明书另一具体实现中,可以将所述增强现实特效输入数据作为输入数据,输入至预设的机器学习模型进行处理,得到与所述视频帧中所述目标对象匹配的虚拟信息图像。在具体实施中,所述预设的机器学习模型可以为有监督的学习模型,也可以为无监督的学习模型,或者是半监督学习模型(有监督学习模型和无监督学习模型的结合模型),本说明书实施例中并不限定所采用的具体模型。采用机器学习模型生成所述虚拟信息图像的具体方式可以参见前述实施例,此处不再赘述。
在本说明书实施例中,生成的虚拟信息图像可以为静态图像、动态图像,或者为包含音频特效的动态图像,其中,动态图像或包含音频特效的动态图像可以基于目标对象与一个或多个视频帧匹配。
在具体实施中,服务器也可以直接将所述用于直播或准直播过程中得到的虚拟信息图像进行保存,作为所述用户交互过程中通过交互终端获取的虚拟信息图像。
需要说明的是,在本说明书实施例中,在播放终端展示的合成视频帧与在交互终端展示的合成视频帧并无本质的不同。二者实际可以采用相同的虚拟信息图像,也采用不同的虚拟信息图像。相应地,对应的特效生成方式可以相同,也可以不同,类似地,在特效生成过程中所采用的三维模型或者机器学习模型可以为同一模型,或者为同一种模型,也可以为完全不同的模型。
此外,所述播放终端和所述交互终端也可以为同一终端设备,即用户可以直接通过所述终端设备观看直播或准直播的多角度自由视角视频,其中可以自动播放植入了AR特效的多角度自由视角合成视频帧;用户也可以通过所述终端设备进行互动,基于用户的互动操作进行多角度自由视角视频数据的播放,以及植入了AR特效的多角度自由视角合成视频帧的播放。用户通过互动,可以在录播、转播、点播视频中自主选择观看哪些目标对象的AR特效,即虚拟信息图像。
通过以上实施例的数据处理方法可以实现植入AR特效的多角度自由视角视频的低时延播放,为使本领域技术人员更好地理解和实施本说明书实施例,以下对可以实现上述方法的系统及关键设备进行对应介绍。
在本说明书一些实施例中,参照图16所示的数据处理系统的结构示意图,数据处理系统160可以包括:目标对象获取单元161、虚拟信息图像获取单元162和图像合成单元163和展示单元164,其中:
所述目标对象获取单元161,适于获取多角度自由视角视频的视频帧中目标对象;
所述虚拟信息图像获取单元162,适于获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像;
所述图像合成单元163,适于将所述虚拟信息图像与对应的视频帧进行合成处理,得到合成视频帧;
所述展示单元164,适于展示得到的合成视频帧。
在具体实施中,各单元可能分布在不同的设备中,也可能部分单元位于同一设备中,基于具体应用场景的不同,实现的方案有所不同。
本领域技术人员可以理解,各单元可以通过相应的硬件或软件,或者软硬件结合的方式实现,例如可以通过处理器(具体可以为CPU或FPGA等)作为目标对象获取单元161、虚拟信息图像获取单元162和图像合成单元163等,可以通过显示器作为展示单元164。
以下通过一些具体的应用场景进行说明。
参照图3所示的数据处理系统的结构示意图,在本发明实施例中,如图3所示,数据处理系统30可以包括:数据处理设备31、服务器32、播放控制设备33以及播放终端34,其中:
所述数据处理设备31,适于基于视频帧截取指令,从所述现场采集区域不同位置实时同步采集的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,将获得的所述指定帧时刻的多个同步视频帧上传至所述服务器32;
所述服务器32,适于接收所述数据处理设备31上传的多个同步视频帧作为图像组合,确定所述图像组合相应的参数数据以及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;以及响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,并将所述合成视频帧输入至所述播放控制设备33;
所述播放控制设备33,适于将所述合成视频帧插入待播放视频流;
所述播放终端34,适于接收来自所述播放控制设备33的待播放视频流并进行实时播放。
在具体实施中,播放控制设备33可以基于控制指令输出待播放视频流。
作为可选示例,播放控制设备33可以从多路数据流中选择一路作为待播放视频流,或者在多路视频流中不断地切换选择以持续地输出所述待播放视频流。导播控制设备可以作为本发明实施例中的一种播放控制设备。其中导播控制设备可以为基于外部输入控制指令进行播放控制的人工或半人工导播控制设备,也可以为基于人工智能或大数据学习或预设算法能够自动进行导播控制的虚拟导播控制设备。
采用上述数据处理系统,由于仅从多路同步视频流中截取指定帧时刻的同步视频帧进行多角度自由视角视频的重建,以及生成与特效生成指令指定的图像组合对应的虚拟信息图像,因此无需巨量的同步视频流数据的上传,这一分布式系统架构可以节省大量的传输资源及服务器处理资源,且在网络传输带宽有限的条件下,可以实现具有增强现实特效的多角度自由视角合成视频帧的实时生成,故能够实现多角度自由视角增强现实特效视频的低时延播放,因而可以兼顾用户视频观看过程中对丰富视觉体验和低时延的双重需求。
并且,由数据处理设备31进行同步视频帧的截取,由服务器进行多角度自由视角视频的重建、虚拟信息图像的获取,以及多角度自由视角视频和虚拟信息图像的合成处理(如融合处理),由播放控制设备进行待播放视频流的选择,由播放设备进行播放,这一分布式系统架构可以避免同一设备进行大量的数据处理,因此可以提高数据处理效率,减小传输时延。
在具体实施中,所述服务器32可以通过多个服务器组成的服务器集群完成,其中,所述服务器集群可以由多个同构或异构的服务器单体设备或服务器子集群组成。若采用异构服务器集群,可以根据待处理的不同数据特点配置异构服务器集群中的各服务器设备。
参照图17所示的服务器集群架构示意图,在本说明书一实施例中,所采用的异构服务器集群170由三维深度重建服务集群171以及云端增强现实特效生成和渲染服务器集群172组成,其中:
所述三维深度重建服务集群171,适于基于从多路同步视频流中截取的多个同步视频帧,重建得到相应的多角度自由视角视频;
所述云端增强现实特效生成和渲染服务器集群172,适于响应于特效生成指令,得到与所述特效生成指令指定的图像组合对应的虚拟信息图像,并将所述指定的图像组合与所述虚拟信息图像进行融合处理,得到多角度自由视角融合视频帧。
其中,基于处理数据以及具体数据的处理机制不同,所述三维深度重建服务集群171和云端增强现实特效生成和渲染服务器集群172可以分别包括多个服务器子集群或者服务器组,不同的服务器集群或者服务器组分别执行不同的功能,一起协同完成多角度自由视频帧的重建。
在具体实施中,所述异构服务器集群170还可以包括增强现实特效输入数据存储数据库173,适于存储与指定的图像组合中的目标对象匹配的增强现实特效输入数据。
在本说明书一实施例中,由云端服务器集群组成的云端服务系统基于上传的多个同步视频帧得到所述第一多角度自由视角融合视频帧,所述云端服务系统采用异构服务器集群。以下仍以图1所示的一个具体应用场景为例,说明如何进行实施。
参照图1所示的数据处理系统的结构示意图,针对一场篮球赛的数据处理系统的布置场景,所述数据处理系统10包括:由多个采集设备组成的采集阵列11、数据处理设备12、云端的服务器集群13、播放控制设备14和播放终端15。
参照图1,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列11中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频流。
在具体实施中,采集阵列11中的采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
而为了不影响采集设备工作,所述数据处理设备12可以置于现场非采集区域,可视为现场服务器。所述数据处理设备12可以通过无线局域网向所述采集阵列11中各采集设备分别发送拉流指令,所述采集阵列11中各采集设备基于所述数据处理设备12发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备12。其中,所述采集阵列11中各采集设备可以通过交换机17将获得的视频流实时传输至所述数据处理设备12。
当所述数据处理设备12接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群13。
相应地,云端的服务器集群13将接收的多个同步视频帧作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的图像的数据;以及响应于特效生成指令,获取与所述特效生成指令指定的图像组合对应的虚拟信息图像,将所述指定的图像组合与所述虚拟信息图像进行融合处理,得到多角度自由视角融合视频帧。
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群13。
例如,所述云端的服务器集群13可以包括:第一云端服务器131,第二云端服务器132,第三云端服务器133,第四云端服务器134和第五云端服务器135。
其中,第一云端服务器131可以用于确定所述图像组合相应的参数数据;第二云端服务器132可以用于确定所述图像组合中各帧图像的深度数据;第三云端服务器133可以基于所述图像组合相应的参数数据、所述图像组合的像素数据和深度数据,使用基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)算法,对预设的虚拟视点路径进行帧图像重建;所述第四云端服务器134可以用于生成多角度自由视角视频;第五云端服务器135可以用于响应于特效生成指令,获取与所述特效生成指令指定的图像组合对应的虚拟信息图像,并将所述图像组合与所述虚拟信息图像进行融合处理,得到多角度自由视角融合视频帧。
可以理解的是,所述第一云端服务器131、第二云端服务器132、第三云端服务器133、第四云端服务器134以及第五云端服务器135也可以为服务器阵列或服务器子集群组成的服务器组,本发明实施例不做限制。
基于处理数据以及具体数据的处理机制不同,各云端服务器或者云端服务器集群可以采用不同硬件配置的设备,例如,对于所述第四云端服务器134、第五云端服务器135等需要处理大量图像的设备,可以采用包括图形处理器(Graphics Processing Unit,GPU)或GPU组的设备。
在本说明书一些实施例中,GPU可以采用统一设备体系结构(Compute Unified Device Architecture,CUDA)并行编程架构对选择的图像组合中相应组的纹理图和深度图中的像素点进行组合渲染。CUDA是一种新的硬件和软件体系结构,用于将GPU 上的计算作为数据并行计算设备进行分配和管理,而无须将它们映射至图形应用程序编程接口(Application Programming Interface,API)。
通过CUDA编程时,GPU可以被视为能够并行执行大量线程的计算设备。它作为主中央处理器(Central Processing Unit,CPU)或者主机的协处理器运行,换言之,在主机上运行的应用程序中的数据并行、计算密集型的部分被下放到GPU上。
在具体实施中,云端的服务器集群13可以采用如下方式存储所述图像组合的像素数据及深度数据:
基于所述图像组合的像素数据及深度数据,生成对应帧时刻的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据;以及存储所述图像组合的拼接图像及所述图像组合相应的参数数据。获取的拼接图像和相应的参数数据可以存入数据文件中,当需要获取拼接图像或参数数据时,可以根据数据文件的头文件中相应的存储地址,从相应的存储空间中读取。
然后,播放控制设备14可以将接收到的所述多角度自由视角视频融合视频帧的数据插入待播放视频流中,播放终端15接收来自所述播放控制设备14的待播放视频流并进行实时播放。其中,播放控制设备14可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备。
可以理解的是,所述数据处理设备12可以根据具体情景置于现场非采集区域或云端,所述服务器(集群)和播放控制设备可以根据具体情景置于现场非采集区域,云端或者终端接入侧,上述实施例并不用于限制本发明的具体实现和保护范围。
本说明书实施例中所采用的数据处理系统,除了可以实现直播、准直播等低时延场景的多角度自由视角视频的播放外,还可以基于用户交互操作,实现录播、转播等场景的多角度自由视角视频的播放。
继续参照图3,在具体实施中,数据处理系统30还可以包括交互终端35,服务器32可以响应于来自交互终端35的图像重建指令,确定交互时刻的交互帧时刻信息,将存储的对应交互帧时刻的相应的图像组合预设帧图像的拼接图像及相应图像组合对应的参数数据发送至所述交互终端35。
所述交互终端35基于交互操作,向服务器发送所述图像重建指令,并基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据与所述参数数据进行组合渲染,重建得到所述待交互的虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
其中,所述预设规则可以根据具体情景来设定,具体可以参见前述方法实施例中的介绍。
此外,所述交互帧时刻信息可以基于来自交互终端35的触发操作确定,所述触发操作可以是用户输入的触发操作,也可以是交互终端自动生成的触发操作,例如,交互终端在检测到存在多角度自由视点数据帧的标识时可以自动发起触发操作。在用户手动触发时,可以是交互终端显示交互提示信息后用户选择触发交互的时刻信息,也可以是交互终端接收到用户操作触发交互的历史时刻信息,所述历史时刻信息可以为位于当前播放时刻之前的时刻信息。
在具体实施中,所述交互终端35可以基于获取的交互帧时刻的图像组合中预设帧图像的拼接图像及对应的参数数据,交互帧时刻信息以及交互帧时刻的虚拟视点位置信息,采用与上述步骤S44相同的方法对获取的交互帧时刻的图像组合中预设帧图像的拼接图像的像素数据和深度数据进行组合渲染,获得所述交互的虚拟视点位置对应的多角度自由视角视频的图像,并在所述交互的虚拟视点位置开始播放多角度自由视角视频。
采用上述方案,可以基于来自交互终端的图像重建指令即时生成交互的虚拟视点位置对应的多角度自由视角视频,可以进一步提升用户互动体验。
在本说明书一些数据处理系统中,继续参照图3,所述服务器32还可以根据服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像并存储。通过以上方案,通过预先生成预设帧图像的拼接图像对应的虚拟信息图像,后续在有播放需求时,可以直接进行渲染播放,从而可以减小时间延迟,进一步增强用户的互动体验,并提升用户视觉体验。
就具体应用场景而言,数据处理系统除了可以应用于直播、准直播场景中实现低时延的具有AR特效的多角度自由视角视频的播放外,还可以根据用户的互动操作,实现录播、转播等任意视频播放场景下的具有AR特效的多角度自由视角视频的播放。作为一种实现示例,用户可以通过交互终端与服务器进行交互,获取预设视频帧的拼接图像对应的虚拟信息图像并在交互终端进行渲染,从而实现具有AR特效的多角度自由视角合成视频帧的播放。以下通过一些应用场景进行详细描述。
继续参照图3,所述服务器32还适于响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端35。
所述交互终端35,适于将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并进行播放。
服务器具体获取和生成所述虚拟信息图像的方法可以参见前述方法实施例,此处不再详述。
为使本领域技术人员更好地理解和实现,以下首先通过具体应用场景介绍本说明书实施例中的播放终端展示的视频效果示意图。
参照图18至图20所示的播放终端的显示界面的视频效果示意图,假设如图18所示播放终端T1的播放界面Sr1展示的为第T-1帧视频帧,可以看到从运动员右侧视角观看到的运动员正在向终点冲刺的画面。假设数据处理设备截取了视频流中第T帧至第T+1帧的多个同步视频帧并上传到服务器,服务器基于接收到的第T~T+1帧的同步视频帧作为图像组合,一方面,服务器基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;另一方面,响应于服务端用户的特效生成指令,获取与所述特效生成指令指定的图像组合对应的虚拟信息图像。之后,在所述指定的图像组合叠加渲染所述虚拟信息图像,得到第T~T+1帧对应的多角度自由视角融合视频帧,其在播放终端T1的展示效果依次如图19和图20所示,其中,图19中播放界面Sr2展示的为第T帧视频帧的效果图,视角切换至运动员的正面,且由画面可以看出,其在现实的图像之上,植入了AR特效图像,其中展示了运动员正在向终点冲刺的真实画面,以及植入的AR特效图像,包括运动员的基本信息板M1和与运动员脚步匹配的两个虚拟生成的脚印M2,为区分AR特效对应的虚拟信息图像与多角度自由视角视频帧对应的真实图像,图19和图20中用实线表示真实图像,虚线表示AR特效对应的虚拟信息图像,由基本信息板M1可以看到运动员的姓名、国籍、参赛号码、历史最好成绩等信息。图20示出的为第T+1帧视频帧的效果图,视角又进一步切换至运动员左侧,由播放界面Sr3展示的画面可知,运动员已冲过终点线,基本信息板M1包含的具体信息随着时间推移可以实时更新,由图20可知,添加了运动员的本次成绩,脚印M2的位置和形状跟随运动员脚步变化,并增加了运动员获得第一名的图案标识M3。
本说明书实施例中的播放终端具体可以是电视、电脑、手机、车载设备、投影设备等其中任意一种或多种类型的终端设备。
为使本领域技术人员更好地理解和实现本发明实施例中交互终端的运行原理,以下参照附图,通过具体应用场景进行详细介绍。
参照图21所示的交互终端的结构示意图,在本说明书一些实施例中,如图21所示,交互终端210可以包括第一展示单元211、虚拟信息图像获取单元212和第二展示单元213,其中:
所述第一展示单元211,适于实时进行多角度自由视角视频的图像的展示,其中, 所述多角度自由视角视频的图像是通过指定帧时刻的多个同步视频帧图像形成的图像组合的参数数据、所述图像组合的像素数据和深度数据重建得到,所述多个同步视频帧包括不同拍摄视角的帧图像;
所述虚拟信息图像获取单元212,适于响应于对所述多角度自由视角视频图像中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的虚拟信息图像;
所述第二展示单元213,适于将所述虚拟信息图像叠加展示在所述多角度自由视角视频的视频帧上。
采用上述交互终端,终端用户可以通过互动交互,观看植入AR特效的多角度自由视角视频图像,可以丰富用户视觉体验。
参照图22示出的另一种交互终端的结构示意图,在本说明书另一些实施例中,交互终端220可以包括:
视频流获取单元221,适于从播放控制设备实时获取待播放视频流,所述待播放数据流包括视频数据及互动标识,所述互动标识与所述待播放数据流的指定帧时刻关联;
播放展示单元222,适于实时播放展示所述待播放视频流的视频及互动标识;
交互数据获取单元223,适于响应于对所述互动标识的触发操作,获取对应于所述指定帧时刻的交互数据,所述交互数据包括多角度自由视角视频帧和所述预设视频帧的拼接图像对应的虚拟信息图像;
交互展示单元224,适于基于所述交互数据,进行所述指定帧时刻的多角度自由视角的合成视频帧的展示;
切换单元225,适于在检测到交互结束信号时,触发切换至由所述视频流获取单元221从所述播放控制设备实时获取的待播放视频流并由所述播放展示单元222进行实时播放展示。
其中,所述交互数据可以由服务器生成并传输给交互终端,也可以由交互终端生成。
交互终端在播放视频的过程中,可以从播放控制设备实时获取待播放数据流,在相应的帧时刻的时候,可以显示相应的互动标识。例如可以在进度条上展示互动标识,又例如,可以直接在显示画面上展示互动标识。
参照图3和图23,交互终端T2的显示界面Sr20上展示互动标识V1,当用户未选择触发时,交互终端T2可以继续读取后续视频数据。当用户按照互动标识V1的箭头指示方向滑动选择触发时,交互终端T2接收到反馈后生成相应互动标识的指定帧时刻的图像重建指令,并发送至所述服务器32。
例如,当用户选择触发当前展示的互动标识V1时,交互终端T2接收到反馈后生成互动标识V1相应指定帧时刻Ti~Ti+2的图像重建指令,并发送至所述服务器32。所述服务器32根据图像重建指令可以发送指定帧时刻Ti~Ti+1相应的多个帧图像。
并且,在播放至Ti+1帧时刻,如图24所示,显示界面Sr20展示出互动标识Ir。当用户点击互动标识Ir后,所述交互终端T2可以向服务器获取对应的虚拟信息图像。
之后,可以在交互终端T2展示Ti+2帧时刻对应的多角度自由视角融合图像,如图25和图26所示的交互终端的交互界面的视频效果示意图,其中,图25中交互界面Sr20为第Ti+1帧图像植入AR之后的效果图,视角切换至运动员的正面,且由画面可以看出,其在现实的图像之上,植入了AR特效对应的虚拟信息图像,交互界面Sr20中展示的第Ti+1帧的图像中运动员正在向终点冲刺的真实画面,以及虚拟信息图像,包括运动员的基本信息板M4和与运动员脚步匹配的脚印M5,为区分AR特效与真实图像,图25和图26中用实线标识真实图像,虚线表示虚拟信息图像,由基本信息板M4可以看到运动员的姓名、国籍、参赛号码、历史最好成绩等信息。图26示出的为第Ti+2帧视频帧的效果图,视角又进一步切换至运动员左侧,由画面可知,运动员已冲过终点线,基本信息板M4包含的具体信息随着时间推移可以实时更新,由图26可知,添加了运动员的本次成绩,脚印M5的位置和形状跟随运动员脚步变化,并增加了运动员获得第一名的图案标识M6。
交互终端T2可以基于所述多个视频帧,生成用于进行交互的交互数据,并可以采用图像重建算法对所述交互数据的多角度自由视角数据进行图像处理,以及从服务器获取虚拟信息图像,然后进行所述指定帧时刻的多角度自由视角的视频的播放,以及所述指定帧植入AR特效的多角度自由视角合成视频帧的播放。
在具体实施中,本发明实施例的交互终端可以是具有触屏功能的电子设备、头戴式虚拟现实(Virtual Reality,VR)终端、与显示器连接的边缘节点设备、具有显示功能的IoT(The Internet of Things,物联网)设备等其中任意一种或多种类型。
如前实施例所述,为更加精准地生成与多角度自由视角视频的视频帧匹配的虚拟信息图像,可以识别所述预设视频帧图像的拼接图像对应的目标对象,并获取所述目标对象的增强现实特效输入数据。在具体实施中,所述交互数据还可以包括目标对象的增强现实特效输入数据,所述增强现实特效输入数据可以包括以下至少一种:现场分析数据、采集目标对象的信息数据、与采集目标对象关联的装备的信息数据、现场部署的物品的信息数据、现场展示的徽标的信息数据。基于所述交互数据,可以生成所述虚拟信息图像,进而可以生成所述多角度自由视角合成视频帧,从而使得植入的AR特效更加丰富而有针对性,由此,终端用户可以更加深入、全面、专业地了解所观看的内容,进一步提升用户的视觉体验。
本说明书实施例还提供了相应的服务器的实施方案,参照图27所示的一种服务器的结构示意图,在本说明书一些实施例中,如图27所示,服务器270可以包括:图像重建单元271、虚拟信息图像生成单元272和数据传输单元273,其中:
所述图像重建单元271,适于响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据;
所述虚拟信息图像生成单元272,适于响应于特效生成交互控制指令,生成所述特效生成交互控制指令指示的视频帧的拼接图像对应的虚拟信息图像;
所述数据传输单元273,适于与交互终端进行数据交互,包括:将所述对应交互帧时刻的图像组合中预设视频帧的拼接图像及所述图像组合相应的参数数据传输至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的图像并进行播放;以及将所述特效生成交互控制指令指示的预设帧图像的拼接图像对应的虚拟信息图像传输至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到多角度自由视角合成视频帧并进行播放。
本说明书实施例还提供了另一种服务器,参照图28所示的服务器的结构示意图,服务器280可以包括:
数据接收单元281,适于接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像;
参数数据计算单元282,适于确定所述图像组合相应的参数数据;
深度数据计算单元283,适于确定所述图像组合中各帧图像的深度数据;
视频数据获取单元284,适于基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;
第一虚拟信息图像生成单元285,适于响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应虚拟信息图像;相应的,所述第一虚拟信息图像生成单元285可以包括特效区域确定子单元2851及特效数据生成子单元2852。
图像合成单元286,适于将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧;
第一数据传输单元287,适于将合成视频帧输出以插入待播放视频流。
参照图29,本说明书实施例还提供了另一种服务器,服务器290与服务器280不同之处在于,服务器290还可以包括:拼接图像生成单元291和第一数据存储单元292,其中:
拼接图像生成单元291,适于基于所述图像组合的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;
第一数据存储单元292,适于存储所述图像组合的拼接图像及所述图像组合相应的参数数据。
在本说明书一些实施例中,继续参照图29,所述服务器290还可以包括:数据提取单元293和第二数据传输单元294,其中:
数据提取单元293,适于响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据;
第二数据传输单元294,适于将所述对应交互帧时刻的相应图像组合预设帧图像的拼接图像及相应图像组合相应的参数数据发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
在具体实施中,采用本说明书一些实施例中的服务器,还可以生成与预设帧图像的拼接图像对应的增强现实特效输入数据并存储,以便于后续虚拟信息图像的生成,提升用户视觉体验,也可以使数据资源得到有效利用。继续参照图29,服务器290还可以包括:第二虚拟信息图像生成单元295和第二数据存储单元296,其中:
第二虚拟信息图像生成单元295,适于响应于服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设帧图像的拼接图像对应的虚拟信息图像;
第二数据存储单元296,适于存储预设帧图像的拼接图像对应的虚拟信息图像。
在具体实施中,继续参照图29,服务器290还可以包括:第二虚拟信息图像获取单元297和第三数据传输单元298,其中:
第二虚拟信息图像获取单元297,适于在接收到所述图像重建指令后,响应于来自交互终端的用户端特效生成交互指令,获取所述预设帧图像的拼接图像对应的虚拟信息图像;
第三数据传输单元298,适于将所述预设帧图像的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到用于播放的多角度自由视角合成视频帧。
需要说明的是,本说明书实施例中的增强现实特效输入数据,可以为如上述篮球比赛场景中的运动员特效数据和进球特效数据等,可以理解的是,本说明书实施例中的增强现实特效输入数据并不限于以上示例类型,就篮球运动比赛场景而言,还可以基于教练、广告标识等图像所采集到的现场图像中包含的各种目标对象生成相应的增强现实特效输入数据。
在具体实施中,可以根据具体应用场景、目标对象的特性、目标对象的关联对象、以及具体的特效生成模型(如预设的三维模型、预设的机器学习模型等)等其中一种或多种因素生成相应的虚拟信息图像。
本领域技术人员可以理解,本说明书实施例中各电子设备中的具体单元均可以通过相应的电路来实现。例如,上述各实施例中涉及到的数据获取单元可以由处理器、CPU、输入接口等实现,上述各实施例中涉及到的数据存储单元可以由磁盘、EPROM、ROM等各种存储器件实现,上述实施例中涉及到的各数据传输单元可以由通信接口、通信线路(有线/无线)等实现,此处不再一一例举。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时可以执行前述任一实施例所述的数据处理方法的步骤。具体步骤可以参见前述实施例的介绍,此处不再赘述。
在具体实施中,所述计算机可读存储介质可以包括例如任何合适类型的存储器单元、存储器设备、存储器物品、存储器介质、存储设备、存储物品、存储介质和/或存储单元,例如,存储器、可移除的或不可移除的介质、可擦除或不可擦除介质、可写或可重写介质、数字或模拟介质、硬盘、软盘、光盘只读存储器(CD-ROM)、可刻录光盘(CD-R)、可重写光盘(CD-RW)、光盘、磁介质、磁光介质、可移动存储卡或磁盘、各种类型的数字通用光盘(DVD)、磁带、盒式磁带等。
计算机指令可以包括通过使用任何合适的高级、低级、面向对象的、可视化的、编译的和/或解释的编程语言来实现的任何合适类型的代码,例如,源代码、编译代码、解释代码、可执行代码、静态代码、动态代码、加密代码等。
本说明书实施例中各装置、系统、设备或系统的具体实现方式、工作原理和具体作用及效果,可以参见对应方法实施例中的具体介绍。
虽然本说明书实施例披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本说明书实施例的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (38)

  1. 一种数据处理方法,包括:
    获取多角度自由视角视频的视频帧中的目标对象;
    获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像;
    将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
  2. 根据权利要求1所述的数据处理方法,所述多角度自由视角视频基于从多路同步视频流中截取的指定帧时刻的多个同步视频帧所形成的图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建得到,其中,所述多个同步视频帧包含不同拍摄视角的帧图像。
  3. 根据权利要求2所述的数据处理方法,所述获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,包括:
    基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,得到与所述目标对象位置匹配的虚拟信息图像。
  4. 根据权利要求3所述的数据处理方法,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括:
    按照帧时刻排序以及相应帧时刻的虚拟视点位置,将相应帧时刻的虚拟信息图像与对应帧时刻的视频帧进行合成处理并展示。
  5. 根据权利要求1至4任一项所述的数据处理方法,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括如下至少一种:
    将所述虚拟信息图像与对应的视频帧进行融合处理,得到融合视频帧,对所述融合视频帧进行展示;
    将所述虚拟信息图像叠加在对应的视频帧之上,得到叠加合成视频帧,对所述叠加合成视频帧进行展示。
  6. 根据权利要求5所述的数据处理方法,所述对所述融合视频帧进行展示,包括:
    将所述融合视频帧插入待播放视频流进行播放展示。
  7. 根据权利要求1至4任一项所述的数据处理方法,所述获取多角度自由视角视频的视频帧中的目标对象,包括:
    响应于特效生成交互控制指令,获取所述多角度自由视角视频的视频帧中的目标对象。
  8. 根据权利要求7所述的数据处理方法,所述获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像,包括:
    基于所述目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。
  9. 一种数据处理方法,包括:
    接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像;
    确定所述图像组合相应的参数数据;
    确定所述图像组合中各帧图像的深度数据;
    基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;
    响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像;
    将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧;
    将所述合成视频帧进行展示。
  10. 根据权利要求9所述的数据处理方法,所述基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,包括:
    将所述目标对象的增强现实特效输入数据作为输入,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,采用预设的第一特效生成方式,生成对应视频帧中与所述目标对象匹配的虚拟信息图像。
  11. 根据权利要求9或10所述的数据处理方法,所述响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,并获取所述目标对象的增强现实特效输入数据,包括:
    根据服务端特效生成交互控制指令,确定特效输出类型;
    获取所述目标对象的历史数据,根据所述特效输出类型对所述历史数据进行处理,得到与所述特效输出类型对应的增强现实特效输入数据。
  12. 根据权利要求9所述的数据处理方法,所述基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,包括以下至少一种:
    将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,输出与所述目标对象匹配的虚拟信息图像;
    将所述目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,输出与所述目标对象匹配的虚拟信息图像。
  13. 根据权利要求9所述的数据处理方法,所述将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,包括:
    基于三维标定得到的所述目标对象在所述指定的视频帧中的位置,将所述虚拟信息图像与所述指定的视频帧进行融合处理,得到融合视频帧。
  14. 根据权利要求9所述的数据处理方法,所述将所述合成视频帧进行展示,包括:
    将所述合成视频帧插入至播放控制设备的待播放视频流以通过播放终端进行播放。
  15. 根据权利要求9所述的数据处理方法,还包括:
    基于所述图像组合的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;
    存储所述图像组合的拼接图像及所述图像组合相应的参数数据;
    响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
  16. 根据权利要求15所述的数据处理方法,还包括:
    响应于服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像;
    存储所述预设视频帧的拼接图像对应的虚拟信息图像。
  17. 根据权利要求16所述的数据处理方法,在接收到所述图像重建指令后,还包括:
    响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像;
    将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并展示。
  18. 根据权利要求17所述的数据处理方法,还包括:
    响应于用户端特效退出交互指令,停止获取所述预设视频帧的拼接图像对应的虚拟信息图像。
  19. 根据权利要求17所述的数据处理方法,所述响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,包括:
    基于所述用户端特效生成交互指令,确定所述预设视频帧的拼接图像中对应的目标对象;
    获取与所述预设视频帧中的目标对象匹配的虚拟信息图像。
  20. 根据权利要求19所述的数据处理方法,所述获取与所述预设视频帧中的目标对象匹配的虚拟信息图像,包括:
    获取预先基于三维标定得到的所述目标对象在所述预设视频帧中的位置所生成的与所述目标对象匹配的虚拟信息图像。
  21. 根据权利要求17至20任一项所述的数据处理方法,所述将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧,包括:
    将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧之上叠加所述虚拟信息图像,得到叠加合成视频帧。
  22. 一种数据处理方法,包括:
    响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放;
    响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像;
    将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端,使得所述交互终端将在所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧;
    将所述合成视频帧进行展示。
  23. 根据权利要求22所述的数据处理方法,所述预设视频帧的拼接图像基于所述交互帧时刻的图像组合的像素数据和深度数据生成,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中所述预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;
    所述交互帧时刻的图像组合基于从多路同步视频流中截取指定帧时刻的多个同步视频帧得到,所述多个同步视频帧包含不同拍摄视角的帧图像。
  24. 根据权利要求22所述的数据处理方法,所述响应于特效生成交互控制指令, 获取所述特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像,包括:
    响应于特效生成交互控制指令,获取所述特效生成交互控制指令指示的视频帧中的目标对象;
    获取预先基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像。
  25. 一种数据处理方法,包括:
    实时进行多角度自由视角视频的视频帧的展示;
    响应于对所述多角度自由视角视频的视频帧中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的视频帧的虚拟信息图像;
    将所述虚拟信息图像与对应的视频帧进行合成处理并展示。
  26. 根据权利要求25所述的数据处理方法,所述响应于对所述多角度自由视角视频的图像中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的视频帧的虚拟信息图像,包括:
    获取与所述特效展示标识对应的指定帧时刻的视频帧中目标对象的虚拟信息图像。
  27. 根据权利要求26所述的数据处理方法,所述将所述虚拟信息图像与对应的视频帧进行合成处理并展示,包括:
    基于三维标定确定的所述目标对象在所述指定帧时刻的视频帧中的位置,将所述虚拟信息图像叠加在所述指定帧时刻的视频帧之上,得到叠加合成视频帧并展示。
  28. 一种数据处理系统,包括:
    目标对象获取单元,适于获取多角度自由视角视频的视频帧中目标对象;
    虚拟信息图像获取单元,适于获取基于所述目标对象的增强现实特效输入数据所生成的虚拟信息图像;
    图像合成单元,适于将所述虚拟信息图像与对应的视频帧进行合成处理,得到合成视频帧;
    展示单元,适于展示得到的合成视频帧。
  29. 一种数据处理系统,包括:数据处理设备、服务器、播放控制设备以及播放终端,其中:
    所述数据处理设备,适于基于视频帧截取指令,从现场采集区域不同位置实时同步采集的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,将获得的所述指定帧时刻的多个同步视频帧上传至所述服务器;
    所述服务器,适于接收所述数据处理设备上传的多个同步视频帧作为图像组合,确定所述图像组合相应的参数数据以及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据, 对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;以及响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应的虚拟信息图像,将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧,并将所述合成视频帧输入至播放控制设备;
    所述播放控制设备,适于将所述合成视频帧插入至待播放视频流;
    所述播放终端,适于接收来自所述播放控制设备的待播放视频流并进行实时播放。
  30. 根据权利要求29所述的数据处理系统,还包括交互终端;其中:
    所述服务器,还适于基于所述图像组合的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合的深度数据;以及存储所述图像组合的拼接图像及所述图像组合相应的参数数据;以及响应于来自所述交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据并发送至所述交互终端;
    所述交互终端,适于基于交互操作,向所述服务器发送所述图像重建指令,并基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据以及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧并进行播放。
  31. 根据权利要求30所述的数据处理系统,所述服务器,还适于根据服务端特效生成交互控制指令,生成所述服务端特效生成交互控制指令指示的预设视频帧的拼接图像对应的虚拟信息图像并存储。
  32. 根据权利要求31所述的数据处理系统,所述服务器,还适于响应于来自交互终端的用户端特效生成交互指令,获取所述预设视频帧的拼接图像对应的虚拟信息图像,将所述预设视频帧的拼接图像对应的虚拟信息图像发送至所述交互终端;
    所述交互终端,适于将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到合成视频帧并进行播放展示。
  33. 一种服务器,包括:
    数据接收单元,适于接收从多路同步视频流中截取的指定帧时刻的多个同步视频帧作为图像组合,所述多个同步视频帧包含不同拍摄视角的帧图像;
    参数数据计算单元,适于确定所述图像组合相应的参数数据;
    深度数据计算单元,适于确定所述图像组合中各帧图像的深度数据;
    视频数据获取单元,适于基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频的视频帧;
    第一虚拟信息图像生成单元,适于响应于特效生成指令,获取与所述特效生成指令指定的视频帧中的目标对象,获取所述目标对象的增强现实特效输入数据,并基于所述目标对象的增强现实特效输入数据,生成对应虚拟信息图像;
    图像合成单元,适于将所述虚拟信息图像与所述指定的视频帧进行合成处理,得到合成视频帧;
    第一数据传输单元,适于将所述合成视频帧输出以插入待播放视频流。
  34. 根据权利要求33所述的服务器,所述第一虚拟信息图像生成单元,适于将所述目标对象的增强现实特效输入数据作为输入,基于三维标定得到的所述目标对象在所述多角度自由视角视频的视频帧中的位置,采用预设的第一特效生成方式,生成对应视频帧中与所述目标对象匹配的虚拟信息图像。
  35. 一种服务器,包括:
    图像重建单元,适于响应于来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,获取对应交互帧时刻的图像组合中预设帧图像的拼接图像及所述图像组合相应的参数数据;
    虚拟信息图像生成单元,适于响应于特效生成交互控制指令,生成所述特效生成交互控制指令指示的视频帧的图像组合的拼接图像对应的虚拟信息图像;
    数据传输单元,适于与交互终端进行数据交互,包括:将所述对应交互帧时刻的图像组合中预设视频帧的拼接图像及所述图像组合相应的参数数据传输至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的图像并进行播放;以及将所述特效生成交互控制指令指示的预设帧图像的拼接图像对应的虚拟信息图像传输至所述交互终端,使得所述交互终端将所述交互帧时刻虚拟视点位置对应的多角度自由视角视频的视频帧与所述虚拟信息图像进行合成处理,得到多角度自由视角合成视频帧并进行播放。
  36. 一种交互终端,包括:
    第一展示单元,适于实时进行多角度自由视角视频的图像的展示,其中,所述多角度自由视角视频的图像是通过指定帧时刻的多个同步视频帧图像形成的图像组合的参数数据、所述图像组合的像素数据和深度数据重建得到,所述多个同步视频帧包括不同拍摄视角的帧图像;
    特效数据获取单元,适于响应于对所述多角度自由视角视频图像中特效展示标识的触发操作,获取对应于所述特效展示标识的指定帧时刻的虚拟信息图像;
    第二展示单元,适于将所述虚拟信息图像叠加展示在所述多角度自由视角视频的视频帧上。
  37. 一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,所述处理器运行所述计算机指令时执行权利要求1至27任一项所述方法的步骤。
  38. 一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时执行权利要求1至27任一项所述方法的步骤。
PCT/CN2021/099047 2020-06-10 2021-06-09 数据处理方法、系统、相关设备和存储介质 WO2021249414A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010522454.0 2020-06-10
CN202010522454.0A CN113784148A (zh) 2020-06-10 2020-06-10 数据处理方法、系统、相关设备和存储介质

Publications (1)

Publication Number Publication Date
WO2021249414A1 true WO2021249414A1 (zh) 2021-12-16

Family

ID=78834879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099047 WO2021249414A1 (zh) 2020-06-10 2021-06-09 数据处理方法、系统、相关设备和存储介质

Country Status (2)

Country Link
CN (1) CN113784148A (zh)
WO (1) WO2021249414A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302128A (zh) * 2021-12-31 2022-04-08 视伴科技(北京)有限公司 视频生成的方法、装置、电子设备及存储介质
CN114390214B (zh) * 2022-01-20 2023-10-31 脸萌有限公司 一种视频生成方法、装置、设备以及存储介质
CN114390215B (zh) * 2022-01-20 2023-10-24 脸萌有限公司 一种视频生成方法、装置、设备以及存储介质
CN114570016A (zh) * 2022-02-25 2022-06-03 阿里巴巴(中国)有限公司 云游戏处理方法、云游戏系统及电子设备
CN115361576A (zh) * 2022-07-20 2022-11-18 中国电信股份有限公司 视频数据处理方法、装置,以及,电子设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2413286A1 (en) * 2010-07-29 2012-02-01 LiberoVision AG Image processing method and device for instant replay
US10321117B2 (en) * 2014-04-11 2019-06-11 Lucasfilm Entertainment Company Ltd. Motion-controlled body capture and reconstruction
CN108076345A (zh) * 2016-11-09 2018-05-25 阿里巴巴集团控股有限公司 多视角视频帧的编码方法、传输方法、装置、计算机
CN107862718B (zh) * 2017-11-02 2020-01-24 深圳市自由视像科技有限公司 4d全息视频捕捉方法
CN108109209A (zh) * 2017-12-11 2018-06-01 广州市动景计算机科技有限公司 一种基于增强现实的视频处理方法及其装置
US11151791B2 (en) * 2018-04-17 2021-10-19 Edx Technologies, Inc. R-snap for production of augmented realities
CN109889914B (zh) * 2019-03-08 2021-04-02 腾讯科技(深圳)有限公司 视频画面推送方法、装置、计算机设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060028473A1 (en) * 2004-08-03 2006-02-09 Microsoft Corporation Real-time rendering system and process for interactive viewpoint video
CN103051830A (zh) * 2012-12-31 2013-04-17 北京中科大洋科技发展股份有限公司 一种对所拍目标多角度实时转播的系统和方法
CN104994369A (zh) * 2013-12-04 2015-10-21 中兴通讯股份有限公司 一种图像处理方法、用户终端、图像处理终端及系统
US20180192033A1 (en) * 2016-12-30 2018-07-05 Google Inc. Multi-view scene flow stitching
CN108629830A (zh) * 2018-03-28 2018-10-09 深圳臻迪信息技术有限公司 一种三维环境信息显示方法及设备
CN110798673A (zh) * 2019-11-13 2020-02-14 南京大学 基于深度卷积神经网络的自由视点视频生成及交互方法

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401414B (zh) * 2021-12-27 2024-01-23 北京达佳互联信息技术有限公司 沉浸式直播的信息显示方法及系统、信息推送方法
CN114401414A (zh) * 2021-12-27 2022-04-26 北京达佳互联信息技术有限公司 沉浸式直播的信息显示方法及系统、信息推送方法
CN114500773B (zh) * 2021-12-28 2023-10-13 天翼云科技有限公司 一种转播方法、系统和存储介质
CN114500773A (zh) * 2021-12-28 2022-05-13 天翼云科技有限公司 一种转播方法及装置
CN115098000A (zh) * 2022-02-22 2022-09-23 北京字跳网络技术有限公司 图像处理方法、装置、电子设备及存储介质
CN115098000B (zh) * 2022-02-22 2023-10-10 北京字跳网络技术有限公司 图像处理方法、装置、电子设备及存储介质
CN114866800A (zh) * 2022-03-28 2022-08-05 广州博冠信息科技有限公司 视频的播放控制方法、装置和电子设备
CN115022697A (zh) * 2022-04-28 2022-09-06 京东科技控股股份有限公司 添加有内容元素的视频的展示方法、电子设备及程序产品
CN114648615A (zh) * 2022-05-24 2022-06-21 四川中绳矩阵技术发展有限公司 目标对象交互式重现的控制方法、装置、设备及存储介质
CN114648615B (zh) * 2022-05-24 2022-07-29 四川中绳矩阵技术发展有限公司 目标对象交互式重现的控制方法、装置、设备及存储介质
WO2024001223A1 (zh) * 2022-06-27 2024-01-04 华为技术有限公司 一种显示方法、设备及系统
WO2024001661A1 (zh) * 2022-06-28 2024-01-04 北京新唐思创教育科技有限公司 视频合成方法、装置、设备和存储介质
WO2024001677A1 (zh) * 2022-06-30 2024-01-04 腾讯科技(深圳)有限公司 页面显示方法、装置、计算机设备、存储介质及程序产品
WO2024031882A1 (zh) * 2022-08-08 2024-02-15 珠海普罗米修斯视觉技术有限公司 视频处理方法、装置及计算机可读存储介质
CN115202485A (zh) * 2022-09-15 2022-10-18 深圳飞蝶虚拟现实科技有限公司 一种基于xr技术的姿态同步互动展馆展示系统
WO2024104307A1 (zh) * 2022-11-17 2024-05-23 北京字跳网络技术有限公司 直播视频流渲染方法、装置、设备、存储介质及产品
CN116305840B (zh) * 2023-02-21 2023-12-15 四川物通科技有限公司 一种虚拟现实服务器用的数据交互管理平台
CN116305840A (zh) * 2023-02-21 2023-06-23 山东维创精密电子有限公司 一种虚拟现实服务器用的数据交互管理平台

Also Published As

Publication number Publication date
CN113784148A (zh) 2021-12-10

