WO2021083174A1 - Virtual viewpoint image generation method, system, electronic device and storage medium - Google Patents

Virtual viewpoint image generation method, system, electronic device and storage medium

Info

Publication number
WO2021083174A1
WO2021083174A1 (PCT/CN2020/124272, CN2020124272W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
image
frame
video
virtual viewpoint
Prior art date
Application number
PCT/CN2020/124272
Other languages
English (en)
French (fr)
Inventor
盛骁杰 (Sheng Xiaojie)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021083174A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/167 Synchronising or controlling image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/366 Image reproducers using viewer tracking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/398 Synchronisation thereof; Control thereof

Definitions

  • the embodiments of the present invention relate to the field of image processing technology, and in particular to a method, system, electronic device, and storage medium for generating a virtual viewpoint image.
  • 6 Degrees of Freedom (6DoF) technology provides a viewing experience with a high degree of freedom: while watching, the user can adjust the viewing angle of the video through interactive means and watch from whatever free viewpoint he or she wants, thereby greatly improving the viewing experience.
  • the current virtual viewpoint image generation method is slow, which severely restricts the low-latency playback and real-time interaction requirements of multi-angle free-view videos.
  • embodiments of the present invention provide a virtual viewpoint image generation method, system, electronic device, and storage medium to increase the speed of virtual viewpoint image generation and meet the requirements for low-latency playback and real-time interaction of multi-angle free-view videos.
  • the embodiment of the present invention provides a method for generating a virtual viewpoint image, including:
  • acquiring an image combination of multiple-angle free viewing angles, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination includes multiple synchronized groups of texture maps and depth maps with corresponding relationships; and, according to the preset virtual viewpoint path data and the parameter data of the image combination, selecting from the image combination the texture map and the depth map of the corresponding group for each virtual viewpoint in the virtual viewpoint path;
  • for each virtual viewpoint in the virtual viewpoint path, taking the pixel point as the processing unit, a plurality of threads respectively perform combined rendering of the pixels in the corresponding group of texture maps and depth maps in the selected image combination, including:
  • the parallel forward mapping of the depth maps of the corresponding group to the virtual viewpoint includes: running a first core function on the graphics processor, and forward-mapping the pixels of the depth maps of the corresponding group in parallel to the corresponding virtual viewpoint position, where: for multiple depth values mapped to the same pixel of the virtual viewpoint, an atomic operation is used to take the maximum pixel value, obtaining the first depth map of the corresponding virtual viewpoint position; then, based on the first depth map of the virtual viewpoint position, a second depth map of the virtual viewpoint position is created, and each pixel in the second depth map is processed in parallel, taking the maximum value of the pixel points in a preset area around the corresponding pixel position in the first depth map.
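As an illustrative, sequential sketch (not the patent's GPU kernel), the two-stage forward mapping above can be emulated in Python: a plain `max()` stands in for the atomic maximum, and `warp` is a hypothetical placeholder for the geometric reprojection defined by the camera parameters.

```python
def forward_map_depth(depth, warp, w, h, radius=1):
    """Map source depth pixels into a w-by-h virtual view, keeping the maximum
    depth per target pixel (first depth map), then take a neighborhood maximum
    over a preset area (second depth map)."""
    first = [[0] * w for _ in range(h)]
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            tx, ty = warp(x, y, d)               # reproject to the virtual view
            if 0 <= tx < w and 0 <= ty < h:
                # emulate the atomic max: keep the larger depth value
                first[ty][tx] = max(first[ty][tx], d)
    second = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [first[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            second[y][x] = max(neigh)            # max over the preset area
    return first, second
```

On the GPU, every source pixel (stage 1) and every target pixel (stage 2) would be handled by its own thread; the atomic max is what makes the concurrent writes of stage 1 safe.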
  • the parallel post-processing of the depth map after forward mapping includes: running a second core function on the graphics processor, and, for each pixel in the second depth map obtained after forward mapping, performing median filtering in a preset area around the pixel position.
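A sketch of this median-filtering pass, run here sequentially; on the GPU each output pixel would be computed by one thread, which is valid because the output pixels are mutually independent. The window size (`radius`) is an assumed example of the "preset area":

```python
def median_filter(img, radius=1):
    """Replace each pixel by the median of the (2*radius+1)^2 window around it,
    clamped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(img[j][i]
                            for j in range(max(0, y - radius), min(h, y + radius + 1))
                            for i in range(max(0, x - radius), min(w, x + radius + 1)))
            out[y][x] = window[len(window) // 2]
    return out
```

The effect on a depth map is to suppress isolated spikes left over from the forward mapping while preserving depth edges.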
  • the parallel reverse mapping of the texture maps of the corresponding group includes: running a third core function on the graphics processor, and performing interpolation operations in parallel on the pixels in the selected texture maps of the corresponding group to generate the corresponding virtual texture maps.
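The interpolation used during reverse mapping is not specified in the text, so as an assumed example here is a single bilinear sample; the third kernel would evaluate one such sample per virtual-view pixel, all in parallel:

```python
def bilinear_sample(tex, u, v):
    """Sample a single-channel texture at fractional coordinates (u, v)
    with bilinear interpolation, clamping at the right/bottom edges."""
    x0, y0 = int(u), int(v)
    x1 = min(x0 + 1, len(tex[0]) - 1)
    y1 = min(y0 + 1, len(tex) - 1)
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bot = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```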
  • the parallel fusion of pixels in each virtual texture map generated after reverse mapping includes: running a fourth core function on the graphics processor, and performing weighted fusion in parallel on the pixels at the same position in each virtual texture map generated after reverse mapping.
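A per-pixel sketch of the weighted fusion. The weighting rule is not given in the text, so the per-map weights (for example, based on the distance of each source camera to the virtual viewpoint) are an assumption:

```python
def fuse(virtual_maps, weights):
    """Weighted per-pixel fusion of several virtual texture maps.
    Each output pixel depends only on the pixels at the same position,
    so every pixel can be handled by its own GPU thread."""
    total = sum(weights)
    h, w = len(virtual_maps[0]), len(virtual_maps[0][0])
    return [[sum(wt * m[y][x] for wt, m in zip(weights, virtual_maps)) / total
             for x in range(w)]
            for y in range(h)]
```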
  • the method further includes: performing hole filling in parallel on each pixel in the texture map after weighted fusion, to obtain the image corresponding to the virtual viewpoint.
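One parallelizable hole-filling pass can be sketched as a neighborhood average; the patent does not state the filling rule, so averaging the surrounding non-hole pixels is an assumed example (`HOLE` marks pixels left uncovered by the mapping):

```python
HOLE = -1  # assumed sentinel for an unmapped pixel

def fill_holes(img, radius=1):
    """Replace each hole pixel with the average of the non-hole pixels in the
    surrounding window; pixels are independent, so the pass parallelizes."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] == HOLE:
                neigh = [img[j][i]
                         for j in range(max(0, y - radius), min(h, y + radius + 1))
                         for i in range(max(0, x - radius), min(w, x + radius + 1))
                         if img[j][i] != HOLE]
                if neigh:
                    out[y][x] = sum(neigh) / len(neigh)
    return out
```

Larger holes would need several such passes or a growing window, since a pixel with no valid neighbors is left untouched.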
  • the inputting of the texture map and the depth map of the corresponding group of each virtual viewpoint into the graphics processor includes: respectively inputting the texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path into multiple graphics processors, with each graphics processor processing in parallel to generate multiple virtual viewpoint images.
  • the embodiment of the present invention provides a virtual viewpoint image generation system, including:
  • the central processing unit is adapted to obtain an image combination of multiple-angle free viewing angles, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination includes multiple groups of synchronized texture maps and depth maps with corresponding relationships captured from multiple angles; and, according to the preset virtual viewpoint path data and the parameter data of the image combination, to select from the image combination the texture map and depth map of the corresponding group for each virtual viewpoint in the virtual viewpoint path;
  • the graphics processor is adapted, for each virtual viewpoint in the virtual viewpoint path, to take the pixel point as the processing unit and have multiple threads respectively perform combined rendering of the pixels in the corresponding group of texture maps and depth maps in the selected image combination, to obtain the image corresponding to the virtual viewpoint.
  • the graphics processor is adapted to: forward-map the depth maps of the corresponding group in parallel to the virtual viewpoint; post-process the forward-mapped depth maps in parallel; reverse-map the texture maps of the corresponding group in parallel; and fuse in parallel the pixels in each virtual texture map generated after the reverse mapping.
  • An embodiment of the present invention also provides an electronic device, including: a memory, a central processing unit, and a graphics processing unit.
  • the memory stores computer instructions that can run on the central processing unit and the graphics processing unit.
  • the central processing unit and the graphics processing unit cooperate, when the computer instructions are executed, to perform the steps of the method for generating a virtual viewpoint image according to any embodiment of the present invention.
  • the embodiment of the present invention also provides a computer-readable storage medium having computer instructions stored thereon, where the steps of the virtual viewpoint image generation method according to any embodiment of the present invention are executed when the computer instructions are run.
  • the pixel point can be used as the processing unit, and multiple threads respectively combine and render the pixels in the corresponding group of texture maps and depth maps in the selected image combination to obtain the image corresponding to the virtual viewpoint.
  • for each virtual viewpoint, the graphics processor can combine and render the corresponding pixels with multiple threads in parallel, which can greatly accelerate the generation of virtual viewpoint images with multi-angle free viewing angles, meet the needs of low-latency playback and real-time interaction of multi-angle free-view videos, and improve the user experience.
  • the pixels in the depth maps of the corresponding group are forward-mapped in parallel to the corresponding virtual viewpoint position, and for multiple depth values mapped to the same pixel of the virtual viewpoint, atomic operations are used to take the maximum pixel value to obtain the first depth map of the corresponding virtual viewpoint position, which makes it possible to quickly handle the foreground-background occlusion relationship in the depth map mapping; for the created second depth map of the virtual viewpoint position, taking the maximum value of the pixel points in the preset area around the corresponding pixel position in the first depth map can mitigate mapping gaps. Since each pixel can be processed in parallel, the processing speed of the forward mapping can be greatly accelerated and its time-efficiency performance improved.
  • each pixel in the second depth map obtained after forward mapping is subjected to median filtering in a preset area around the pixel position, which can greatly speed up the post-processing and improve its time-efficiency performance.
  • the third core function is run on the graphics processor, and the pixels in the selected texture maps of the corresponding group are interpolated in parallel to generate the corresponding virtual texture maps, which can greatly accelerate the reverse mapping and improve its time-efficiency performance.
  • the fourth core function is run on the graphics processor, and the pixels at the same position in each virtual texture map generated after reverse mapping are weighted and fused in parallel, which can greatly speed up the fusion of the virtual texture maps and improve the time-efficiency performance of image fusion.
  • each pixel in the weighted-fusion texture map is hole-filled in parallel to obtain the image corresponding to the virtual viewpoint, which can improve the quality of the generated virtual viewpoint image; performing the hole filling on each pixel in parallel can greatly accelerate the hole filling and improve its time-efficiency performance.
  • inputting the texture maps and depth maps of the corresponding groups of the virtual viewpoints in the virtual viewpoint path into multiple graphics processors to generate multiple virtual viewpoint images in parallel can further accelerate virtual viewpoint image generation and improve its time-efficiency performance.
  • Figure 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention
  • Figure 2 is a flowchart of a data processing method in an embodiment of the present invention.
  • Figure 3 is a schematic structural diagram of a data processing system in an application scenario in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server in an embodiment of the present invention.
  • Figure 6 is a flowchart of a data exchange method in an embodiment of the present invention.
  • Figure 7 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 8 is a schematic structural diagram of a data processing system in another application scenario in an embodiment of the present invention.
  • Figure 9 is a schematic structural diagram of an interactive terminal in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 15 is a flowchart of another data processing method in an embodiment of the present invention.
  • FIG. 16 is a flowchart of a method for intercepting synchronized video frames in a compressed video data volume in an embodiment of the present invention.
  • FIG. 17 is a flowchart of another data processing method in an embodiment of the present invention.
  • FIG. 18 is a schematic structural diagram of a data processing device in an embodiment of the present invention.
  • Figure 19 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 20 is a flowchart of a data synchronization method in an embodiment of the present invention.
  • FIG. 21 is a timing diagram of a pull stream synchronization in an embodiment of the present invention.
  • FIG. 22 is a flowchart of another method for intercepting synchronized video frames in a compressed video data volume in an embodiment of the present invention.
  • Figure 23 is a schematic structural diagram of another data processing device in an embodiment of the present invention.
  • FIG. 24 is a schematic diagram of the structure of a data synchronization system in an embodiment of the present invention.
  • FIG. 25 is a schematic structural diagram of a data synchronization system in an application scenario in an embodiment of the present invention.
  • FIG. 26 is a flowchart of a method for generating a depth map in an embodiment of the present invention.
  • Figure 27 is a schematic structural diagram of a server in an embodiment of the present invention.
  • FIG. 28 is a schematic diagram of depth map processing performed by a server cluster in an embodiment of the present invention.
  • FIG. 29 is a flowchart of a method for generating a virtual viewpoint image in an embodiment of the present invention.
  • FIG. 30 is a flowchart of a method for GPU to perform combined rendering in an embodiment of the present invention.
  • FIG. 31 is a schematic diagram of a method for filling holes in an embodiment of the present invention.
  • Fig. 32 is a schematic structural diagram of a virtual viewpoint image generation system in an embodiment of the present invention.
  • FIG. 33 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • Figure 34 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention.
  • FIG. 35 is a schematic structural diagram of another data synchronization system in an embodiment of the present invention.
  • Fig. 36 is a schematic structural diagram of a collection device in an embodiment of the present invention.
  • Fig. 37 is a schematic diagram of a collection array in an application scenario in an embodiment of the present invention.
  • Fig. 38 is a schematic structural diagram of another data processing system in an embodiment of the present invention.
  • Figure 39 is a schematic structural diagram of another interactive terminal in an embodiment of the present invention.
  • Fig. 40 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Figure 41 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 42 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • FIG. 43 is a schematic diagram of the connection of an interactive terminal in an embodiment of the present invention.
  • FIG. 44 is a schematic diagram of interactive operation of an interactive terminal in an embodiment of the present invention.
  • Fig. 45 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 46 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 47 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • Fig. 48 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • 6DoF 6 Degrees of Freedom
  • Users can adjust the viewing angle of the video through interactive means during viewing, and watch from whatever free viewpoint they want, thus greatly enhancing the viewing experience.
  • the Free-D playback technology obtains point cloud data of the scene through multi-angle shooting to express the 6DoF image;
  • the light field rendering technology obtains depth-of-field information and three-dimensional position information of the pixels through focal length and spatial position changes of a dense light field, and then expresses the 6DoF image.
  • the depth-map-based 6DoF video generation method performs combined rendering of the texture maps and depth maps of the corresponding group in the image combination of the video frame at the moment of user interaction, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, to reconstruct 6DoF images or videos.
  • when the Free-D playback solution is used in the field, a large number of cameras need to be used for raw data collection, captured to the on-site computer room through digital component serial interface (SDI) capture cards, and then processed in the on-site computer room;
  • SDI digital component serial interface
  • the computing server processes the original data, obtains point cloud data that expresses the three-dimensional position and pixel information of all points in the space, and reconstructs the 6DoF scene.
  • This solution makes the amount of data collected, transmitted, and calculated on-site extremely large, placing high requirements on the transmission network and computing servers, especially for broadcast scenarios such as live broadcast and rebroadcast.
  • the implementation cost of the 6DoF reconstruction scene is too high and there are too many restrictions.
  • although the depth-map-based 6DoF video reconstruction method can reduce the amount of data computation in the video reconstruction process, it is still difficult, constrained by network transmission bandwidth and device decoding capability, to meet the needs of low-latency playback and real-time interaction of multi-angle free-view videos.
  • some embodiments of the present invention propose a multi-angle free-view image generation scheme, which adopts a distributed system architecture, in which an acquisition array composed of multiple acquisition devices is set in the field acquisition area to perform synchronous acquisition of frame images from multiple angles.
  • the frame images collected by the acquisition devices are intercepted by the data processing device according to a frame interception instruction; the server uses the frame images of multiple synchronized video frames uploaded by the data processing device as an image combination, and can determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination; based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, frame image reconstruction is performed on the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data, and the multi-angle free-view video data is inserted into the to-be-played data stream of the playback control device for transmission to the playback terminal for playback.
  • the data processing system 10 includes: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14.
  • the data processing device 11 can intercept video frames from the frame images collected by the acquisition array in the acquisition area; by intercepting only the video frames used for generating multi-angle free-view images, a large amount of data transmission and data processing can be avoided.
  • the server 12 performs the multi-angle free-view image generation, which can make full use of the powerful computing power of the server to quickly generate multi-angle free-view video data; the data can be inserted into the data stream to be played by the playback control device in time, realizing playback of multi-angle free-view video data at low cost and meeting the needs of users for low-latency playback and real-time interaction of multi-angle free-view videos.
  • S21: Receive frame images of multiple synchronized video frames uploaded by the data processing device as an image combination.
  • the multiple synchronized video frames are obtained by the data processing device, based on a video frame interception instruction, by intercepting the video frames at the specified frame time from the multiple video data streams that are synchronously collected and uploaded in real time from different locations in the field collection area;
  • the shooting angles of the multiple synchronized video frames are different.
  • the video frame interception instruction may include information about a specified frame time, and the data processing device intercepts the corresponding frame time from multiple video data streams according to the information about the specified frame time in the video frame interception instruction Video frames.
  • the designated frame time may be in units of frames: the Nth to Mth frames are regarded as the designated frame time, where N and M are both integers not less than 1 and N ≤ M; or the designated frame time may be in units of time: the Xth to Yth seconds are regarded as the designated frame time, where X and Y are both positive numbers and X ≤ Y. Therefore, the multiple synchronized video frames may include all frame-level synchronized video frames corresponding to a specified frame moment, and the pixel data of each video frame forms a corresponding frame image.
  • for example, the data processing device can obtain the specified frame time as the 2nd frame of the multiple video data streams; the data processing device then intercepts the video frame of the 2nd frame of each video data stream, and the frame-level synchronized 2nd-frame video frames of the intercepted video data streams serve as the obtained multiple synchronized video frames.
  • as another example, the data processing device can, according to the received video frame interception instruction, obtain the video frames within the first second of the multiple video data streams at the specified frame time; the data processing device can then intercept the 25 video frames within the first second of each video data stream, and the video frames within the first second of each intercepted video data stream are synchronized at the frame level.
  • as another example, the data processing device can obtain the specified frame time as the 2nd and 3rd frames of the multiple video data streams; the data processing device can then intercept the 2nd and 3rd video frames of each video data stream, with frame-level synchronization between the 2nd frames and between the 3rd frames of the intercepted video data streams, as the multiple synchronized video frames.
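The interception examples above amount to index selection across synchronized streams. The helper names `seconds_to_frames` and `intercept_synchronized` below are hypothetical, and the 25 fps frame rate is taken from the example:

```python
def seconds_to_frames(start_s, end_s, fps):
    """1-based frame indices covered by the interval (start_s, end_s] seconds."""
    return list(range(int(start_s * fps) + 1, int(end_s * fps) + 1))

def intercept_synchronized(streams, frame_indices):
    """For each synchronously collected stream, take the frames at the given
    1-based indices; the i-th frames of all streams are frame-level synchronized."""
    return [[stream[i - 1] for i in frame_indices] for stream in streams]
```

For the "first second at 25 fps" example, `seconds_to_frames(0, 1, 25)` yields indices 1 through 25; for the "2nd and 3rd frames" example, the index list is simply `[2, 3]`.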
  • the parameter data corresponding to the image combination can be obtained through a parameter matrix
  • the parameter matrix can include an internal parameter matrix, an external parameter matrix, a rotation matrix and a translation matrix, and the like.
  • a Structure From Motion (SFM) algorithm can be used to perform feature extraction, feature matching, and global optimization on the obtained image combination based on a parameter matrix, and the obtained parameter estimation value is used as the image combination The corresponding parameter data.
  • the algorithm used for feature extraction can include any of the following: the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, and the Features from Accelerated Segment Test (FAST) algorithm.
  • Algorithms used for feature matching may include the Euclidean distance calculation method, the Random Sample Consensus (RANSAC) algorithm, etc.
  • Algorithms for global optimization may include: Bundle Adjustment (BA) and so on.
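Of the listed steps, the Euclidean-distance matching is simple enough to sketch: a greedy nearest-neighbour matcher over descriptor vectors. The `max_dist` rejection threshold is an assumed parameter, not something the text specifies:

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def match_features(desc_a, desc_b, max_dist=0.5):
    """For each descriptor in desc_a, find its nearest neighbour in desc_b;
    keep the pair only if the distance is within max_dist."""
    matches = []
    for i, da in enumerate(desc_a):
        j, d = min(((j, euclidean(da, db)) for j, db in enumerate(desc_b)),
                   key=lambda t: t[1])
        if d <= max_dist:
            matches.append((i, j))
    return matches
```

In the full pipeline, RANSAC would then discard matches inconsistent with a single geometric model before the bundle-adjustment global optimization.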
  • the depth data of each frame image may be determined based on multiple frame images in the image combination.
  • the depth data may include depth values corresponding to pixels of each frame image in the image combination.
  • the distance from the collection point to each point in the scene can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis.
  • the above distance may be a relative value, and multiple frames of images may use the same reference.
  • a binocular stereo vision algorithm may be used to calculate the depth data of each frame of image.
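The text only names the binocular stereo vision algorithm, so as an illustrative sketch of the principle: a sum-of-absolute-differences (SAD) disparity search along one scanline, followed by the standard relation depth = focal × baseline / disparity. The window size and SAD cost are assumed details:

```python
def best_disparity(left, right, x, max_d, window=1):
    """Find the disparity d (0..max_d) minimizing the SAD between a small
    window around left[x] and the same window in right shifted by d."""
    def sad(d):
        return sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-window, window + 1))
    candidates = [d for d in range(max_d + 1) if x - d - window >= 0]
    return min(candidates, key=sad)

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classic pinhole stereo relation: depth = focal * baseline / disparity."""
    return focal_px * baseline_m / disparity_px
```

Repeating the search for every pixel of a rectified image pair yields a dense disparity map, which the relation above converts into the per-pixel depth values described in the surrounding text.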
  • the depth data can also be indirectly estimated by analyzing photometric features of the frame image, such as luminosity and shading.
  • a multi-view stereo (MVS) algorithm may be used to reconstruct the frame image.
  • all pixels can be used for reconstruction, or the pixels can be down-sampled and only part of the pixels can be used for reconstruction.
  • the pixel points of each frame image can be matched, the three-dimensional coordinates of each pixel point can be reconstructed to obtain points with image consistency, and then the depth data of each frame image can be calculated.
  • the pixel points of the selected frame image may be matched, and the three-dimensional coordinates of the pixel points of each selected frame image may be reconstructed to obtain points with image consistency, and then the depth data of the corresponding frame image may be calculated.
  • the pixel data of the frame image corresponds to the calculated depth data.
  • the method of selecting frame images can be set according to the specific situation; for example, the distance between the frame image whose depth data is to be calculated and the other frame images can be computed as needed, and a subset of frame images selected accordingly.
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the pixel data of the frame image can be any of YUV data or RGB data, or other data that can express the frame image;
  • the depth data can include the depth values corresponding to the pixel data of the frame image, or it may be selected from a set of depth values corresponding to the pixel data of the frame image, with the specific selection method depending on the specific scenario;
  • the virtual viewpoint is selected from a multi-angle free viewing angle range, and the multi-angle free viewing angle range is a range that supports switching of viewpoints in the area to be viewed.
  • the preset frame image may be all the frame images in the image combination, or may be a selected partial frame image.
  • the selection method can be set according to the specific situation: for example, according to the positional relationship between the collection points, part of the frame images at the corresponding positions in the image combination can be selected; or, according to the desired frame time or frame period, part of the frame images at the corresponding frame times in the image combination can be selected.
  • each virtual viewpoint in the virtual viewpoint path can be made to correspond to a frame time, and the corresponding frame image can be obtained according to the frame time corresponding to each virtual viewpoint; then, based on the parameter data corresponding to the image combination and the depth data and pixel data of the frame image corresponding to the frame time of each virtual viewpoint, the frame image of each virtual viewpoint is reconstructed to obtain the corresponding multi-angle free-view video data.
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • based on the pixel data and depth data of the frame images of the a21 synchronized video frames at the first frame time, image reconstruction of the second frame is performed on the path composed of the b2 virtual viewpoints, and the corresponding multi-angle free-view video data is finally obtained.
  • the designated frame time and the virtual viewpoint can be divided into more fine-grained divisions, thereby obtaining more synchronized video frames and virtual viewpoints corresponding to different frame times.
  • the above-mentioned embodiment is only an example for illustration and does not limit specific implementations.
  • a depth-map-based image rendering (DIBR) algorithm can be used to perform combined rendering on the pixel data and depth data of the preset frame images according to the parameter data corresponding to the image combination and the preset virtual viewpoint path, so as to realize frame image reconstruction based on the preset virtual viewpoint path and obtain the corresponding multi-angle free-view video data.
  • DIBR depth map-based image rendering
  • S25: Insert the multi-angle free-view video data into the to-be-played data stream of the playback control device and play it through the playback terminal.
  • the playback control device can take multiple video data streams as input, where the video data stream can come from each collection device in the collection array or from other collection devices.
  • the playback control device can select one input video data stream as the data stream to be played according to needs.
  • the multi-angle free-view video data obtained in step S24 can be inserted into the data stream to be played, or switched by the video data stream of other input interfaces.
  • the playback control device outputs the selected data stream to be played to the playback terminal, where it can be played, so the user can watch multi-angle free-view video images through the playback terminal.
  • the playback terminal can be a video playback device such as a TV, a mobile phone, a tablet, a computer, or an electronic device that includes a display screen.
  • the multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting may include operations such as pausing, rewinding, and fast-forwarding to the current moment while the user is watching.
  • in the distributed system architecture, the data processing device handles interception of the specified video frames, and the server handles reconstruction of the multi-angle free-view video from the intercepted preset frame images, which can avoid deploying a large number of servers on site.
  • according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination, the pixel data and depth data of the preset frame images in the image combination may be respectively mapped to the corresponding virtual viewpoints; then, according to the pixel data and depth data of the preset frame images mapped to the corresponding virtual viewpoints and the preset virtual viewpoint path, frame image reconstruction is performed to obtain the corresponding multi-angle free-view video data.
  • the virtual parameter data of the virtual viewpoint may include: virtual viewing position data and virtual viewing angle data; the parameter data corresponding to the image combination may include: collection position data, shooting angle data, and the like.
  • forward mapping can be performed first, and then reverse mapping, to obtain the reconstructed image.
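  • as an illustrative sketch only (a highly simplified 1-D rectified-camera setup; the function names and the constant-depth example are our own assumptions, not the patent's actual algorithm), the forward-then-reverse mapping above can be outlined as: forward mapping warps source depths into the virtual view with a depth test, and reverse mapping then samples source pixels for each virtual pixel:

```python
# Hypothetical 1-D DIBR sketch: rectified cameras, where a pixel shifts
# horizontally by disparity = focal * baseline / depth when mapped into
# the virtual viewpoint. An illustration, not the patent's exact method.

def forward_map_depth(depth_row, focal, baseline, width):
    """Forward mapping: warp source depths into the virtual view,
    keeping the nearest surface (smallest depth) on collisions."""
    virt_depth = [None] * width
    for x, z in enumerate(depth_row):
        disparity = focal * baseline / z
        xv = round(x - disparity)
        if 0 <= xv < width and (virt_depth[xv] is None or z < virt_depth[xv]):
            virt_depth[xv] = z
    return virt_depth

def backward_map_color(pixel_row, virt_depth, focal, baseline, width):
    """Reverse mapping: for each virtual pixel with a known depth, look up
    the source pixel it came from and copy its color (holes stay None)."""
    virt_pixels = [None] * width
    for xv, z in enumerate(virt_depth):
        if z is None:
            continue
        disparity = focal * baseline / z
        xs = round(xv + disparity)
        if 0 <= xs < width:
            virt_pixels[xv] = pixel_row[xs]
    return virt_pixels

# Toy example: an 8-pixel row at constant depth 4; focal * baseline is
# chosen so the disparity is exactly 2 pixels.
width = 8
pixels = [10, 20, 30, 40, 50, 60, 70, 80]
depths = [4.0] * width
vd = forward_map_depth(depths, focal=1.0, baseline=8.0, width=width)
vp = backward_map_color(pixels, vd, focal=1.0, baseline=8.0, width=width)
```

  • in a real reconstruction the mapping uses the full intrinsic and extrinsic parameter data, and holes (None values) left by occlusions are filled by blending several reference viewpoints.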
  • the collected position data and shooting angle data may be referred to as external parameter data
  • the parameter data may also include internal parameter data
  • the internal parameter data may include attribute data of the collection device, so that the mapping relationship can be determined more accurately.
  • the internal parameter data may include distortion data. Since distortion factors are taken into consideration, the mapping relationship can be further accurately determined spatially.
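  • a minimal sketch of how the extrinsic data (position and shooting angle, expressed as a rotation R and translation t) and the intrinsic data (focal lengths, principal point, and a radial distortion coefficient) jointly determine the mapping from a 3-D point to a pixel; the names and numeric values here are illustrative assumptions, not taken from the patent:

```python
# Illustrative pinhole projection with one radial distortion term (k1).
# Extrinsics (R, t) map world to camera coordinates; intrinsics map the
# normalized image-plane point to pixel coordinates.

def project_point(X, R, t, fx, fy, cx, cy, k1=0.0):
    """Project world point X to pixel (u, v)."""
    # World -> camera: Xc = R @ X + t (R given as 3 rows of 3).
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]   # normalized image plane
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2                  # simple radial distortion model
    return fx * x * scale + cx, fy * y * scale + cy

# Identity rotation, camera translated 5 units along Z, no distortion.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 5.0]
u, v = project_point([1.0, 2.0, 0.0], R, t, fx=100, fy=100, cx=320, cy=240)
```

  • with a nonzero k1 the same code shows why modeling distortion refines the mapping: the projected pixel moves radially with respect to the principal point instead of landing at the ideal pinhole position.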
  • a stitched image corresponding to the image combination may be generated based on the pixel data and depth data of the image combination; the stitched image may include a first field and a second field, wherein the first field includes the pixel data of the image combination and the second field includes the depth data of the image combination. The stitched image and the parameter data corresponding to the image combination are then stored.
  • a spliced image corresponding to the preset frame image in the image combination may be generated based on the pixel data and depth data of the preset frame image in the image combination.
  • the stitched image corresponding to the frame images may include a first field and a second field, wherein the first field includes the pixel data of the preset frame images and the second field includes the depth data of the preset frame images; then, only the stitched image corresponding to the preset frame images and the corresponding parameter data need to be stored.
  • the stitched image can be divided into an image area and a depth map area; the pixel fields of the image area store the pixel data of the multiple frame images, and the pixel fields of the depth map area store the depth data of the multiple frame images. The pixel field in the image area that stores the pixel data of a frame image serves as the first field, and the pixel field in the depth map area that stores the depth data of a frame image serves as the second field. The stitched image of the acquired image combination and the parameter data corresponding to the image combination can be stored in a data file.
  • when the stitched image or the corresponding parameter data needs to be acquired, it can be read from the corresponding storage space based on the storage address contained in the header file of the data file.
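  • the storage scheme above can be sketched as follows (the byte format, field order, and helper names are our own illustration, not the patent's actual file format): a small header records section lengths, playing the role of the header file's storage addresses, followed by the image area (first field), the depth-map area (second field), and the parameter data:

```python
import json
import struct

# Illustrative data-file layout: header (section lengths) + pixel data +
# depth data + JSON-encoded parameter data.

def pack(pixel_bytes, depth_bytes, param_dict):
    params = json.dumps(param_dict).encode()
    # Header acts like the "storage addresses" of the three sections.
    header = struct.pack("<3I", len(pixel_bytes), len(depth_bytes), len(params))
    return header + pixel_bytes + depth_bytes + params

def read_depth(blob):
    """Read only the depth field, using the addresses in the header."""
    n_pix, n_depth, _ = struct.unpack_from("<3I", blob, 0)
    offset = struct.calcsize("<3I") + n_pix   # skip header + image area
    return blob[offset:offset + n_depth]

blob = pack(b"\x01\x02\x03\x04", b"\x09\x08", {"fx": 100})
```

  • the point of the header is that either field (or the parameter data) can be read directly without scanning the whole file.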
  • the storage format of the image combination may be a video format
  • the number of image combinations may be multiple
  • each image combination may be a combination of images corresponding to different frame moments after the video is decapsulated and decoded.
  • the interaction frame time information at the interaction moment can be determined, and the stored stitched image of the preset frame images in the image combination corresponding to the interaction frame time, together with the parameter data corresponding to the image combination, can be sent to the interactive terminal, so that the interactive terminal, based on the virtual viewpoint position information determined by the interactive operation, selects the corresponding pixel data, depth data, and parameter data from the stitched image according to preset rules, renders the selected pixel data and depth data in combination with the parameter data, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position to be interacted.
  • the preset rules can be set according to the specific scenario. For example, based on the virtual viewpoint position information determined by the interactive operation, the W virtual viewpoints closest to the virtual viewpoint at the interaction moment may be selected in order of distance, and the pixel data and depth data corresponding to the above W+1 virtual viewpoints in total (including the virtual viewpoint at the interaction moment) that satisfy the interaction frame time information may be obtained from the stitched image.
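  • the example rule above can be sketched as follows (viewpoints are reduced to 2-D positions and all names are illustrative): pick the W candidate viewpoints closest to the interaction-moment virtual viewpoint and return them together with that viewpoint, W + 1 in total:

```python
# Illustrative nearest-W selection for the preset rule described above.

def select_viewpoints(target, candidates, W):
    """Return the target viewpoint plus its W nearest candidates."""
    dist2 = lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    nearest = sorted(candidates, key=dist2)[:W]   # ascending distance
    return [target] + nearest                     # W + 1 viewpoints total

candidates = [(0, 0), (3, 0), (1, 1), (10, 10)]
chosen = select_viewpoints((0.5, 0.5), candidates, W=2)
```

  • the pixel data and depth data for the chosen viewpoints would then be fetched from the stitched image for the interaction frame time.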
  • the interactive frame time information is determined based on the trigger operation from the interactive terminal.
  • the trigger operation may be a trigger operation input by the user or a trigger operation automatically generated by the interactive terminal; for example, the trigger operation can be initiated automatically when the interactive terminal detects that a multi-angle free-viewpoint data frame is identified.
  • when the user triggers manually, the interaction frame time may be the time at which the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or it may be historical time information at which the interactive terminal receives a user operation to trigger the interaction, where the historical time information can be time information prior to the current playback moment.
  • based on the stitched image of the preset frame images in the acquired image combination at the interaction frame time and the corresponding parameter data, the interaction frame time information, and the virtual viewpoint position information at the interaction frame time, the interactive terminal may, in the manner of step S24, perform combined rendering on the pixel data and depth data of the stitched image, obtain the multi-angle free-view video data corresponding to the interactive virtual viewpoint position, and start playing the multi-angle free-view video at the interactive virtual viewpoint position.
  • the multi-angle free-view video data corresponding to the interactive virtual viewpoint position can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user's interactive experience.
  • the data processing system 10 may include: a data processing device 11, a server 12, a playback control device 13, and a playback terminal 14, wherein:
  • the data processing device 11 is adapted to intercept, based on a video frame interception instruction, multiple synchronized video frames at a designated frame time from multiple video data streams synchronously collected in real time from different locations in the field collection area, and upload the multiple synchronized video frames at the designated frame time to the server, where the multiple video data streams may be video data streams in a compressed format or in a non-compressed format;
  • the server 12 is adapted to take the received frame images of the multiple synchronized video frames at the specified frame time uploaded by the data processing device 11 as an image combination, determine the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, perform frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data;
  • the playback control device 13 is adapted to insert the multi-angle free-view video data into the data stream to be played;
  • the playback terminal 14 is adapted to receive the to-be-played data stream from the playback control device 13 and perform real-time playback.
  • the playback control device 13 may output the data stream to be played based on a control instruction.
  • the playback control device 13 may select one of the multiple data streams as the data stream to be played, or continuously switch the selection among the multiple data streams to continuously output the data stream to be played.
  • the broadcast director control device may be used as a playback control device in the embodiment of the present invention.
  • the directing control device can be a manual or semi-manual directing control device that performs playback control based on externally input control instructions, or a virtual directing control device that automatically performs directing control based on artificial intelligence, big-data learning, or preset algorithms.
  • in the distributed system architecture, the data processing device handles interception of the specified video frames, and the server handles reconstruction of the multi-angle free-view video from the intercepted preset frame images, which can avoid deploying a large number of servers on site.
  • the server 12 is further adapted to generate a stitched image corresponding to the preset frame images in the image combination based on the pixel data and depth data of the preset frame images in the image combination, the stitched image including a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination, and to store the stitched image of the image combination and the parameter data corresponding to the image combination.
  • the data processing system 10 may also include an interactive terminal 15, which is adapted to determine interaction frame time information based on a trigger operation, send an image reconstruction instruction containing the interaction frame time information to the server, receive the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the server, determine virtual viewpoint position information based on the interactive operation, select the corresponding pixel data and depth data from the stitched image according to preset rules, render the selected pixel data and depth data in combination with the parameter data, and reconstruct and play the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
  • the number of the playing terminal 14 may be one or more, the number of the interactive terminal 15 may be one or more, and the playing terminal 14 and the interactive terminal 15 may be the same terminal device.
  • at least one of a server, a playback control device, or an interactive terminal may be used as the transmitting end of the video frame interception instruction, and other devices capable of transmitting the video frame interception instruction may also be used.
  • the locations of the data processing device and the server can be flexibly deployed according to user requirements.
  • the data processing equipment can be placed in a non-collection area or in the cloud.
  • the server can be placed in a non-collection area on site, on the cloud or terminal access side.
  • edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as locations for the server used to obtain multi-angle free-view data.
  • the data processing device and the server can also be centrally arranged and work together as a server cluster to realize rapid generation of multi-angle free-view data, so as to achieve low-latency playback of, and real-time interaction with, multi-angle free-view videos.
  • the multi-angle free-view video data corresponding to the position of the virtual viewpoint to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user interaction experience.
  • referring to FIG. 3, it is a schematic structural diagram of a data processing system in an application scenario, showing a layout of a data processing system for a basketball game.
  • the data processing system includes a collection array 31 composed of multiple collection devices, a data processing device 32, a cloud server cluster 33, a playback control device 34, a playback terminal 35, and an interactive terminal 36.
  • the basketball hoop on the left is taken as the core viewpoint, the core viewpoint is taken as the center of a circle, and the fan-shaped area on the same plane as the core viewpoint is used as the preset multi-angle free viewing angle range.
  • the collection devices in the collection array 31 can be placed in a fan shape at different positions in the field collection area according to the preset multi-angle free viewing angle range, and can synchronously collect video data streams in real time from their respective angles.
  • the collection equipment can also be set up in the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of acquisition equipment, the characteristics of the acquisition equipment, and the requirements for imaging effects.
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device 32 can be placed in a non-collection area on site, which can be regarded as a site server.
  • the data processing device 32 may send a stream-pulling instruction to each collection device in the collection array 31 through a wireless local area network, and each collection device in the collection array 31 transmits the obtained video data stream to the data processing device 32 in real time based on the stream-pulling instruction sent by the data processing device 32.
  • each collection device in the collection array 31 can transmit the obtained video data stream to the data processing device 32 in real time through the switch 37.
  • when the data processing device 32 receives a video frame interception instruction, it intercepts the video frames at the specified frame time from the received multiple video data streams to obtain the frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the specified frame time to the server cluster 33 in the cloud.
  • the cloud server cluster 33 takes the received frame images of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction on the preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data, where the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of the frame images sorted by frame time.
  • the server can be placed in the cloud, and, to process data in parallel more quickly, the cloud server cluster 33 can be composed of multiple different servers or server groups according to the data to be processed.
  • the cloud server cluster 33 may include: a first cloud server 331, a second cloud server 332, a third cloud server 333, and a fourth cloud server 334.
  • the first cloud server 331 can be used to determine the corresponding parameter data of the image combination
  • the second cloud server 332 can be used to determine the depth data of each frame of the image in the image combination
  • the third cloud server 333 can use a depth image-based rendering (DIBR) algorithm to perform frame image reconstruction on the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the image combination
  • the fourth cloud server 334 may be used to generate a multi-angle free-view video, where the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
  • the first cloud server 331, the second cloud server 332, the third cloud server 333, and the fourth cloud server 334 may also each be a server group composed of a server array or server sub-clusters, which is not limited in the embodiment of the present invention.
  • the server cluster 33 in the cloud can store the pixel data and depth data of the image combination in the following manner:
  • a stitched image corresponding to the frame time is generated; the stitched image includes a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination.
  • the playback control device 34 may insert the received multi-angle free-view video data into the data stream to be played, and the playback terminal 35 receives the data stream to be played from the playback control device 34 and plays it in real time.
  • the playback control device 34 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set as a virtual playback control device to control the data source.
  • a broadcast control device such as a broadcast control station can be used as a playback control device in the embodiment of the present invention.
  • the data processing device 32 can be placed in the on-site non-collection area or in the cloud according to the specific scenario, and the server (cluster) and playback control device can be placed in the on-site non-collection area, in the cloud, or on the terminal access side according to the specific scenario.
  • this embodiment is not used to limit the specific implementation and protection scope of the present invention.
  • in the schematic diagram of the interactive interface of the interactive terminal 40, there is a progress bar 41 on the interactive interface; the interactive terminal 40 can associate the designated frame times received from the data processing device 32 with the progress bar, and several interactive identifiers, such as interactive marks 42 and 43, can be generated on the progress bar 41.
  • the black segment of the progress bar 41 is the played portion 41a
  • the blank segment of the progress bar 41 is the unplayed portion 41b.
  • the interface of the interactive terminal 40 may display interactive prompt information; for example, when the user selects an operation to trigger the current interactive mark 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the interaction frame time of the interactive mark 43 and sends the image reconstruction instruction containing the interaction frame time information to the cloud server cluster 33.
  • the interactive terminal 40 can continue to read subsequent video data, and the played portion 41a on the progress bar continues to advance.
  • the user can also choose to trigger the historical interaction mark while watching, for example, to trigger the interaction mark 42 displayed in the played part 41a on the progress bar, and the interactive terminal 40 generates an image reconstruction instruction at the interaction frame time corresponding to the interaction mark 42 after receiving the feedback.
  • when the cloud server cluster 33 receives the image reconstruction instruction from the interactive terminal 40, the stitched image of the preset frame images in the corresponding image combination and the parameter data corresponding to that image combination can be extracted and transmitted to the interactive terminal 40.
  • the interactive terminal 40 determines the interaction frame time information based on the trigger operation, sends an image reconstruction instruction containing the interaction frame time information to the server, receives the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the corresponding parameter data returned from the cloud server cluster 33, determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data, depth data, and parameter data from the stitched image according to preset rules, renders the selected pixel data and depth data in combination, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
  • each collection device in the collection array can be connected to the data processing device through a switch and/or a local area network, and the number of playback terminals and interactive terminals can be one or more.
  • the playback terminal and the interactive terminal may be the same terminal device; the data processing device may be placed in an on-site non-collection area or in the cloud according to the specific scenario, and the server may be placed in an on-site non-collection area, in the cloud, or on the terminal access side according to the specific scenario.
  • the embodiments are not used to limit the specific implementation and protection scope of the present invention.
  • the embodiment of the present invention also provides a server corresponding to the above-mentioned data processing method.
  • the server 50 may include:
  • the data receiving unit 51 is adapted to receive frame images of multiple synchronized video frames uploaded by the data processing device as an image combination;
  • the parameter data calculation unit 52 is adapted to determine parameter data corresponding to the image combination
  • the depth data calculation unit 53 is adapted to determine the depth data of each frame of the image in the image combination
  • the video data acquisition unit 54 is adapted to perform frame image reconstruction on the preset virtual viewpoint path based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, to obtain the corresponding multi-angle free-view video data, wherein the multi-angle free-view video data includes: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the first data transmission unit 55 is adapted to insert the multi-angle free-view video data into the to-be-played data stream of the playback control device and play it through the playback terminal.
  • the multiple synchronized video frames may be obtained by the data processing device, based on the video frame interception instruction, by intercepting the video frames at the specified frame time from the multiple video data streams synchronously collected in real time from different locations in the field collection area, and the shooting angles of the multiple synchronized video frames are different.
  • the server can be placed in a non-collection area on site, in the cloud or on the terminal access side according to specific scenarios.
  • the multi-angle free-view video data inserted into the to-be-played data stream of the playback control device can be retained in the playback terminal to facilitate time-shifted viewing by the user, where time-shifting may include operations such as pausing, rewinding, and fast-forwarding to the current moment while the user is watching.
  • the video data acquiring unit 54 may include:
  • the data mapping subunit 541 is adapted to map the pixel data and depth data of the preset frame images in the image combination to the corresponding virtual viewpoints respectively, according to the relationship between the virtual parameter data of each virtual viewpoint in the preset virtual viewpoint path and the parameter data corresponding to the image combination;
  • the data reconstruction subunit 542 is adapted to reconstruct the frame image according to the pixel data and depth data of the preset frame image respectively mapped to the corresponding virtual viewpoint, and the preset virtual viewpoint path, to obtain the corresponding multi-angle free-view video data .
  • the server 50 may further include:
  • the stitched image generating unit 56 is adapted to generate a stitched image corresponding to the image combination based on the pixel data and depth data of the preset frame images in the image combination; the stitched image may include a first field and a second field, wherein the first field includes the pixel data of the preset frame images in the image combination and the second field includes the depth data of the preset frame images in the image combination;
  • the data storage unit 57 is adapted to store the stitched image of the image combination and the parameter data corresponding to the image combination.
  • the server 50 may further include:
  • the data extraction unit 58 is adapted to determine the interaction frame time information based on the received image reconstruction instruction from the interactive terminal, and extract the stitched image of the preset frame images in the image combination corresponding to the interaction frame time and the parameter data corresponding to the image combination;
  • the second data transmission unit 59 is adapted to transmit the stitched image and corresponding parameter data extracted by the data extraction unit 58 to the interactive terminal, so that the interactive terminal determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data, depth data, and parameter data from the stitched image according to preset rules, renders the selected pixel data and depth data in combination, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interaction frame time.
  • the multi-angle free-view video data corresponding to the position of the virtual viewpoint to be interacted can be generated at any time based on the image reconstruction instruction from the interactive terminal, which can further enhance the user interaction experience.
  • the embodiments of the present invention also provide a data interaction method and data processing system, which can obtain the data stream to be played from the playback control device in real time and perform real-time playback and display.
  • each interaction identifier in the data stream to be played is associated with a designated frame time of the video data; then, in response to a trigger operation on an interaction identifier, the interaction data corresponding to the designated frame time of that interaction identifier can be obtained.
  • since the interaction data may include multi-angle free-view data, a multi-angle free-view display of the designated frame time can be performed based on the interaction data.
  • the interaction data can be acquired according to the trigger operation of the interaction identifier, and then the multi-angle free perspective display can be performed to enhance the user interaction experience.
  • S61 Acquire a data stream to be played from the playback control device in real time and perform real-time playback and display.
  • the data stream to be played includes video data and interactive identifiers, and each interactive identifier is associated with a designated frame moment of the data stream to be played.
  • the designated frame time may be in units of frames, with the Nth to Mth frames regarded as the designated frame time, where N and M are integers not less than 1 and N ≤ M; alternatively, the designated frame time may be in units of time, with the Xth to Yth seconds taken as the designated frame time, where X and Y are positive numbers and X ≤ Y.
  • the data stream to be played may be associated with a number of designated frame times, and the playback control device may generate an interaction identifier corresponding to each designated frame time based on the information of each designated frame time, so that during real-time playback and display the corresponding interaction identifier can be displayed at the designated frame time.
  • each interactive identifier and video data can be associated in different ways according to actual conditions.
  • the data stream to be played may include several frame times corresponding to the video data. Since each interaction identifier also has a corresponding designated frame time, by matching the designated frame time information corresponding to each interaction identifier with the information of each frame time in the data stream to be played, frame times with the same information can be associated with the interaction identifiers, so that when real-time display of the data stream to be played reaches the corresponding frame time, the corresponding interaction identifier is displayed.
  • suppose the data stream to be played includes N frame times, and the playback control device generates M corresponding interaction identifiers based on the information of M designated frame times. If the information of the i-th frame time is the same as the information of the j-th designated frame time, the i-th frame time can be associated with the j-th interaction identifier, and when the real-time display proceeds to the i-th frame time, the j-th interaction identifier can be displayed, where i is a natural number not greater than N and j is a natural number not greater than M.
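  • the matching step above can be sketched as follows (a hypothetical minimal example; a real stream would carry richer time information): frame time i is associated with identifier j exactly when their time information is equal:

```python
# Illustrative association of stream frame times with interaction
# identifiers: identifier j attaches to frame i when their designated
# frame-time information matches.

def associate(frame_times, marker_times):
    """Return {frame_index i: marker_index j} for matching times."""
    lookup = {t: j for j, t in enumerate(marker_times)}
    return {i: lookup[t] for i, t in enumerate(frame_times) if t in lookup}

# N = 6 frame times in the stream; M = 2 designated frame times.
assoc = associate([0, 40, 80, 120, 160, 200], [80, 200])
```

  • when real-time playback reaches a frame index present in assoc, the associated identifier assoc[i] is displayed.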
  • the interaction data corresponding to each designated frame time can be stored in a preset storage device. Since an interaction identifier and a designated frame time have a corresponding relationship, by performing a trigger operation on an interaction identifier displayed by the interactive terminal, the designated frame time corresponding to the triggered interaction identifier can be obtained according to the trigger operation; in this way, the interaction data at the designated frame time corresponding to the triggered interaction identifier can be acquired.
  • a preset storage device may store M pieces of interactive data, where the M pieces of interactive data respectively correspond to M designated frame moments, and the M designated frame moments correspond to M interactive identifiers.
  • the designated frame time Ti corresponding to the interaction identifier Pi can be obtained according to the triggered interaction identifier Pi.
  • the interaction data stored for the obtained designated frame time Ti is then acquired.
  • i is a natural number.
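The two-step lookup (identifier Pi → designated frame time Ti → interaction data) can be sketched with a simple in-memory store. The class and method names are invented for illustration; the patent leaves the storage device's interface open.

```python
class InteractionStore:
    """Preset storage device sketch: maps identifier Pi -> frame time Ti,
    and frame time Ti -> the interaction data stored for that moment."""

    def __init__(self):
        self._id_to_time = {}
        self._time_to_data = {}

    def register(self, identifier, frame_time, data):
        # One identifier corresponds to one designated frame moment.
        self._id_to_time[identifier] = frame_time
        self._time_to_data[frame_time] = data

    def on_trigger(self, identifier):
        # Step 1: obtain the designated frame time Ti from the triggered Pi.
        frame_time = self._id_to_time[identifier]
        # Step 2: acquire the interaction data stored for Ti.
        return self._time_to_data[frame_time]
```

A trigger operation, whether input by the user or generated automatically, would end in a call like `store.on_trigger("P1")`.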
  • the trigger operation may be a trigger operation input by the user, or a trigger operation automatically generated by the interactive terminal.
  • the preset storage device can be placed in a non-collection area on site, in the cloud or on the terminal access side.
  • the preset storage device may be the data processing device, the server, or the interactive terminal in the embodiment of the present invention, or an edge node device located on the interactive terminal side, such as a base station, a set-top box, a router, a home data center server, a hotspot device, and the like.
  • S63: based on the interactive data, perform multi-angle free-view image display at the specified frame time.
  • an image reconstruction algorithm may be used to perform image reconstruction on the multi-angle free view data of the interactive data, and then perform the multi-angle free view image display at the specified frame time.
  • if the designated frame time is a single frame time, a static image with a multi-angle free view can be displayed; if the designated frame time corresponds to multiple frame times, a dynamic image with a multi-angle free view can be displayed.
  • the interactive data can be acquired according to the trigger operation on the interactive identifier, and then the multi-angle free perspective display can be performed to enhance the user interaction experience.
  • the multi-angle free view data may be generated based on the received multiple frame images corresponding to the specified frame moment, where the multiple frame images are intercepted by the data processing device at the designated frame time from the multiple video data streams synchronously collected by the multiple collection devices in the collection array. The multi-angle free view data may include pixel data, depth data, and parameter data of the multiple frame images, wherein there is an association relationship between the pixel data and the depth data of each frame image.
  • the pixel data of the frame image may be any of YUV data or RGB data, or may also be other data capable of expressing the frame image.
  • the depth data may include depth values corresponding to the pixel data of the frame image one-to-one, or may be a partial value selected from a depth value set that corresponds to the pixel data of the frame image one-to-one.
  • the specific selection method of the depth data depends on the specific situation.
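The association between pixel data and depth data, including the case where only a partial subset of depth values is kept, can be sketched as below. The dictionary layout and the stride-based subsampling rule are assumptions for illustration; the patent only requires that some selection of the one-to-one depth values be stored.

```python
def pack_view(pixels, depth, stride=1):
    """Bundle a frame image's pixel data with its depth data.

    pixels, depth: 2-D lists of equal height/width (one depth value per pixel).
    stride=1 keeps the full one-to-one depth map; stride>1 keeps only a
    subsampled subset of the depth values, as the text allows.
    """
    assert len(pixels) == len(depth) and len(pixels[0]) == len(depth[0]), \
        "depth data must correspond one-to-one with the pixel data"
    sub = [row[::stride] for row in depth[::stride]]
    return {"pixels": pixels, "depth": sub, "stride": stride}
```

Recording the stride alongside the subset keeps the association with the pixel data recoverable on the display side.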
  • the parameter data corresponding to the multiple frame images can be obtained through a parameter matrix.
  • the parameter matrix can include an internal parameter matrix, an external parameter matrix, a rotation matrix and a translation matrix.
  • the SFM algorithm can be used to perform feature extraction, feature matching, and global optimization on the acquired multiple frame images based on the parameter matrix, and the obtained parameter estimation values are used as the corresponding parameter data of the multiple frame images.
  • the specific algorithms used in the process of feature extraction, feature matching and global optimization can be referred to the previous introduction.
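To make the role of the internal (intrinsic) and external (extrinsic) parameter matrices concrete, here is a standard pinhole projection sketch: a 3-D point is transformed by the rotation matrix R and translation vector t, then mapped to pixel coordinates by the intrinsic matrix K. This is the conventional camera model, not the patent's specific SFM procedure.

```python
def project(K, R, t, point):
    """Project a 3-D world point to pixel coordinates (u, v).

    K: 3x3 intrinsic matrix; R: 3x3 rotation matrix; t: translation vector.
    """
    # Extrinsic step: world coordinates -> camera coordinates.
    cam = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
    # Intrinsic step: camera coordinates -> homogeneous image coordinates.
    uvw = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

SFM-style global optimization would adjust K, R, and t for every view so that projections of matched features agree across the multiple frame images.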
  • the depth data of each frame image may be determined based on the multiple frame images.
  • the depth data may include depth values corresponding to pixels of each frame of image.
  • the distance from the collection point to each point in the scene can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the scene to the optical center along the shooting optical axis.
  • a binocular stereo vision algorithm may be used to calculate the depth data from each frame of image.
  • the depth data can also be estimated indirectly by analyzing photometric features of the frame image, such as luminosity and shading.
  • the MVS algorithm can be used to reconstruct the frame images: the pixels of each frame image can be matched, the three-dimensional coordinates of each pixel point reconstructed, and points with image consistency obtained, from which the depth data of each frame image can then be calculated.
  • the pixel points of the selected frame image may be matched, and the three-dimensional coordinates of the pixel points of each selected frame image may be reconstructed to obtain points with image consistency, and then the depth data of the corresponding frame image may be calculated.
  • the pixel data of the frame image corresponds to the calculated depth data.
  • the method of selecting the frame images can be set according to the specific situation. For example, the distance between the frame image whose depth data is to be calculated and the other frame images can be computed as needed, and a subset of frame images selected accordingly.
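The binocular stereo route mentioned above rests on the classical relation depth = focal length × baseline / disparity. The sketch below applies that relation per pixel; the function name and the flat list of disparities are illustrative assumptions.

```python
def disparity_to_depth(disparities, focal_px, baseline_m):
    """Convert stereo disparities (pixels) to depth values (meters).

    focal_px: focal length in pixels; baseline_m: camera baseline in meters.
    Zero or negative disparity means no valid match, mapped to infinity.
    """
    return [focal_px * baseline_m / d if d > 0 else float("inf")
            for d in disparities]
```

MVS generalizes this two-view relation to many views by triangulating each pixel against several neighboring frame images and keeping the photo-consistent result.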
  • the data processing device may intercept frame-level synchronized video frames at the specified frame time in the multiple video data streams based on the received video frame interception instruction.
  • the video frame interception instruction may include frame time information for intercepting a video frame.
  • the data processing device intercepts, from the multiple video data streams, the video frames at the frame moment corresponding to the frame time information in the video frame interception instruction.
  • the data processing device sends the frame time information in the video frame interception instruction to the playback control device, and the playback control device can obtain the corresponding designated frame time according to the received frame time information and generate the corresponding interactive identifier from it.
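The interception step can be sketched as pulling the frame at one shared time from every synchronized stream. Modeling each video data stream as a dictionary keyed by frame time is a simplifying assumption; real streams would be decoded containers with frame-accurate seeking.

```python
def intercept_frames(streams, frame_time):
    """Cut the frame-level synchronized video frame at `frame_time`
    from every collected stream, returning one frame image per stream."""
    return [stream[frame_time] for stream in streams]
```

Frame-level synchronization of the collection array is what guarantees that the same `frame_time` key exists, and depicts the same instant, in every stream.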
  • multiple collection devices in the collection array are placed at different locations in the field collection area according to a preset multi-angle free viewing angle range, and the data processing device can be placed in a field non-collection area or in the cloud.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched.
  • the multi-angle free viewing angle can be a 6-degree-of-freedom (6DoF) viewing angle, where the spatial position of the virtual viewpoint can be expressed as (x, y, z) and the viewing angle as three rotation directions, giving six degrees of freedom in total.
  • the multi-angle free viewing angle range can be determined according to the needs of the application scene.
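The 6DoF virtual viewpoint described above, three position coordinates plus three rotation directions, can be captured in a small record type. The field names (yaw/pitch/roll) are one conventional choice of the three rotations, not terminology from the patent.

```python
from dataclasses import dataclass

@dataclass
class Viewpoint6DoF:
    """A virtual viewpoint with 6 degrees of freedom:
    spatial position (x, y, z) and three rotation directions."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float
```

A multi-angle free viewing angle range would then be a constraint set over such viewpoints, within which switching is supported.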
  • the playback control device may generate an interactive identifier associated with the video frame at the corresponding moment in the data stream to be played based on the frame time information of the intercepted video frame from the data processing device. For example, after receiving the video frame interception instruction, the data processing device sends the frame time information in the video frame interception instruction to the playback control device. Then, the playback control device can generate a corresponding interactive identifier based on the time information of each frame.
  • corresponding interactive data can be generated according to the objects displayed on site and the associated information of the displayed objects.
  • the interaction data may also include at least one of the following: field analysis data, information data of the collection object, information data of equipment associated with the collection object, information data of items deployed on site, and information data of logos displayed on site. Then, based on the interaction data, a multi-angle free perspective display can be performed to display richer interactive information to the user through a multi-angle free perspective, so that the user interaction experience can be further enhanced.
  • the interactive data can include not only multi-angle free perspective data, but also one or more of analysis data of the ball game, information data of a certain player, information data of the shoes worn by the player, information data of the basketball, information data of the on-site sponsor's logo, and the like.
  • in order to conveniently return to the data stream to be played after the image display ends, with continued reference to FIG. 6, after step S63 the method may further include:
  • when receiving an instruction to end the interaction, switching to the to-be-played data stream obtained in real time from the playback control device and performing real-time playback and display.
  • when it is detected that the multi-angle free-view image display at the specified frame time has reached the last image, switching to the to-be-played data stream acquired in real time from the playback control device and performing real-time playback and display.
  • the multi-angle free-view image display based on the interaction data in step 63 may specifically include the following steps:
  • the virtual viewpoint is determined according to the interactive operation; the virtual viewpoint is selected from a multi-angle free view range, which is a range that supports switching of virtual viewpoints in the viewing area. An image for viewing the area to be viewed is then displayed based on the virtual viewpoint, the image being generated based on the interaction data and the virtual viewpoint.
  • a virtual viewpoint path may be preset, and the virtual viewpoint path may include several virtual viewpoints. Since the virtual viewpoint is selected from the multi-angle free viewing angle range, the corresponding first virtual viewpoint can be determined according to the viewing angle of the image played and displayed during the interactive operation, and then, starting from the first virtual viewpoint, the images corresponding to each virtual viewpoint can be displayed in sequence along the preset virtual viewpoint path.
  • the DIBR algorithm may be used to render the pixel data and depth data corresponding to the specified frame time of the triggered interactive identifier according to the parameter data in the multi-angle free view data and the preset virtual viewpoint path, thereby realizing image reconstruction based on the preset virtual viewpoint path and obtaining the corresponding multi-angle free-view video data; the corresponding images are then displayed in sequence starting from the first virtual viewpoint in the order of the preset virtual viewpoints.
  • if the designated frame time is a single frame time, the obtained multi-angle free-view video data may include multi-angle free-view spatial data of images sorted by frame time, from which static images with a multi-angle free view can be displayed; if the designated frame time corresponds to multiple frame moments, the obtained multi-angle free-view video data can include both multi-angle free-view spatial data and multi-angle free-view temporal data of the frame images sorted by frame time, from which dynamic images with a multi-angle free view can be displayed, that is, the frame images of the video frames are displayed from multiple angles and free viewing angles.
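The core per-pixel step of DIBR, as commonly described in the literature, is to back-project a pixel with its depth value into 3-D and re-project it into the virtual viewpoint's camera. The sketch below assumes a deliberately simplified geometry (principal points at the origin, a pure translation tx between the real and virtual cameras) and is not the patent's implementation.

```python
def warp_pixel(u, v, depth, f_src, f_dst, tx):
    """Warp pixel (u, v) of a source view into a virtual view.

    f_src, f_dst: focal lengths (pixels) of the source and virtual cameras.
    tx: virtual camera's translation along the x axis (same units as depth).
    """
    # Back-project the pixel to camera-space coordinates using its depth.
    x, y, z = u * depth / f_src, v * depth / f_src, depth
    # Re-project into the translated virtual camera.
    return f_dst * (x - tx) / z, f_dst * y / z
```

Running this over every associated pixel/depth pair for each viewpoint on the preset path yields the reconstructed images that are then displayed in sequence.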
  • the embodiment of the present invention also provides a system corresponding to the above-mentioned data interaction method.
  • a detailed description will be given below through specific embodiments with reference to the accompanying drawings.
  • the data processing system 70 may include: a collection array 71, a data processing device 72, a server 73, a playback control device 74, and an interactive terminal 75, wherein:
  • the collection array 71 may include a plurality of collection devices, which are placed at different positions in the field collection area according to a preset multi-angle free viewing angle range, and are suitable for real-time synchronous collection of multiple video data streams and real-time uploading of the video data streams to the data processing device 72;
  • the data processing device 72 is adapted to intercept, from the uploaded multiple video data streams according to the received video frame interception instruction, the video frames at the specified frame time, to obtain the multiple frame images corresponding to the specified frame time and the frame time information corresponding to the specified frame time, to upload the multiple frame images at the specified frame time and the corresponding frame time information to the server 73, and to send the frame time information of the specified frame time to the playback control device 74;
  • the server 73 is adapted to receive the multiple frame images and the frame time information uploaded by the data processing device 72, and to generate, based on the multiple frame images, interactive data for interaction, where the interactive data includes multi-angle free view data and is associated with the frame time information;
  • the playback control device 74 is adapted to determine the designated frame time corresponding to the frame time information uploaded by the data processing device 72 in the data stream to be played, generate an interactive identifier associated with the designated frame time, and transmit the to-be-played data stream containing the interactive identifier to the interactive terminal 75;
  • the interactive terminal 75 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, and, based on a triggering operation on the interactive identifier, to obtain the interactive data stored in the server 73 that corresponds to the specified frame time, so as to perform multi-angle free-view image display.
  • the locations of the data processing device and the server can be flexibly deployed according to user requirements.
  • the data processing equipment can be placed in a non-collection area or in the cloud.
  • the server can be placed in a non-collection area on site, on the cloud or terminal access side.
  • edge node devices such as base stations, set-top boxes, routers, home data center servers, and hotspot devices can all serve as deployment locations for the server.
  • the server is used to obtain multi-angle free view data.
  • the data processing device and the server can also be arranged centrally and work together as a server cluster to realize rapid generation of multi-angle free-view data, so as to achieve low-latency playback and real-time interaction of multi-angle free-view videos.
  • the interactive data can be acquired according to the trigger operation of the interactive identifier, and then the multi-angle free perspective display can be performed to enhance the user's interactive experience.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched. Moreover, the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • the multi-angle free viewing angle may be a 6-degree-of-freedom (6DoF) viewing angle.
  • the acquisition device itself may have the function of encoding and packaging, so that the original video data collected from the corresponding angle in real time can be encoded and packaged in real time.
  • the acquisition device can have a compression function.
  • the server 73 is adapted to generate the multi-angle free view data based on the received multiple frame images corresponding to the specified frame time, and the multi-angle free view data includes the data of the multiple frame images. Pixel data, depth data, and parameter data, where there is an association relationship between the pixel data and the depth data of each frame image.
  • the multiple collection devices in the collection array 71 can be placed in different locations in the field collection area according to the preset multi-angle free viewing angle range, and the data processing device 72 can be placed in the field non-collection area or in the cloud, so The server 73 can be placed in a non-collection area on site, in the cloud, or on the terminal access side.
  • the playback control device 74 is adapted to generate an interactive identifier associated with the corresponding video frame in the data stream to be played based on the frame time information of the video frame intercepted by the data processing device 72.
  • the interactive terminal 75 is further adapted to switch to the to-be-played data stream obtained in real time from the playback control device 74 and perform real-time playback and display when the interaction end signal is detected.
  • the schematic structural diagram shows a basketball game application scenario, where the on-site collection area is the basketball court area on the left.
  • the data processing system 80 may include: a collection array 81 composed of collection devices, a data processing device 82, a cloud server cluster 83, a playback control device 84, and an interactive terminal 85.
  • the collection devices in the collection array 81 can be placed in a fan-shaped arrangement at different positions of the field collection area according to the preset multi-angle free viewing angle range, and can synchronously collect video data streams from their corresponding angles in real time.
  • the collection equipment can also be set up in the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of acquisition equipment, the characteristics of the acquisition equipment, and the requirements for imaging effects.
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device 82 may be placed in a non-collection area on site.
  • the data processing device 82 may send a streaming instruction to each collection device in the collection array 81 via a wireless local area network.
  • Each collection device in the collection array 81 transmits the obtained video data stream to the data processing device 82 in real time based on the streaming instruction sent by the data processing device 82.
  • each collection device in the collection array 81 can transmit the obtained video data stream to the data processing device 82 in real time through the switch 87.
  • Each collection device can compress the collected original video data in real time and transmit it to the data processing device in real time, so as to further save local area network transmission resources.
  • when the data processing device 82 receives the video frame interception instruction, it intercepts the video frames at the specified frame time from the received multiple video data streams to obtain the frame images corresponding to the multiple video frames and the frame time information corresponding to the specified frame time, uploads the multiple frame images of the specified frame time and the corresponding frame time information to the server cluster 83 in the cloud, and sends the frame time information of the specified frame time to the playback control device 84.
  • the video frame interception instruction may be manually issued by the user, or may be automatically generated by the data processing device.
  • the server can be placed in the cloud, and in order to process data in parallel more quickly, the cloud server cluster 83 can be composed of multiple different servers or server groups according to the different data to be processed.
  • the cloud server cluster 83 may include: a first cloud server 831, a second cloud server 832, a third cloud server 833, and a fourth cloud server 834.
  • the first cloud server 831 can be used to determine the parameter data corresponding to the multiple frame images;
  • the second cloud server 832 can be used to determine the depth data of each frame image among the multiple frame images;
  • the third cloud server 833 can use the DIBR algorithm to perform frame image reconstruction for the preset virtual viewpoint path, based on the parameter data corresponding to the multiple frame images and the depth data and pixel data of the preset frame images among the multiple frame images;
  • the fourth cloud server 834 can be used to generate multi-angle free-view videos.
  • the first cloud server 831, the second cloud server 832, the third cloud server 833, and the fourth cloud server 834 may each also be a server group composed of a server array or server sub-clusters, which is not limited in the embodiment of the present invention.
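The division of labor across the four cloud servers can be staged as a simple pipeline. The function bodies below are invented placeholders standing in for the per-stage workloads (parameter estimation, depth estimation, DIBR reconstruction, video assembly); only the staging order comes from the text.

```python
def estimate_parameters(images):
    # Placeholder for the first cloud server 831 (parameter data).
    return {"views": len(images)}

def estimate_depths(images, params):
    # Placeholder for the second cloud server 832 (per-image depth data).
    return [0.0] * len(images)

def reconstruct_views(images, depths, params):
    # Placeholder for the third cloud server 833 (DIBR reconstruction).
    return list(zip(images, depths))

def assemble_video(views):
    # Placeholder for the fourth cloud server 834 (free-view video output).
    return {"frames": len(views)}

def pipeline(frame_images):
    """Run the four stages in order, as the cluster division suggests."""
    params = estimate_parameters(frame_images)
    depths = estimate_depths(frame_images, params)
    views = reconstruct_views(frame_images, depths, params)
    return assemble_video(views)
```

Splitting the stages this way lets each server (or server group) scale independently, which is the stated motivation for the cluster arrangement.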
  • the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted according to frame time.
  • the interactive data may include multi-angle free view data
  • the multi-angle free view data may include pixel data, depth data, and parameter data of a plurality of frame images, and there is an association relationship between the pixel data and the depth data of each frame image .
  • the server cluster 83 in the cloud can store the interactive data according to the specified frame time information.
  • the playback control device 84 may generate an interaction identifier associated with the specified frame time according to the frame time information uploaded by the data processing device, and transmit the data stream to be played containing the interaction identifier to the interactive terminal 85.
  • the interactive terminal 85 can play the display video in real time based on the received data stream to be played and display the interactive logo at the corresponding video frame moment.
  • the interactive terminal 85 can obtain the interactive data stored in the cloud server cluster 83 and corresponding to the specified frame time, so as to display images with a multi-angle free view.
  • when the interactive terminal 85 detects the interaction end signal, it can switch to obtaining the data stream to be played from the playback control device 84 in real time and perform real-time playback and display.
  • the data processing system 380 may include: a collection array 381, a data processing device 382, a playback control device 383, and an interactive terminal 384; among them:
  • the collection array 381 includes a plurality of collection devices, which are placed at different locations in the field collection area according to the preset multi-angle free viewing angle range, and are suitable for real-time synchronous collection of multiple video data streams and real-time uploading of the video data streams to the data processing device;
  • the data processing device 382 is adapted to intercept, from the uploaded multiple video data streams according to the received video frame interception instruction, the video frames at the specified frame time, to obtain the multiple frame images corresponding to the specified frame time and the frame time information corresponding to the specified frame time, and to send the frame time information of the specified frame time to the playback control device 383;
  • the playback control device 383 is adapted to determine the designated frame time corresponding to the frame time information uploaded by the data processing device 382 in the data stream to be played, generate an interactive identifier associated with the designated frame time, and transmit the to-be-played data stream containing the interactive identifier to the interactive terminal 384;
  • the interactive terminal 384 is adapted to play and display the video containing the interactive identifier in real time based on the received data stream to be played, and, based on a triggering operation on the interactive identifier, to obtain from the data processing device 382 the multiple frame images at the specified frame moment corresponding to the interactive identifier, generate interactive data for interaction based on the multiple frame images, and then perform multi-angle free-view image display, wherein the interactive data includes multi-angle free-view data.
  • the data processing device can be flexibly deployed according to user requirements, for example, the data processing device can be placed in a non-collection area on site or in the cloud.
  • interactive data can be acquired according to the triggering operation of the interactive identifier, and then multi-angle free perspective display can be performed to enhance the user interaction experience.
  • the embodiment of the present invention also provides a terminal corresponding to the above-mentioned data interaction method.
  • the following describes in detail through specific embodiments with reference to the accompanying drawings.
  • the interactive terminal 90 may include:
  • the data stream acquiring unit 91 is adapted to acquire a data stream to be played from the playback control device in real time, the data stream to be played includes video data and an interactive identifier, and the interactive identifier is associated with a specified frame moment of the data stream to be played;
  • the play and display unit 92 is adapted to play and display the video and interactive identification of the data stream to be played in real time;
  • the interactive data obtaining unit 93 is adapted to obtain interactive data corresponding to the specified frame time in response to a trigger operation on the interactive identifier, and the interactive data includes multi-angle free view data;
  • the interactive display unit 94 is adapted to perform multi-angle free-view image display at the specified frame time based on the interactive data
  • the switching unit 95 is adapted to trigger switching to the data stream to be played acquired in real time by the data stream acquiring unit 91 from the playback control device when the interaction end signal is detected, whereupon the play and display unit 92 performs real-time playback and display.
  • the interactive data may be generated by the server and transmitted to the interactive terminal, or may be generated by the interactive terminal.
  • the interactive terminal can obtain the data stream to be played from the playing control device in real time, and can display the corresponding interactive identifier at the corresponding frame time.
  • referring to FIG. 4, it is a schematic diagram of an interactive interface of an interactive terminal in an embodiment of the present invention.
  • the interactive terminal 40 obtains the data stream to be played in real time from the playback control device.
  • when the real-time playback display progresses to the first frame time T1, the first interactive identifier 42 can be displayed on the progress bar 41.
  • the second interactive logo 43 can be displayed on the progress bar.
  • the black part of the progress bar is the played part
  • the white part is the unplayed part.
  • the trigger operation may be a trigger operation input by a user, or a trigger operation automatically generated by the interactive terminal.
  • the interactive terminal may automatically initiate a trigger operation when it detects the presence of an identifier of a multi-angle free viewpoint data frame.
  • when the user triggers manually, it can be the time information when the user chooses to trigger the interaction after the interactive terminal displays the interactive prompt information, or the historical moment information when the interactive terminal receives the user operation to trigger the interaction, where the historical moment information may be moment information prior to the current playback moment.
  • when the interactive terminal system reads the corresponding interactive identifier 43 on the progress bar 41, the interactive prompt information can be displayed.
  • the interactive terminal 40 can continue to read subsequent video data, and the played part of the progress bar 41 continues to advance.
  • when the user selects the trigger, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction for the specified frame time corresponding to the interactive identifier and sends it to the server 73.
  • when the user chooses to trigger the current interactive identifier 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the specified frame time T2 of the interactive identifier 43 and sends it to the server 73.
  • the server may send the interaction data corresponding to the designated frame time T2 according to the image reconstruction instruction.
  • the user can also choose to trigger the historical interaction mark while watching, for example, to trigger the interactive mark 42 displayed in the played part 41a on the progress bar.
  • after receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction corresponding to the specified frame time T1 of the interactive identifier 42 and sends it to the server 73.
  • the server may send interactive data corresponding to the designated frame time T1.
  • the interactive terminal 40 may use an image reconstruction algorithm to perform image processing on the multi-angle free view data of the interactive data, and then perform image display of the multi-angle free view at the specified frame time. If the designated frame time is one frame time, then a static image with a multi-angle free view is displayed; if the designated frame time corresponds to multiple frame times, then a dynamic image with a multi-angle free view is displayed.
  • when the interactive terminal system reads the corresponding interactive identifier 43 on the progress bar 41, the interactive prompt information can be displayed.
  • the interactive terminal 40 can continue to read subsequent video data, and the played part of the progress bar 41 continues to advance.
  • when the user selects the trigger, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction for the specified frame time corresponding to the interactive identifier and sends it to the data processing device 382.
  • when the user chooses to trigger the current interactive identifier 43, the interactive terminal 40, after receiving the feedback, generates an image reconstruction instruction corresponding to the specified frame time T2 of the interactive identifier 43 and sends it to the data processing device.
  • the data processing device 382 can send multiple frame images corresponding to the specified frame time T2 according to the image reconstruction instruction.
  • the user can also choose to trigger the historical interaction mark while watching, for example, to trigger the interactive mark 42 displayed in the played part 41a on the progress bar.
  • After receiving the feedback, the interactive terminal 40 generates an image reconstruction instruction corresponding to the specified frame time T1 of the interactive mark 42 and sends it to the data processing device.
  • the data processing device can send multiple frame images corresponding to the designated frame time T1 according to the image reconstruction instruction.
  • The interactive terminal 40 may generate interactive data for interaction based on the multiple frame images, may use an image reconstruction algorithm to process the multi-angle free view data of the interactive data, and may then display the multi-angle free-view image at the specified frame time. If the designated frame time is one frame moment, a static image with a multi-angle free view is displayed; if the designated frame time corresponds to multiple frame moments, a dynamic image with a multi-angle free view is displayed.
  • The interactive terminal of the embodiment of the present invention may be an electronic device with a touch screen function, a head-mounted virtual reality (VR) terminal, an edge node device connected to a display, or an Internet of Things (IoT) device.
  • Referring to FIG. 40, it is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • the interactive terminal is an electronic device 400 with a touch screen function.
  • The interface of the electronic device 400 may display an interactive prompt message box 403.
  • the user can make a selection according to the content of the interactive prompt information box 403.
  • When the user makes a triggering operation, the electronic device 400 can generate an image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 402 after receiving the feedback.
  • When the user makes a non-triggering operation, the electronic device 400 can continue to read subsequent video data.
  • Referring to FIG. 41, it is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • the interactive terminal is a head-mounted VR terminal 410.
  • The interface of the head-mounted VR terminal 410 may display an interactive prompt information box 413.
  • the user can make a selection according to the content of the interactive prompt information box 413.
  • When the user makes a triggering operation, the head-mounted VR terminal 410 can generate the image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 412 after receiving the feedback; when the user makes a non-triggering operation of selecting "No" (for example, shaking the head), the head-mounted VR terminal 410 can continue to read subsequent video data.
  • Referring to FIG. 42, it is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • the interactive terminal is an edge node device 421 connected to a display 420.
  • the display 420 may display an interactive prompt message box 424.
  • the user can make a selection according to the content of the interactive prompt information box 424.
  • the edge node device 421 can generate the image reconstruction instruction at the interactive frame time corresponding to the interactive identifier 423 after receiving the feedback.
  • the edge node device 421 may continue to read subsequent video data.
  • the interactive terminal may establish a communication connection with at least one of the above-mentioned data processing device and server, and may adopt a wired connection or a wireless connection.
  • Referring to FIG. 43, it is a schematic diagram of the connection of an interactive terminal in an embodiment of the present invention.
  • the edge node device 430 establishes a wireless connection with the interactive devices 431, 432, and 433 through the Internet of Things.
  • After the interactive terminal triggers the interactive identifier, it can display images of a multi-angle free view at the specified frame time corresponding to the triggered interactive identifier, and determine the virtual viewpoint position information based on the interactive operation. Referring to FIG. 44, it is a schematic diagram of the interactive operation of an interactive terminal in an embodiment of the present invention.
  • the user can operate horizontally or vertically on the interactive operation interface, and the operation track can be a straight line or a curve.
  • Referring to FIG. 45, it is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention. After the user clicks on the interactive identifier, the interactive terminal obtains the interactive data at the specified frame time of the interactive identifier.
  • The triggering operation is an interactive operation, and the corresponding first virtual viewpoint can be determined according to the viewing angle of the image displayed during the interactive operation; likewise, if the user performs a new operation, the corresponding first virtual viewpoint can be determined according to the viewing angle of the image displayed during that operation.
  • If the designated frame time is one frame moment, the obtained multi-angle free-view data may include multi-angle free-view spatial data of the image at that moment, and a static image with a multi-angle free view can be displayed; if the designated frame time corresponds to multiple frame moments, the obtained multi-angle free-view video data can include multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time, and a dynamic image with a multi-angle free view can be displayed, that is, frame images of video frames with multi-angle free viewing angles are shown.
  • the multi-angle free-view video data obtained by the interactive terminal may include multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
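The static-versus-dynamic display rule described above can be sketched as follows; this is a minimal illustration, and the function name and data shapes are assumptions rather than anything specified in the patent:

```python
# Illustrative sketch only: one designated frame moment -> static
# multi-angle free-view image; several frame moments -> dynamic image.
def display_interactive_data(frame_times, spatial_data, time_data=None):
    """Decide between a static and a dynamic multi-angle display."""
    if len(frame_times) == 1:
        # A single designated frame moment: show a static
        # multi-angle free-view image of that frame.
        return ("static", frame_times[0])
    # Multiple frame moments: sort by frame time and play the frames
    # back as a dynamic multi-angle free-view image.
    return ("dynamic", sorted(frame_times))
```

In the dynamic case the time data would drive playback ordering, which is why the sketch sorts the frame moments before returning them.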
  • The user swipes horizontally to the right to generate an interactive operation and determine the corresponding first virtual viewpoint. Because different virtual viewpoints can correspond to different multi-angle free-view spatial data and multi-angle free-view time data, as shown in FIG. 46, the frame images displayed in the interactive interface change in time and space with the interactive operation: the content displayed in the frame image has changed from the athlete running toward the finish line in FIG. 45 to the athlete about to cross the finish line in FIG. 46, and with the athlete as the target object, the perspective of the frame image has changed from the left view to the front view.
  • Comparing FIGS. 45 and 47, the content displayed in the frame images changes from the athlete running toward the finish line in FIG. 45 to the athlete who has crossed the finish line in FIG. 47, and the perspective of the frame image changes from the left view to the right view.
  • Similarly, comparing FIGS. 45 and 48, the user slides vertically upward to generate an interactive operation; the content of the frame image changes from the athlete running toward the finish line in FIG. 45 to the athlete who has crossed the finish line in FIG. 48, and with the athlete as the target object, the viewing angle of the frame image changes from the left view to the top view.
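The patent does not specify how a swipe maps to a virtual viewpoint position; the following is a minimal sketch assuming a spherical parameterization around the target object, where a horizontal swipe changes the azimuth and a vertical swipe changes the elevation. All names and the sensitivity/radius values are hypothetical:

```python
import math

def viewpoint_from_swipe(azimuth_deg, elevation_deg, dx, dy,
                         sensitivity=0.2, radius=10.0):
    """Update the virtual viewpoint from a swipe of (dx, dy) pixels.

    A horizontal swipe rotates the viewpoint around the target object
    (azimuth); a vertical swipe raises or lowers it (elevation).
    """
    azimuth_deg = (azimuth_deg + dx * sensitivity) % 360.0
    # Clamp elevation so the viewpoint never flips over the pole.
    elevation_deg = max(-85.0, min(85.0, elevation_deg + dy * sensitivity))
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    # Convert the angles back to a camera position on a sphere
    # centered on the target object.
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return azimuth_deg, elevation_deg, (x, y, z)
```

A rightward swipe (positive dx) thus moves the viewpoint from the left view toward the front view, matching the FIG. 45 to FIG. 46 transition described above.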
  • The interactive data may also include at least one of the following: field analysis data, information data of the collection object, information data of the equipment associated with the collection object, information data of the items deployed on site, and information data of the logo displayed on site.
  • FIG. 10 is a schematic diagram of an interactive interface of another interactive terminal in an embodiment of the present invention.
  • After the interactive terminal 100 triggers the interactive identification, it can display images with a multi-angle free view at the specified frame time corresponding to the triggered interactive identification, and can superimpose field analysis data on the image (not shown), as shown by the field analysis data 101 in FIG. 10.
  • FIG. 11 is a schematic diagram of another interactive interface of the interactive terminal in the embodiment of the present invention.
  • After triggering the interactive identification, the interactive terminal 110 can display images with a multi-angle free view at the specified frame time corresponding to the triggered interactive identification, and can superimpose the information data of the collection object on the image (not shown), as shown by the information data 111 of the collection object in FIG. 11.
  • FIG. 12 is a schematic diagram of another interactive interface of the interactive terminal in the embodiment of the present invention.
  • After triggering the interactive identification, the interactive terminal 120 can display images with a multi-angle free view at the specified frame time corresponding to the triggered interactive identification, and can superimpose the information data of the collection objects on the image (not shown), as shown by the information data 121-123 of the collection objects in FIG. 12.
  • FIG. 13 is a schematic diagram of another interactive interface of the terminal in the embodiment of the present invention.
  • After triggering the interactive identification, the interactive terminal 130 can display images with a multi-angle free view at the specified frame time corresponding to the triggered interactive identification, and can superimpose the information data of the items deployed on site on the image (not shown), as shown by the information data 131 of the item in FIG. 13.
  • FIG. 14 is a schematic diagram of another interactive interface of the terminal in the embodiment of the present invention.
  • After the interactive terminal 140 triggers the interactive logo, it can display images with a multi-angle free view at the specified frame time corresponding to the triggered interactive logo, and can superimpose the information data of the logo displayed on site on the image (not shown), as shown by the logo information data 141 in FIG. 14.
  • the user can obtain more relevant interactive information through the interactive data, and have a more in-depth, comprehensive, and professional understanding of the content being watched, thereby further enhancing the user's interactive experience.
  • the interactive terminal 390 may include: a processor 391, a network component 392, a memory 393, and a display component 394; among them:
  • The processor 391 is adapted to obtain the data stream to be played in real time through the network component 392, and in response to a triggering operation on an interactive identifier, obtain the interactive data corresponding to the specified frame time of the interactive identifier, wherein the data stream to be played includes video data and an interactive identifier, the interactive identifier is associated with a designated frame moment of the data stream to be played, and the interactive data includes multi-angle free view data;
  • the memory 393 is suitable for storing the data stream to be played obtained in real time
  • The display component 394 is adapted to play the video of the data stream to be played and show the interactive identifier in real time based on the data stream acquired in real time, and to display the multi-angle free-view image at the specified frame time based on the interactive data.
  • The interactive terminal 390 may obtain the interactive data at the specified frame time from the server that stores the interactive data, or obtain multiple frame images corresponding to the specified frame time from the data processing device that stores the frame images, and then generate the corresponding interactive data.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched. Moreover, the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • The preset bandwidth threshold can be determined according to the transmission capacity of the transmission network where each collection device in the collection array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold may be 1000 Mbps.
  • S152: Receive the compressed video data streams transmitted in real time by each acquisition device in the acquisition array based on the pull instruction, where each compressed video data stream is obtained by the corresponding acquisition device through real-time synchronous acquisition and data compression from its corresponding angle.
  • the capture device itself can have the function of encoding and packaging, so that the original video data collected from the corresponding angle can be encoded and packaged in real time.
  • The packaging format used by the capture device can be AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, etc., or other packaging formats.
  • The encoding format used by the capture device can be H.261, H.263, H.264, H.265, MPEG, AVS, or other encoding formats.
  • The acquisition device can have a compression function. For the same amount of data before compression, a higher compression rate yields a smaller amount of compressed data, which can relieve the bandwidth pressure of real-time synchronous transmission. Therefore, the acquisition device can use predictive coding, transform coding, and entropy coding techniques to improve the compression rate of the video.
  • The sum of the bit rates of the compressed video data streams to be transmitted by the collection devices in the collection array can be calculated from the values of the parameters of the collection devices, so as to determine whether it is greater than the preset bandwidth threshold.
  • The parameters of the acquisition device may include acquisition parameters and compression parameters; each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from a corresponding angle according to the set values of its parameters, so that the sum of the bit rates of the compressed video data streams is not greater than the preset bandwidth threshold.
  • The acquisition parameters and compression parameters are complementary. When the value of the compression parameter is unchanged, setting the value of the acquisition parameter can reduce the data size of the original video data, so that the time of data compression processing is shortened; when the value of the acquisition parameter is unchanged, setting the value of the compression parameter can correspondingly reduce the amount of compressed data, so that the data transmission time is shortened. For example, setting a higher compression rate can save transmission bandwidth, and setting a lower sampling rate can also save transmission bandwidth. Therefore, the acquisition parameters and/or compression parameters can be set according to the actual situation.
  • the acquisition parameters may include focal length parameters, exposure parameters, resolution parameters, encoding rate parameters, and encoding format parameters, etc.
  • The compression parameters may include compression rate parameters, compression format parameters, etc.; by setting the values of different parameters, the values most suitable for the transmission network where each collection device is located can be obtained.
  • It can first be determined whether, when the collection devices in the collection array perform collection and data compression according to the set parameter values, the sum of the code rates of the compressed video data streams is greater than the preset bandwidth threshold; when the sum is greater than the preset bandwidth threshold, the values of the parameters of each collection device in the collection array can be set again before the pull instruction is sent to each collection device. It is understandable that, in specific implementation, the value of the acquisition parameter and the value of the compression parameter can also be set according to imaging quality requirements such as the resolution of the multi-angle free-view image to be displayed.
  • In specific implementation, the process from transmission to writing of the compressed video data streams obtained by each acquisition device occurs continuously. Therefore, before the pull instruction is sent to each acquisition device in the acquisition array, it is also possible to determine whether the sum of the code rates of the compressed video data streams to be transmitted by each acquisition device in the acquisition array is greater than the preset writing speed threshold; when the sum is greater than the preset writing speed threshold, the values of the parameters of each acquisition device can be set, so that the sum of the code rates of the compressed video data streams obtained by real-time synchronous collection and data compression from corresponding angles, based on the set parameter values, is not greater than the preset writing speed threshold.
  • the preset writing speed threshold may be determined according to the data storage writing speed of the storage medium. For example, if the upper limit of the data storage writing speed of the solid state disk (Solid State Disk or Solid State Drive, SSD) of the data processing device is 100 Mbps, the preset writing speed threshold may be 100 Mbps.
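The two gating checks above (uplink bandwidth and storage write speed) can be sketched as follows. The function names, the halving step that stands in for adjusting real acquisition/compression parameters, and the Mbps defaults reused from the examples are illustrative assumptions:

```python
def can_send_pull_instruction(stream_bitrates_mbps,
                              bandwidth_threshold_mbps=1000.0,
                              write_speed_threshold_mbps=100.0):
    """True when the summed bit rate of the compressed video streams
    fits both the uplink bandwidth and the storage write speed."""
    total = sum(stream_bitrates_mbps)
    return (total <= bandwidth_threshold_mbps
            and total <= write_speed_threshold_mbps)

def negotiate_parameters(devices, check):
    """Lower per-device bit rates until the check passes.

    `devices` maps a device id to its configured output bit rate; the
    halving step stands in for setting acquisition and compression
    parameters on the real devices.
    """
    rates = dict(devices)
    while not check(rates.values()):
        rates = {d: r / 2 for d, r in rates.items()}
    return rates
```

Only after the check passes would the pull instruction be sent, which is what keeps the streaming process free of data transmission congestion.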
  • the compressed video data stream obtained by each collection device can be stored.
  • the frame-level synchronized video frames in each compressed video data stream can be intercepted according to the received video frame interception instruction, and the intercepted video frames are synchronously uploaded to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
  • The captured video frames may be encapsulated first and uploaded to the designated target terminal through a network transmission protocol, and then parsed there to obtain the frame-level synchronized video frames of the corresponding compressed video data streams.
  • The subsequent processing of the video frames intercepted from the compressed video data streams is handed over to the designated target end, which can save network transmission resources, reduce the pressure and difficulty of deploying a large number of server resources on site, greatly reduce the data processing load, and shorten the transmission delay of multi-angle free-view video frames.
  • S161 Determine one of the compressed video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
  • S163 Intercept video frames to be intercepted in each compressed video data stream.
  • For example, the acquisition array may include 40 acquisition devices, and therefore 40 channels of compressed video data streams can be received in real time. Assume that, among the compressed video data streams received in real time from each acquisition device in the acquisition array, the compressed video data stream A1 corresponding to the acquisition device A1' is determined as the reference data stream. Then, based on the feature information X of the object in the video frame indicated in the received video frame interception instruction, the video frame a1 in the reference data stream whose object feature information matches X is determined as the video frame to be intercepted. Then, according to the feature information x1 of the object in the video frame a1, the video frames a2-a40 consistent with the feature information x1 are selected from the remaining compressed video data streams A2-A40 as the video frames to be intercepted in those streams.
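A minimal sketch of the feature-based interception described above, assuming a generic similarity function and simple (frame_index, feature) pairs; the data shapes, names, and similarity threshold are assumptions for illustration, not part of the patent:

```python
def intercept_synchronized_frames(streams, reference_id, target_feature,
                                  similarity, threshold=0.9):
    """Pick frame-level synchronized frames across compressed streams.

    `streams` maps a stream id to a list of (frame_index, feature)
    pairs. The frame in the reference stream whose feature best matches
    `target_feature` is chosen first, and its feature is then used to
    select the matching frame in every remaining stream.
    """
    def best_match(frames, feature):
        idx, feat = max(frames, key=lambda f: similarity(f[1], feature))
        if similarity(feat, feature) < threshold:
            raise ValueError("no frame meets the similarity threshold")
        return idx, feat

    ref_idx, ref_feat = best_match(streams[reference_id], target_feature)
    picked = {reference_id: ref_idx}
    for sid, frames in streams.items():
        if sid != reference_id:
            picked[sid], _ = best_match(frames, ref_feat)
    return picked
```

In the patent's example the features could be two-dimensional or three-dimensional object descriptors; the similarity function and threshold would be chosen per the viewing angle range and the on-site scene.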
  • the feature information of the object may include at least one of shape feature information, color feature information, and position feature information.
  • the feature information X of the object in the video frame indicated to be intercepted in the video frame interception instruction may be the same as the feature information x1 of the object in the video frame a1 to be intercepted in the reference data stream.
  • For example, both the feature information X and the feature information x1 of the object may be two-dimensional feature information; alternatively, the feature information X and the feature information x1 can be different representations of the feature information of the same object, for example, the feature information X can be two-dimensional feature information and the feature information x1 can be three-dimensional feature information.
  • a similarity threshold can be preset.
  • When the similarity threshold is met, it can be considered that the feature information X of the object is consistent with x1, or that the feature information x1 of the object is consistent with the feature information x2-x40 of the object in the other compressed video data streams A2-A40.
  • the specific representation mode and similarity threshold of the feature information of the object can be determined according to the preset multi-angle free viewing angle range and the scene of the scene, which is not limited in the embodiment of the present invention.
  • As another example, the acquisition array may contain 40 acquisition devices, and therefore 40 channels of compressed video data streams can be received in real time. Assume that, among the compressed video data streams received in real time from each acquisition device in the acquisition array, the compressed video data stream B1 corresponding to the acquisition device B1' is determined as the reference data stream. Then, based on the time stamp information Y indicating the video frame to be intercepted in the received video frame interception instruction, the video frame b1 corresponding to the time stamp information Y in the reference data stream is determined as the video frame to be intercepted. Then, according to the time stamp information y1 of the video frame b1, the video frames b2-b40 with the same time stamp information y1 are selected from the remaining compressed video data streams B2-B40 as the video frames to be intercepted in those streams.
  • the time stamp information Y of the video frame indicated to be intercepted in the video frame interception instruction may have a certain error with the time stamp information y1 in the video frame b1 to be intercepted in the reference data stream.
  • When the time stamp information corresponding to the video frames in the reference data stream is inconsistent with the time stamp information Y, an error range can be preset. For example, if the error range is ±1 ms, an error of 0.1 ms is within the error range; therefore, the video frame b1 whose time stamp information y1 differs from the time stamp information Y by 0.1 ms can be selected as the video frame to be intercepted in the reference data stream.
  • the specific error range and the selection rule of the time stamp information y1 in the reference data stream can be determined according to the on-site collection equipment and transmission network, which is not limited in this embodiment.
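The timestamp-tolerance selection can be sketched as follows, assuming millisecond timestamps and the ±1 ms error range from the example; the function name and data shape are illustrative assumptions:

```python
def frame_within_error(frames, target_ts_ms, error_ms=1.0):
    """Select the frame whose timestamp is closest to `target_ts_ms`.

    `frames` is a list of (frame_index, timestamp_ms) pairs. Returns
    the matching frame index, or None when every frame falls outside
    the permitted ±error_ms window.
    """
    idx, ts = min(frames, key=lambda f: abs(f[1] - target_ts_ms))
    return idx if abs(ts - target_ts_ms) <= error_ms else None
```

The same helper would be applied first to the reference data stream (against the instruction's timestamp Y) and then to the remaining streams (against the selected frame's timestamp y1).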
  • the data processing device can smoothly and smoothly pull the data collected and compressed by each collection device.
  • The collection devices in the collection array, placed at different positions of the field collection area according to the preset multi-angle free viewing angle range, collect raw video data synchronously in real time from corresponding angles, and perform real-time data compression on the collected raw video data to obtain the corresponding compressed video data streams.
  • The preset bandwidth threshold can be determined according to the transmission capacity of the transmission network where each collection device in the collection array is located. For example, if the uplink bandwidth of the transmission network is 1000 Mbps, the preset bandwidth threshold can be 1000 Mbps.
  • each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time.
  • the data processing device can be set according to actual scenarios. For example, when there is a suitable space on site, the data processing device can be placed in a non-collection area on site and used as a field server; when there is no suitable space on site, the data processing device can be placed in the cloud and used as a cloud server.
  • When the data processing device connected to the collection array determines that the sum of the code rates of the compressed video data streams to be transmitted by the collection devices in the collection array is not greater than the preset bandwidth threshold, it sends a pull instruction to each collection device in the collection array, so that the data collected and compressed by each collection device can be transmitted synchronously in real time through the transmission network where it is located, and data transmission congestion during the streaming process can be avoided. Then, each acquisition device in the acquisition array transmits the obtained compressed video data stream to the data processing device in real time based on the pull instruction. Since the data transmitted by each acquisition device is compressed, the bandwidth pressure of real-time synchronous transmission can be relieved, and the processing speed of multi-angle free-view video data can be accelerated.
  • Before sending the pull instruction, the data processing device may first determine whether the sum of the code rates of the compressed video data streams, obtained when each collection device in the collection array performs data collection and data compression according to the set parameters, is greater than the preset bandwidth threshold; when the sum is greater than the preset bandwidth threshold, the data processing device can set the values of the parameters of each acquisition device in the acquisition array and then send the pull instruction to each acquisition device in the acquisition array.
  • the process from transmission to writing of the compressed video data stream obtained by each acquisition device occurs continuously, and it is also necessary to ensure that the data processing equipment is unblocked when writing the compressed video data stream obtained by each acquisition device.
  • Therefore, the data processing device may also determine whether the sum of the code rates of the compressed video data streams to be transmitted by each acquisition device in the acquisition array is greater than the preset writing speed threshold; when the sum is greater than the preset writing speed threshold, the data processing device may set the values of the parameters of each acquisition device in the acquisition array, so that the sum of the code rates of the compressed video data streams obtained by each acquisition device through real-time synchronous acquisition and data compression from corresponding angles, according to the set parameter values, is not greater than the preset writing speed threshold.
  • the preset writing speed threshold may be determined according to the data storage writing speed of the data processing device.
  • data can be transmitted between each collection device in the collection array and the data processing device through at least one of the following methods:
  • Each collection device in the collection array is connected to the data processing device through a switch.
  • The switch can aggregate and uniformly transmit the compressed video data streams of more collection devices to the data processing device, which can reduce the number of ports the data processing device needs to support.
  • For example, if the switch supports 40 inputs, the data processing device can simultaneously receive, through the switch, the video streams of a collection array composed of 40 collection devices, thereby reducing the number of data processing devices.
  • Each acquisition device in the acquisition array is connected to the data processing device through a local area network.
  • The local area network can transmit the compressed video data streams of the acquisition devices to the data processing device in real time, reducing the number of ports the data processing device needs to support and thereby reducing the number of data processing devices.
  • The data processing device may store (for example, buffer) the compressed video data stream obtained by each collection device, and when a video frame interception instruction is received, the data processing device may intercept the frame-level synchronized video frames in each compressed video data stream according to the received instruction and synchronously upload the intercepted video frames to the designated target terminal.
  • the data processing device may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • The data processing device may first encapsulate the captured video frames and upload them to the designated target terminal through a network transmission protocol, and the target terminal then parses them to obtain the frame-level synchronized video frames of the corresponding compressed video data streams.
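The encapsulate-then-upload step could, for illustration, look like the following. The wire format (a length-prefixed JSON header plus concatenated per-stream payloads) is an assumption of this sketch, not a format defined in the patent or by any particular transport protocol:

```python
import json
import struct

def encapsulate_frames(frame_time_ms, frames):
    """Package frame-level synchronized video frames for upload.

    `frames` maps a stream id to raw frame bytes. The packet carries a
    small JSON header (frame time plus per-stream lengths) so that the
    target end can split the payload back into per-stream frames.
    """
    order = sorted(frames)
    header = json.dumps({
        "frame_time_ms": frame_time_ms,
        "streams": [[sid, len(frames[sid])] for sid in order],
    }).encode()
    payload = b"".join(frames[sid] for sid in order)
    # 4-byte big-endian header length, then header, then payload.
    return struct.pack(">I", len(header)) + header + payload

def parse_frames(packet):
    """Inverse of encapsulate_frames, run on the target end."""
    header_len = struct.unpack(">I", packet[:4])[0]
    header = json.loads(packet[4:4 + header_len])
    frames, offset = {}, 4 + header_len
    for sid, length in header["streams"]:
        frames[sid] = packet[offset:offset + length]
        offset += length
    return header["frame_time_ms"], frames
```

Handing the parsing to the target end, as the patent describes, is what lets the on-site data processing device stay lightweight.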
  • the compressed video data stream obtained by the real-time synchronous collection and data compression of each collection device in the collection array can be uniformly transmitted to the data processing device.
  • The data processing device intercepts frames through frame marking (dotting).
  • The frame-level synchronized video frames intercepted from each compressed video data stream can be synchronously uploaded to the designated target end, and the subsequent processing of the intercepted video frames can be handed over to the designated target end; therefore, network transmission resources can be saved, the pressure and difficulty of on-site deployment can be reduced, the data processing load can be greatly reduced, and the transmission delay of multi-angle free-view video frames can be shortened.
  • In specific implementation, the data processing device may first determine one of the compressed video data streams received in real time from each acquisition device in the acquisition array as the reference data stream. Then, the data processing device may determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and select the video frames in the remaining compressed video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted in those streams. Finally, the data processing device intercepts the video frames to be intercepted in each compressed video data stream.
• for the specific frame interception method, please refer to the examples in the foregoing embodiments, which will not be repeated here.
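The reference-stream interception described above can be sketched as follows. This is an illustrative Python sketch, not part of the embodiment: the stream/frame data structures, the timestamp field, and the tolerance value are assumptions.

```python
# Hypothetical sketch: choose a reference stream, locate the frame to be
# intercepted in it, then pick the synchronized frame (closest timestamp,
# within a tolerance) from every remaining stream.

def pick_synchronized_frames(streams, ref_id, target_ts, tolerance_ms=1.0):
    """streams: {stream_id: [{"ts": ...}, ...]}. Returns {stream_id: frame}."""
    # Frame to be intercepted in the reference data stream.
    ref_frame = min(streams[ref_id], key=lambda f: abs(f["ts"] - target_ts))
    picked = {ref_id: ref_frame}
    for sid, frames in streams.items():
        if sid == ref_id:
            continue
        # Closest frame in each remaining stream, kept only if synchronized.
        cand = min(frames, key=lambda f: abs(f["ts"] - ref_frame["ts"]))
        if abs(cand["ts"] - ref_frame["ts"]) <= tolerance_ms:
            picked[sid] = cand
    return picked
```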
  • the embodiment of the present invention also provides a data processing device corresponding to the data processing method in the above-mentioned embodiment.
  • the data processing device 180 may include:
• the first transmission matching unit 181 is adapted to determine whether the sum of the code rates of the compressed video data streams pre-transmitted by the collection devices in the collection array is not greater than a preset bandwidth threshold, wherein each collection device in the collection array is placed at a different location in the field collection area according to a preset multi-angle free viewing angle range.
• the instruction sending unit 182 is adapted to send a pull stream instruction to each acquisition device in the acquisition array when it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is not greater than the preset bandwidth threshold.
• the data stream receiving unit 183 is adapted to receive the compressed video data stream transmitted in real time by each acquisition device in the acquisition array based on the pull stream instruction, the compressed video data stream being obtained by each acquisition device in the acquisition array through real-time synchronous acquisition and data compression from a corresponding angle.
• matching the transmission bandwidth can avoid data transmission congestion during the streaming process, so that the data collected and compressed by each acquisition device can be transmitted synchronously in real time. This speeds up the processing of multi-angle free-view video data, realizes multi-angle free-view video with limited bandwidth resources and data processing resources, and reduces implementation costs.
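As a minimal sketch of the bandwidth-matching check above (units and helper names are assumptions; the embodiment only specifies the comparison of the summed code rate against the preset threshold):

```python
# Illustrative: the pull stream instruction is only sent when the summed
# code rate of the pre-transmitted compressed streams does not exceed the
# preset bandwidth threshold. Mbps units are an assumption.

def transmission_fits(bitrates_mbps, bandwidth_threshold_mbps):
    """True when the sum of all stream code rates is within the threshold."""
    return sum(bitrates_mbps) <= bandwidth_threshold_mbps
```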
  • the data processing device 180 may further include:
• the first parameter setting unit 184 is adapted to set the values of the parameters of the collection devices in the collection array before a pull stream instruction is sent to each collection device in the collection array;
• the parameters of the acquisition device may include: acquisition parameters and compression parameters, and each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from a corresponding angle according to the set parameter values, so that the sum of the bit rates of the resulting compressed video data streams is not greater than the preset bandwidth threshold.
  • the data processing device 180 may further include:
• the second transmission matching unit 185 is adapted to determine, before the parameter values of the collection devices in the collection array are set, whether the sum of the bit rates of the compressed video data streams obtained by the collection devices through acquisition and data compression according to the set parameter values is not greater than the preset bandwidth threshold.
  • the data processing device 180 may further include:
  • the writing matching unit 186 is adapted to determine whether the sum of the code rates of the compressed video data streams pre-transmitted by each acquisition device in the acquisition array is greater than a preset writing speed threshold;
• the second parameter setting unit 187 is adapted to set the values of the parameters of the acquisition devices in the acquisition array when the sum of the bit rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array is greater than a preset writing speed threshold, so that the sum of the code rates of the compressed video data streams obtained by each acquisition device through real-time synchronous acquisition and data compression from a corresponding angle, according to the set parameter values, is not greater than the preset writing speed threshold.
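One possible parameter-setting policy satisfying the write-speed constraint above can be sketched as follows. Proportional down-scaling is an assumption: the embodiment only requires that the resulting sum of code rates not exceed the preset writing speed threshold.

```python
# Illustrative policy: if the summed code rate exceeds the writing speed
# threshold, scale every device's target bitrate down proportionally.

def adjust_bitrates(bitrates_mbps, write_speed_mbps):
    total = sum(bitrates_mbps)
    if total <= write_speed_mbps:
        return list(bitrates_mbps)  # already fits; keep parameters unchanged
    scale = write_speed_mbps / total
    return [b * scale for b in bitrates_mbps]
```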
  • the data processing device 180 may further include:
  • the frame interception processing unit 188 is adapted to intercept the frame-level synchronized video frames in each compressed video data stream according to the received video frame interception instruction;
  • the uploading unit 189 is adapted to synchronously upload the captured video frames to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
  • the frame cutting processing unit 188 may include:
  • the reference data stream selection subunit 1881 is adapted to determine one of the compressed video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
  • the video frame selection subunit 1882 is adapted to determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and select the remaining video frames that are synchronized with the video frame to be intercepted in the reference data stream The video frame in each compressed video data stream is used as the video frame to be intercepted in the remaining compressed video data streams;
  • the video frame interception subunit 1883 is adapted to intercept the video frames to be intercepted in each compressed video data stream.
  • the video frame selection subunit 1882 may include at least one of the following:
• the first video frame selection module 18821 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames consistent with that feature information in the remaining compressed video data streams, as the video frames to be intercepted in the remaining compressed video data streams;
• the second video frame selection module 18822 is adapted to select, according to the time stamp information of the video frame to be intercepted in the reference data stream, the video frames consistent with that time stamp information in the remaining compressed video data streams, as the video frames to be intercepted in the remaining compressed video data streams.
  • the embodiment of the present invention also provides a data processing system corresponding to the above-mentioned data processing method.
  • the above-mentioned data processing device is used to realize real-time reception of multiple compressed video data streams.
  • the data processing system 190 may include: an acquisition array 191 and a data processing device 192.
• the acquisition array 191 includes multiple collection devices located at different positions in the field collection area according to a preset multi-angle free viewing angle range, wherein:
• each collection device in the collection array 191 is adapted to collect raw video data synchronously in real time from a corresponding angle, perform real-time data compression on the collected raw video data to obtain a compressed video data stream synchronously collected in real time from the corresponding angle, and, based on the streaming instruction sent by the data processing device 192, transmit the obtained compressed video data stream to the data processing device 192 in real time;
• the data processing device 192 is adapted to, when it is determined that the sum of the code rates of the compressed video data streams pre-transmitted by the collection devices in the collection array is not greater than a preset bandwidth threshold, send a pull stream instruction to each collection device in the collection array 191, and receive the compressed video data stream transmitted in real time by each collection device in the collection array 191.
• instead of using dedicated cables and SDI interfaces for data transmission, streaming is performed through a common transmission network. With limited bandwidth resources and data processing resources, low-latency playback of multi-angle free-view video is realized, and implementation costs are reduced.
  • the data processing device 192 is further adapted to set the value of the parameter of each collection device in the collection array before sending a pull command to each collection device in the collection array 191 respectively;
• the parameters of the acquisition device include: acquisition parameters and compression parameters, and each acquisition device in the acquisition array performs real-time synchronous acquisition and data compression from a corresponding angle according to the set parameter values, so that the sum of the bit rates of the compressed video data streams is not greater than the preset bandwidth threshold.
• the data processing device can set the parameter values of the acquisition devices in the acquisition array to ensure that the parameter values of all acquisition devices are unified, so that each acquisition device can perform real-time synchronous acquisition and data compression from its corresponding angle and the sum of the bit rates of the resulting compressed video data streams is not greater than the preset bandwidth threshold. Network congestion can thus be avoided, and low-latency playback of multi-angle free-view video can be achieved even when bandwidth resources are limited.
• the data processing device 192 determines, before sending a streaming instruction to each acquisition device in the acquisition array 191, whether the sum of the code rates of the compressed video data streams pre-transmitted by the acquisition devices in the acquisition array 191 is greater than the preset writing speed threshold, and, when it is, sets the parameter values of the collection devices in the collection array 191, so that the sum of the bit rates of the compressed video data streams obtained by the collection devices through real-time synchronous acquisition and data compression from corresponding angles, according to the set parameter values, is not greater than the preset writing speed threshold.
• this avoids congestion in the device's data writing and ensures that the compressed video data streams are unblocked during collection, transmission and writing, so that the compressed video streams uploaded by each collection device can be processed in real time, thereby realizing playback of multi-angle free-view video.
  • each collection device in the collection array and the data processing device are adapted to be connected through a switch and/or a local area network.
  • the data processing system 190 may further include a designated target terminal 193.
  • the data processing device 192 is adapted to intercept frame-level synchronized video frames in each compressed video stream according to the received video frame interception instruction, and synchronously upload the intercepted video frames to the designated target terminal 193;
  • the designated target terminal 193 is adapted to receive the video frame obtained by the data processing device 192 based on the video frame interception instruction.
  • the data processing device may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • the compressed video data stream obtained by the real-time synchronous collection and data compression of each collection device in the collection array can be uniformly transmitted to the data processing device.
• after receiving the video frame interception instruction, the data processing device intercepts the frames.
  • the frame-level synchronized video frames of each intercepted compressed video data stream can be synchronously uploaded to the specified target end, and the subsequent processing of the video frames intercepted by the compressed video data stream can be handed over to the specified target end. It can save network transmission resources and reduce the pressure and difficulty of on-site deployment. It can also greatly reduce the data processing load and shorten the transmission delay of multi-angle free-view video frames.
• the data processing device 192 is adapted to determine one of the compressed video data streams, received in real time from the acquisition devices in the acquisition array 191, as a reference data stream; to determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction; to select the video frames in the remaining compressed video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted in those streams; and to intercept the video frames to be intercepted in each compressed video data stream.
  • S201 Send a pull instruction to each collection device in the collection array, where each collection device in the collection array is placed in a different position of the field collection area according to a preset multi-angle free viewing angle range, and each collection device in the collection array The equipment simultaneously collects the video data stream in real time from the corresponding angle.
• in order to achieve pull-stream synchronization, there may be multiple implementation manners. For example, a pull stream instruction may be sent to each acquisition device in the acquisition array at the same time; or, a pull stream instruction may be sent only to the main acquisition device in the acquisition array to trigger it to pull the stream, after which the main acquisition device synchronizes the instruction to all slave acquisition devices and triggers them to pull the stream.
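The second manner above (forwarding the pull stream instruction from the main acquisition device to all slave devices) can be sketched as follows; class and attribute names are illustrative assumptions, not part of the embodiment.

```python
# Illustrative sketch: the instruction is sent only to the main device,
# which synchronizes it to all slave devices so every device starts pulling.

class AcquisitionDevice:
    def __init__(self, name):
        self.name = name
        self.pulling = False
        self.slaves = []  # non-empty only on the main device

    def receive_pull_instruction(self):
        self.pulling = True
        for slave in self.slaves:  # main device forwards the instruction
            slave.receive_pull_instruction()
```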
  • S202 Receive, in real time, the video data streams respectively transmitted by each acquisition device in the acquisition array based on the streaming instruction, and determine whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
  • the acquisition device itself may have the function of encoding and packaging, so that the original video data collected from the corresponding angle in real time can be encoded and packaged in real time.
• each collection device can also have a compression function. The higher the compression rate, the smaller the amount of compressed data for the same amount of data before compression, which can relieve the bandwidth pressure of real-time synchronous transmission. Therefore, the collection device can use predictive coding, transform coding, entropy coding and other techniques to improve the compression rate of the video.
• with the above data synchronization method, by determining whether the video data streams transmitted by the acquisition devices in the acquisition array are frame-level synchronized, synchronous transmission of the multiple channels can be ensured, thereby avoiding missing and redundant frames and improving data processing speed, so as to meet the needs of low-latency playback of multi-angle free-view video.
• when each collection device in the collection array is started manually, there is a start-up time error, and the video data streams may not begin to be collected at the same moment. Therefore, at least one of the following methods can be adopted to ensure that each collection device in the collection array synchronously collects the video data stream in real time from its corresponding angle:
  • the acquisition device that has acquired the acquisition start instruction synchronizes the acquisition start instruction to other acquisition devices, so that each acquisition device in the acquisition array is based on The collection start instruction starts to synchronously collect the video data stream in real time from the corresponding angle.
  • the acquisition array may contain 40 acquisition devices, and when the acquisition device A1 acquires the acquisition start instruction, the acquisition device A1 synchronously sends the acquired acquisition start instruction to the other acquisition devices A2-A40. After all the collection devices receive the collection start instruction, each collection device starts to synchronously collect the video data stream from a corresponding angle in real time based on the collection start instruction. Since the data transmission speed between the collection devices is much faster than the speed of manual start, the start time error caused by manual start can be reduced.
  • Each acquisition device in the acquisition array synchronously acquires a video data stream in real time from a corresponding angle based on a preset clock synchronization signal.
• specifically, a clock signal synchronization device can be provided, and each collection device can be connected to it. When the clock signal synchronization device receives a trigger signal (such as a synchronous acquisition start instruction), it transmits a clock synchronization signal to each collection device, and each collection device starts to synchronously collect the video data stream in real time from its corresponding angle based on the clock synchronization signal.
• the clock signal synchronization device can transmit a clock synchronization signal to each collection device based on a preset trigger signal, so that the collection devices collect synchronously without being susceptible to interference from external conditions and manual operations. Therefore, the synchronization accuracy and synchronization efficiency of the collection devices can be improved.
• the collection devices in the collection array may not receive the pull stream instruction at the same time, and there may be a time difference of several milliseconds or less between the collection devices, resulting in the video data streams transmitted in real time by the collection devices not being synchronized.
• for example, suppose the acquisition array contains acquisition devices 1 and 2, the acquisition parameter settings of devices 1 and 2 are the same, the acquisition frame rate is X fps, and the video frames captured by devices 1 and 2 are acquired synchronously. The acquisition interval of each frame in devices 1 and 2 is then T = 1/X seconds. Assume the data processing device sends the pull stream instruction r at time t0, device 1 receives it at time t1, and device 2 receives it at time t2. If devices 1 and 2 both receive the pull instruction r within the same acquisition interval T, they can be regarded as having received the streaming instruction at the same time, and can respectively transmit frame-level synchronized video data streams; if they do not receive it within the same acquisition interval, they can be regarded as not having received the streaming instruction at the same time, and cannot realize frame-level synchronized transmission of the video data streams.
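The same-interval test in the example above can be sketched as follows, under the stated assumptions: frame rate X fps gives acquisition interval T = 1/X seconds, and intervals are counted from the send time t0.

```python
# Illustrative: two devices that receive the pull instruction within the
# same acquisition interval T are treated as having received it simultaneously,
# so their streams can be frame-level synchronized.

def received_in_same_interval(t1, t2, t0, frame_rate_fps):
    T = 1.0 / frame_rate_fps  # acquisition interval in seconds
    return int((t1 - t0) // T) == int((t2 - t0) // T)
```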
  • the frame-level synchronization of video data stream transmission can also be referred to as pull stream synchronization. Once the pull stream synchronization is achieved, it will automatically continue until the pull stream stops.
  • At least one of the following methods can be used to determine whether the video data streams respectively transmitted by the collection devices in the collection array are frame-level synchronized:
• one way: when acquiring the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array, the feature information of the object in the Nth frame of each video data stream can be matched. When the feature information of the object in the Nth frames of the video data streams meets a preset similarity threshold, it is determined that the feature information of the object in the Nth frame of the video data stream transmitted by each collection device is consistent, and hence that the video data streams transmitted by the collection devices are frame-level synchronized.
  • the feature information of the object of the Nth frame of each video data stream may include at least one of shape feature information, color feature information, and position feature information.
• another way: when the Nth frame of the video data stream transmitted by each acquisition device in the acquisition array is obtained, the time stamp information of the Nth frame of each video data stream can be matched, where N is an integer not less than 1.
  • the video frames in the video data stream of each acquisition device can also be intercepted and transmitted to the designated destination.
  • the following steps can be included:
• S221 Determine one of the video data streams received in real time from the acquisition devices in the acquisition array as a reference data stream.
  • S222 Based on the received video frame interception instruction, determine the video frame to be intercepted in the reference data stream, and select videos in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream Frame, as the video frame to be intercepted in the remaining video data streams.
  • S223 Intercept video frames to be intercepted in each video data stream.
  • S224 Synchronously upload the captured video frames to the designated target terminal.
  • the designated target terminal may be a preset target terminal, or may be a target terminal designated by a video frame interception instruction.
• with the above-mentioned solution, it is possible to achieve frame-interception synchronization, improve frame-interception efficiency, further improve the display effect of the generated multi-angle free-view video, and enhance user experience.
  • the coupling between the process of selecting and intercepting video frames and the process of generating multi-angle free-view videos can be reduced, and the independence between each process can be enhanced, which is convenient for later maintenance.
• the intercepted video frames can be synchronously uploaded to the designated target end, which saves network transmission resources, reduces the data processing load, and increases the speed of data processing to generate multi-angle free-view videos.
• one way is to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frames consistent with that feature information in the remaining video data streams, as the video frames to be intercepted in the remaining video data streams.
• for example, the acquisition array contains 40 acquisition devices, so 40 video data streams can be received in real time. Assume that the video data stream A1 corresponding to the acquisition device A1' is determined as the reference data stream among the video data streams received in real time from the acquisition devices in the acquisition array. Then, based on the feature information X of the object in the video frame indicated in the received video frame interception instruction, the video frame a1 in the reference data stream that is consistent with the feature information X is determined as the video frame to be intercepted. According to the feature information x1 of the object in the video frame a1 to be intercepted in the reference data stream, the video frames a2-a40 consistent with x1 are selected in the remaining video data streams A2-A40 as the video frames to be intercepted in the remaining video data streams.
• the feature information of the object may include shape feature information, color feature information, position feature information, etc. The feature information X of the object indicated in the video frame interception instruction and the feature information x1 of the object in the video frame a1 to be intercepted in the reference data stream can be the same representation of the feature information of the same object; for example, X and x1 are both two-dimensional feature information. They can also be different representations of the feature information of the same object; for example, X can be two-dimensional feature information while x1 is three-dimensional feature information.
• in addition, a similarity threshold can be preset. When the similarity threshold is met, the feature information X of the object can be considered consistent with x1, or the feature information x1 of the object can be considered consistent with the feature information x2-x40 of the object in the remaining video data streams A2-A40.
  • the specific representation method and similarity threshold of the feature information of the object can be determined according to the preset multi-angle free viewing angle range and the scene of the scene, which is not limited in this embodiment.
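As a minimal sketch of the similarity-threshold comparison above. Cosine similarity over same-length numeric feature vectors is an assumption: the embodiment deliberately does not fix the representation of the feature information or the threshold value.

```python
# Illustrative: feature information of two objects is considered consistent
# when their cosine similarity meets a preset threshold.

def features_consistent(feat_a, feat_b, threshold=0.9):
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = sum(a * a for a in feat_a) ** 0.5
    norm_b = sum(b * b for b in feat_b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return False
    return dot / (norm_a * norm_b) >= threshold
```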
• another way is to select, according to the time stamp information of the video frames to be intercepted in the reference data stream, the video frames consistent with that time stamp information in the remaining video data streams, as the video frames to be intercepted in the remaining video data streams.
• for example, the acquisition array can contain 40 acquisition devices, so 40 video data streams can be received in real time. Assume that the video data stream B1 corresponding to the acquisition device B1 is determined as the reference data stream among the video data streams received in real time from the acquisition devices in the acquisition array. Then, based on the time stamp information Y of the video frame indicated to be intercepted in the received video frame interception instruction, the video frame b1 corresponding to the time stamp information Y in the reference data stream is determined as the video frame to be intercepted. According to the time stamp information y1 of the video frame b1 to be intercepted in the reference data stream, the video frames b2-b40 consistent with y1 are selected in the remaining video data streams B2-B40 as the video frames to be intercepted in the remaining video data streams.
• it should be noted that the time stamp information Y of the video frame indicated in the video frame interception instruction may differ slightly from the time stamp information y1 of the video frame b1 to be intercepted in the reference data stream; that is, the time stamp information corresponding to the video frame in the reference data stream may be inconsistent with Y. In this case, an error range can be preset. For example, if the error range is ±1 ms, a difference of 0.1 ms is within the error range, so the video frame b1 whose time stamp information y1 differs from Y by 0.1 ms can be selected as the video frame to be intercepted in the reference data stream.
  • the specific error range and the selection rule of the time stamp information y1 in the reference data stream can be determined according to the on-site collection equipment and transmission network, which is not limited in this embodiment.
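The error-range rule above can be sketched as follows; the default ±1 ms range follows the example in this embodiment and is otherwise an assumption.

```python
# Illustrative: a frame is accepted as the frame to be intercepted when its
# timestamp differs from the instructed timestamp Y by no more than the
# preset error range.

def timestamp_within_error(ts_instructed_ms, ts_frame_ms, error_range_ms=1.0):
    return abs(ts_instructed_ms - ts_frame_ms) <= error_range_ms
```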
  • the embodiment of the present invention also provides a data processing device corresponding to the above-mentioned data processing method.
  • the data processing device 230 may include:
  • the instruction sending unit 231 is adapted to send a streaming instruction to each collection device in the collection array, wherein each collection device in the collection array is placed in a different position of the field collection area according to a preset multi-angle free viewing angle range, and Each acquisition device in the acquisition array is set to acquire the video data stream synchronously in real time from the corresponding angle;
  • the data stream receiving unit 232 is adapted to receive in real time the video data streams respectively transmitted by each acquisition device in the acquisition array based on the pull instruction;
• the first synchronization judging unit 233 is adapted to determine whether the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized, and, when they are not, to trigger the instruction sending unit 231 again until the video data streams respectively transmitted by the acquisition devices in the acquisition array are frame-level synchronized.
  • the data processing device can be set according to actual scenarios. For example, when there is free space on site, the data processing device can be placed in a non-collection area on site and serve as a site server; when there is no free space on site, the data processing device can be placed in the cloud and serve as a cloud server.
  • the data processing device 230 may further include:
  • the reference video stream determining unit 234 is adapted to determine one of the video data streams of each acquisition device in the acquisition array received in real time as a reference data stream;
• the video frame selection unit 235 is adapted to determine the video frame to be intercepted in the reference data stream based on the received video frame interception instruction, and to select the video frames in the remaining video data streams that are synchronized with the video frame to be intercepted in the reference data stream as the video frames to be intercepted in the remaining video data streams;
  • the video frame interception unit 236 is adapted to intercept the video frames to be intercepted in each video data stream;
  • the uploading unit 237 is adapted to synchronously upload the captured video frames to the designated target terminal.
  • the data processing device 230 may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the captured video frame to the port or IP address specified by the video frame capture instruction.
  • the video frame selection unit 235 includes at least one of the following:
• the first video frame selection module 2351 is adapted to select, according to the feature information of the object in the video frame to be intercepted in the reference data stream, the video frame with consistent object feature information in each of the remaining video data streams, as the video frame to be intercepted in that stream;
• the second video frame selection module 2352 is adapted to select, according to the time stamp information of the video frame to be intercepted in the reference data stream, the video frame with consistent time stamp information in each of the remaining video data streams, as the video frame to be intercepted in that stream.
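The two selection strategies of modules 2351 and 2352 can be sketched together. The frame dictionaries and their `timestamp`/`feature` keys are illustrative stand-ins for the time stamp information and object feature information named above.

```python
def select_frames_to_intercept(reference_frame, other_streams, by="timestamp"):
    """For each remaining stream, pick the frame synchronized with the
    reference frame, matching either time stamp info (module 2352) or
    object feature info (module 2351). Returns None for a stream with
    no matching frame."""
    key = "timestamp" if by == "timestamp" else "feature"
    selected = []
    for stream in other_streams:
        match = next((f for f in stream if f[key] == reference_frame[key]), None)
        selected.append(match)
    return selected
```

In practice feature matching would compare descriptors with a similarity threshold rather than exact equality; exact keys keep the sketch short.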
  • the embodiment of the present invention also provides a data synchronization system corresponding to the above-mentioned data processing method.
  • the above-mentioned data processing device is used to realize real-time reception of multiple video data streams.
• the data synchronization system 240 may include: a collection array 241 placed in the field collection area, and a data processing device 242 connected to the collection array 241 through a link.
  • the collection array 241 includes a plurality of collection devices, and each collection device in the collection array 241 is located at different locations in the field collection area according to a preset multi-angle free viewing angle range, wherein:
• Each collection device in the collection array 241 is adapted to synchronously collect a video data stream in real time from its corresponding angle, and to transmit the obtained video data stream to the data processing device 242 in real time based on the streaming instruction sent by the data processing device 242.
• the data processing device 242 is adapted to respectively send a streaming instruction to each collection device in the collection array 241, to receive in real time the video data streams respectively transmitted by the collection devices based on the streaming instruction, and, when the video data streams transmitted by the collection devices in the collection array 241 are not frame-level synchronized, to send the streaming instruction to each collection device again, until the video data streams transmitted by the collection devices in the collection array 241 are frame-level synchronized.
• In the data synchronization system of the embodiment of the present invention, by determining whether the video data streams respectively transmitted by the collection devices in the collection array are frame-level synchronized, synchronous transmission of the multiple channels of data can be ensured, avoiding the problems of missed and duplicated frames, thereby increasing the data processing speed to meet the needs of low-latency playback of multi-angle free-view videos.
• the data processing device 242 is further adapted to determine one of the video data streams of the collection devices in the collection array 241 received in real time as a reference data stream; to determine, based on a received video frame interception instruction, the video frame to be intercepted in the reference data stream; to select, from each of the remaining video data streams, the video frame synchronized with the video frame to be intercepted in the reference data stream as the video frame to be intercepted in that stream; and to intercept the video frames to be intercepted in each video data stream and synchronously upload the intercepted video frames to the designated target terminal 243.
• the data processing device 242 may establish a connection with a target terminal through a port or an IP address in advance, and may also synchronously upload the intercepted video frames to the port or IP address specified by the video frame interception instruction.
• the data synchronization system 240 may further include a cloud server, which is suitable for serving as the designated target terminal 243.
  • the data synchronization system 240 may further include a playback control device 341, which is suitable for serving as a designated target terminal 243.
  • the data synchronization system 240 may further include an interactive terminal 351, which is suitable for serving as a designated target terminal 243.
  • At least one of the following methods may be adopted to ensure that each collection device in the collection array 241 is set to collect the video data stream synchronously in real time from a corresponding angle:
• the collection devices in the collection array are connected through a synchronization line, wherein, when at least one collection device acquires a collection start instruction, the collection device that has acquired the instruction synchronizes it to the other collection devices through the synchronization line, so that each collection device in the collection array starts to synchronously collect a video data stream in real time from its corresponding angle based on the collection start instruction;
  • Each acquisition device in the acquisition array synchronously acquires a video data stream in real time from a corresponding angle based on a preset clock synchronization signal.
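The clock-signal variant above can be sketched as follows: each device derives the next common capture tick from the shared clock, so all devices start frames at the same instants without a physical sync line. Function and parameter names are illustrative.

```python
import math

def next_capture_time(now, frame_period, epoch=0.0):
    """Return the next common tick of the preset clock synchronization
    signal. Devices that consult the same clock all compute the same
    tick, so their captured frames align at frame level."""
    ticks_elapsed = math.ceil((now - epoch) / frame_period)
    return epoch + ticks_elapsed * frame_period
```

For example, with a 25 fps frame period of 0.04 s, two devices polling the clock at slightly different instants within the same period agree on the same start time.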
  • the synchronization system includes a collection array 251 composed of collection devices, a data processing device 252, and a server cluster 253 in the cloud.
  • At least one of the acquisition devices in the acquisition array 251 acquires the acquisition start instruction, and synchronizes the acquired acquisition start instruction to other acquisition devices through the synchronization line 254, so that each acquisition device in the acquisition array The acquisition device respectively starts to synchronously acquire the video data stream in real time from the corresponding angle based on the acquisition start instruction.
  • the data processing device 252 may send a streaming instruction to each collection device in the collection array 251 through a wireless local area network. Based on the streaming instruction sent by the data processing device 252, each collection device in the collection array 251 transmits the obtained video data stream to the data processing device 252 in real time through the switch 255.
• the data processing device 252 determines whether the video data streams respectively transmitted by the collection devices in the collection array 251 are frame-level synchronized, and, when they are not, sends the streaming instruction to each collection device in the collection array 251 again, until the video data streams transmitted by the collection devices in the collection array 251 are frame-level synchronized.
• When the data processing device 252 determines that the video data streams transmitted by the collection devices in the collection array 251 are frame-level synchronized, it determines one of the video data streams received in real time as the reference data stream. After receiving a video frame interception instruction, the data processing device 252 determines the video frame to be intercepted in the reference data stream according to the instruction, selects, from each of the remaining video data streams, the video frame synchronized with the video frame to be intercepted in the reference data stream as the video frame to be intercepted in that stream, then intercepts the video frames to be intercepted in each video data stream and synchronously uploads the intercepted video frames to the cloud.
  • the server cluster 253 in the cloud performs subsequent processing on the captured video frames to obtain a multi-angle free-view video for playback.
  • the cloud server cluster 253 may include: a first cloud server 2531, a second cloud server 2532, a third cloud server 2533, and a fourth cloud server 2534.
• the first cloud server 2531 can be used for parameter calculation;
• the second cloud server 2532 can be used for depth calculation to generate a depth map;
• the third cloud server 2533 can be used for DIBR (depth-image-based rendering) to perform frame image reconstruction along a preset virtual viewpoint path;
• the fourth cloud server 2534 can be used to generate multi-angle free-view videos.
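The four cloud servers form a sequential pipeline over the synchronously intercepted frames. A minimal sketch of that chaining follows; the stage names and handler interface are illustrative, not part of the patent.

```python
# The four cloud-server stages, in processing order (names illustrative).
PIPELINE = ("parameter_calculation", "depth_map_generation",
            "dibr_reconstruction", "video_assembly")

def run_pipeline(frames, handlers):
    """Feed the synchronously intercepted frames through each stage in
    turn; `handlers` maps a stage name to a callable implementing it,
    e.g. running on servers 2531-2534."""
    data = frames
    for stage in PIPELINE:
        data = handlers[stage](data)
    return data
```

Each stage's output is the next stage's input, matching the order parameter calculation, depth calculation, DIBR reconstruction, video generation described above.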
  • the data processing device can be placed in a non-collection area on site or in the cloud according to actual scenarios, and the data synchronization system can use at least one of a cloud server, a playback control device, or an interactive terminal in practical applications.
  • the transmitting end of the video frame interception instruction may also adopt other devices capable of transmitting the video frame interception instruction, which is not limited in the embodiment of the present invention.
  • the embodiment of the present invention also provides a collection device corresponding to the above-mentioned data processing method.
  • the collection device is adapted to synchronize the collection start instruction to other collection devices when acquiring the collection start instruction, and start from the corresponding perspective.
  • the video data stream is synchronously collected in real time, and when a streaming instruction sent by the data processing device is received, the obtained video data stream is transmitted to the data processing device in real time.
  • the collection device 360 includes: a photoelectric conversion camera component 361, a processor 362, an encoder 363, and a transmission component 365, in which:
• the photoelectric conversion camera component 361 is adapted to collect images;
• the processor 362 is adapted to synchronize the collection start instruction to other collection devices through the transmission component 365 when the collection start instruction is acquired, to process the images collected by the photoelectric conversion camera component 361 in real time to obtain an image data sequence, and, when a streaming instruction is obtained, to transmit the obtained video data stream to the data processing device in real time through the transmission component 365;
  • the encoder 363 is adapted to encode the image data sequence to obtain a corresponding video data stream.
  • the collection device 360 may further include a recording component 364, which is adapted to collect sound signals and obtain audio data.
  • the collected image data sequence and audio data can be processed by the processor 362, and then the collected image data sequence and audio data can be encoded by the encoder 363 to obtain a corresponding video data stream.
  • the processor 362 can synchronize the acquisition start instruction to other acquisition devices through the transmission component 365; when receiving the streaming instruction, the transmission component 365 will transmit the acquired video data The stream is transmitted to the data processing device in real time.
• the collection devices can be placed at different locations in the field collection area according to the preset multi-angle free-view range, and each collection device can be fixed at a certain point in the field collection area or can move within it, so as to form a collection array. Therefore, a collection device may be a fixed device or a mobile device, so that the video data stream can be flexibly collected from multiple angles.
• Referring to FIG. 37, which is a schematic diagram of the collection array in an application scenario in an embodiment of the present invention:
• the center of the stage is taken as the core point of view, the core point of view is taken as the center of a circle, and the fan-shaped area on the same plane as the core point of view is used as the preset multi-angle free viewing angle range.
• the collection devices 371-375 in the collection array are placed, in a fan shape, at different positions of the field collection area according to the preset multi-angle free viewing angle range.
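The fan-shaped layout above, cameras spread evenly on an arc around the core point of view, can be sketched geometrically. The radius, fan angle, and coordinate convention are illustrative assumptions.

```python
import math

def fan_positions(core, radius, fan_angle_deg, n_cameras):
    """Place n cameras evenly on a circular arc centred on the core
    point of view, spanning the preset fan-shaped multi-angle
    free-view range. Returns (x, y) positions in the stage plane."""
    cx, cy = core
    if n_cameras == 1:
        angles = [0.0]
    else:
        step = fan_angle_deg / (n_cameras - 1)
        angles = [-fan_angle_deg / 2 + i * step for i in range(n_cameras)]
    return [(cx + radius * math.cos(math.radians(a)),
             cy + radius * math.sin(math.radians(a))) for a in angles]
```

With five cameras over a 120-degree fan, the middle camera sits on the fan's axis of symmetry and all cameras are equidistant from the core point of view, which keeps viewpoint transitions between adjacent cameras uniform.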
  • the collection device 376 is a movable device, which can be moved to a designated location according to instructions for flexible collection.
• a collection device can also be a handheld device, to supplement the collected data when another collection device fails or when the space is limited.
• the handheld device 377 located in the stage audience area in FIG. 37 can be added to the collection array to provide the video data stream of the stage audience area.
  • the embodiment of the present invention provides a computing node cluster, and multiple computing nodes can simultaneously generate depth maps in parallel and batch-wise on the texture data synchronously collected by the same collection array.
• the depth map calculation process can be divided into multiple steps, such as obtaining a rough depth map through a first depth calculation, determining the unstable regions in the rough depth map, and then performing a second depth calculation. Multiple computing nodes in the computing node cluster can perform the first depth calculation on the texture data collected by multiple collection devices in parallel to obtain rough depth maps, and can verify the obtained rough depth maps and perform the second depth calculation in parallel, thereby saving depth map calculation time and increasing the rate of depth map generation.
• Referring to the flowchart of a method for generating a depth map shown in FIG. 26, in an embodiment of the present invention, multiple computing nodes in a computing node cluster are used to generate the depth map. Any computing node in the computing node cluster is called the first computing node. The method for generating the depth map by the computing node cluster is described in detail below through specific steps:
  • S261 Receive texture data, where the texture data is synchronously collected by multiple collection devices in the same collection array.
  • the multiple collection devices can be placed at different locations in the field collection area according to the preset multi-angle free view range, and the collection devices can be fixedly set at a certain point in the field collection area, or can be located in the field collection area. Move inside to form a collection array.
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be freely switched.
  • the multi-angle free angle of view can be a 6-degree-of-freedom (6DoF) angle of view
• the collection devices used in the collection array can be ordinary cameras, video cameras, video recorders, handheld devices such as mobile phones, and so on.
• the texture data is the pixel data of the two-dimensional image frames collected by the aforementioned collection devices; it may be an image at a single frame time, or the pixel data of the frame images of a video stream formed by continuous or non-continuous frame images.
• S262: The first computing node performs a first depth calculation according to the first texture data and the second texture data to obtain a first rough depth map.
• the texture data that meets the preset first mapping relationship with the first computing node is called the first texture data; the texture data collected by the collection device that meets the preset first spatial position relationship with the collection device of the first texture data is called the second texture data.
  • the first mapping relationship may be obtained based on a preset first mapping relationship table or through random mapping.
  • the texture data processed by each computing node can be pre-allocated according to the number of computing nodes in the computing node cluster and the number of collection devices in the collection array corresponding to the texture data.
  • a special distribution node may be set to distribute the computing tasks of each computing node in the computing node cluster, and the distribution node may obtain the first mapping relationship based on a preset first mapping relationship table or through random mapping. For example, if there are a total of 40 collection devices in the collection array, in order to achieve the highest concurrent processing efficiency, 40 computing nodes can be configured, and each collection device corresponds to a computing node.
• Alternatively, each computing node can correspond to the texture data collected by two collection devices.
• Specifically, the mapping relationship between the identification of the collection device corresponding to the texture data and the identification of each computing node can be set as the first mapping relationship, and the texture data collected by the corresponding collection devices in the collection array can be directly distributed to the corresponding computing nodes based on the first mapping relationship.
• Computing tasks can also be assigned randomly, with the texture data collected by each collection device in the collection array randomly assigned to the computing nodes in the computing node cluster. In this case, to improve processing efficiency, all the texture data collected by the collection array can be copied to each computing node in the computing node cluster in advance.
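A deterministic form of the first mapping relationship can be sketched as a round-robin allocation of device identifications to node identifications, so that 40 devices and 40 nodes yield the one-to-one mapping described above, and fewer nodes yield multiple devices per node. The identifier format is illustrative.

```python
def build_first_mapping(device_ids, node_ids):
    """Assign each collection device to a computing node round-robin:
    with equal counts this is a 1:1 mapping; with half as many nodes,
    each node receives two devices' texture data."""
    mapping = {}
    for i, dev in enumerate(device_ids):
        node = node_ids[i % len(node_ids)]
        mapping.setdefault(node, []).append(dev)
    return mapping
```

A random assignment, the other option named above, would simply shuffle `device_ids` before the same loop.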
  • any server in the server cluster may perform the first depth calculation according to the first texture data and the second texture data.
• the second texture data may be texture data collected by a collection device that meets a preset first distance relationship with the collection device of the first texture data, or texture data collected by a collection device that meets a preset first quantity relationship with the collection device of the first texture data, or texture data collected by a collection device that meets both the preset first distance relationship and the preset first quantity relationship with the collection device of the first texture data.
  • the first preset number can take any integer value from 1 to N-1, and N is the total number of collection devices in the collection array.
• In an embodiment, the first preset number is set to 2, so that the highest possible image quality can be obtained with the least amount of calculation. For example, assuming that computing node 9 corresponds to camera 9 in the preset first mapping relationship, the texture data of camera 9 and the texture data of cameras 5, 6, 7, 10, 11, and 12 adjacent to camera 9 can be used to calculate the rough depth map of camera 9.
• the second texture data may also be data collected by a collection device that meets another type of first spatial position relationship with the collection device of the first texture data; for example, the spatial position relationship may satisfy a preset angle, a preset relative position, and so on.
• S263: The first computing node synchronizes the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set.
  • the rough depth map obtained after the rough calculation of the depth map needs to be cross-validated to determine the unstable region in each rough depth map, so as to perform a refined solution in the next step.
• cross-validation needs to be performed using the rough depth maps corresponding to multiple collection devices around the collection device corresponding to the rough depth map to be verified (typically, the rough depth map to be verified is cross-validated together with the rough depth maps corresponding to all other collection devices). Therefore, the rough depth map calculated by each computing node needs to be synchronized to the remaining computing nodes in the computing node cluster. After the computing nodes are synchronized in step S263, each computing node in the computing node cluster obtains the rough depth maps calculated by the remaining computing nodes, and every node holds exactly the same rough depth map set.
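The cross-validation idea, marking a pixel unstable when neighbouring rough depth maps disagree about its depth, can be sketched as follows. The sketch simplifies heavily: maps are flat lists over the same pixel grid, whereas a real check would reproject depths between views using the camera parameters; the tolerance and majority rule are assumptions.

```python
def unstable_region(target_map, neighbor_maps, tol=0.05):
    """Return a per-pixel mask that is True where the target rough
    depth map disagrees (beyond a relative tolerance) with the
    majority of the neighbouring rough depth maps."""
    mask = []
    for i, d in enumerate(target_map):
        agree = sum(1 for m in neighbor_maps
                    if abs(m[i] - d) <= tol * max(d, 1e-9))
        mask.append(agree < len(neighbor_maps) / 2)  # True = unstable
    return mask
```

Pixels flagged by the mask are exactly those refined in the second depth calculation of step S265.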
• S264: For the second rough depth map in the rough depth map set, the first computing node uses a third rough depth map for verification, and obtains the unstable region in the second rough depth map.
• the second rough depth map and the first computing node may satisfy a preset second mapping relationship; the third rough depth map may be the rough depth map corresponding to a collection device that satisfies a preset second spatial position relationship with the collection device corresponding to the second rough depth map.
  • the second mapping relationship may be obtained based on a preset second mapping relationship table or through random mapping.
  • the texture data processed by each computing node can be pre-allocated according to the number of computing nodes in the computing node cluster and the number of collection devices in the collection array corresponding to the texture data.
  • a special distribution node may be set to distribute the computing tasks of each computing node in the computing node cluster, and the distribution node may obtain the second mapping relationship based on a preset second mapping relationship table or through random mapping.
  • setting the second mapping relationship refer to the foregoing implementation example of the first mapping relationship.
  • the second mapping relationship may completely correspond to the first mapping relationship, or may not correspond to the first mapping relationship.
• For example, a one-to-one second mapping relationship can be established, according to the hardware identifications, between the identification of the collection device corresponding to the data and the identification of each computing node.
  • the descriptions of the first rough depth map, the second rough depth map, and the third rough depth map are only for clear and concise description.
  • the first rough depth map may be the same as or different from the second rough depth map; the acquisition device corresponding to the third rough depth map and the acquisition device corresponding to the second rough depth map It suffices to satisfy the preset second spatial position relationship.
• the texture data corresponding to the third rough depth map may be texture data collected by a collection device that satisfies a preset second distance relationship with the collection device corresponding to the second rough depth map, or texture data collected by a collection device that satisfies a preset second quantity relationship with the collection device corresponding to the second rough depth map, or texture data collected by a collection device that satisfies both the preset second distance relationship and the preset second quantity relationship.
  • the second preset number can take any integer value from 1 to N-1, and N is the total number of collection devices in the collection array.
  • the second preset number may be equal to or different from the first preset number.
  • the second preset number is taken as 2, so that the highest possible image quality can be obtained with the least amount of calculation.
  • the second spatial position relationship may also be another type of spatial position relationship, such as satisfying a preset angle, satisfying a preset relative position, and so on.
• S265: The first computing node performs a second depth calculation based on the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain the corresponding fine depth map.
  • the difference between the second depth calculation and the first depth calculation is that the depth map candidate value in the second rough depth map selected by the second depth calculation does not include the depth value of the unstable region. In this way, unstable regions in the generated depth map can be eliminated, so that the generated depth map is more accurate, and the quality of the generated multi-angle free-view image can be improved.
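The distinguishing rule of the second depth calculation, excluding the unstable region's own depth values from the candidate set, can be sketched as follows. The per-pixel candidate list and the photo-consistency `cost` callable are hypothetical simplifications of the actual multi-view matching.

```python
def second_depth_calculation(rough_map, unstable, candidates, cost):
    """Re-estimate depth only where the first pass was unstable. The
    key difference from the first depth calculation: the rough map's
    own (unreliable) value is dropped from the candidate set before
    picking the candidate with the lowest matching cost against the
    neighbouring texture data."""
    refined = list(rough_map)
    for i, is_unstable in enumerate(unstable):
        if not is_unstable:
            continue
        allowed = [d for d in candidates if d != rough_map[i]]
        refined[i] = min(allowed, key=lambda d: cost(i, d))
    return refined
```

Stable pixels keep their first-pass depths, so only the unstable region pays the cost of the second pass.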
  • the server S may perform the first round of depth calculation (first depth calculation) based on the allocated texture data of the camera M and the texture data of the camera that meets the preset first spatial position relationship with the camera M to obtain a rough depth map .
  • the depth map can be continuously refined and solved on the same server.
  • the server S can cross-validate the assigned rough depth map corresponding to the camera M with the results of all other rough depth maps, and can obtain the unstable region in the rough depth map corresponding to the camera M.
• the server S can perform another round of depth map calculation (the second depth calculation) based on the unstable region in the rough depth map corresponding to the assigned camera M, the texture data collected by camera M, and the texture data of the N cameras around camera M, so as to obtain a refined depth map corresponding to the first texture data (the texture data collected by camera M).
  • the rough depth map corresponding to the camera M is a rough depth map calculated based on the texture data collected by the camera M and the texture data collected by a collection device that satisfies the preset first spatial position relationship with the camera M.
• S266: Use the set of fine depth maps obtained by the computing nodes as the finally generated depth maps.
  • multiple computing nodes can simultaneously generate the depth map in parallel and batch processing on the texture data synchronously collected by the same acquisition array, thereby greatly improving the efficiency of generating the depth map.
  • the above solution is adopted to eliminate unstable regions in the generated depth map through secondary depth calculation, so the obtained fine depth map is more accurate, and the quality of the generated multi-angle free-view image can be improved.
  • the computing node cluster may be a server cluster composed of multiple servers, and multiple servers in the server cluster may be deployed in a centralized manner or in a distributed deployment.
  • some or all of the computing node devices in the computing node cluster may be used as local servers, or may be used as edge node devices, or as cloud computing devices.
  • the computing node cluster may also be a computing device formed by multiple CPUs or GPUs.
  • the embodiment of the present invention also provides a computing node, which is suitable for forming a computing node cluster with at least another computing node to generate a depth map.
  • the computing node 270 may include:
  • the input unit 271 is adapted to receive texture data, which originates from the simultaneous collection of multiple collection devices in the same collection array;
  • the first depth calculation unit 272 is adapted to perform a first depth calculation according to the first texture data and the second texture data to obtain a first rough depth map, wherein: the first texture data and the calculation node meet a preset A first mapping relationship; the second texture data is texture data collected by a collection device that meets a preset first spatial position relationship with the collection device of the first texture data;
  • the synchronization unit 273 is adapted to synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth atlas;
• the verification unit 274 is adapted to use a third rough depth map to verify the second rough depth map in the rough depth map set, to obtain the unstable region in the second rough depth map, wherein: the second rough depth map and the computing node meet the preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that meets the preset second spatial position relationship with the collection device corresponding to the second rough depth map;
  • the second depth calculation unit 275 is adapted to perform a second operation based on the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map. Depth calculation to obtain a corresponding fine depth map, where: the depth map candidate value in the second rough depth map selected by the second depth calculation does not include the depth value of the unstable region;
  • the output unit 276 is adapted to output the fine depth map, so that the computing node cluster obtains the fine depth atlas as the final generated depth map.
• the depth map calculation process can include multiple steps, such as obtaining a rough depth map through the first depth calculation, determining the unstable region in the rough depth map, and the subsequent second depth calculation. Performing the depth map calculation through the above steps facilitates separate calculation by multiple computing nodes, thereby improving the efficiency of generating the depth map.
• the embodiment of the present invention also provides a computing node cluster. The computing node cluster may include multiple computing nodes, and the multiple computing nodes in the computing node cluster can simultaneously perform depth map generation, in parallel and in batch mode, on the texture data synchronously collected by the same collection array.
  • any computing node in the computing node cluster is referred to as the first computing node.
• the first computing node is adapted to: perform a first depth calculation according to the first texture data and the second texture data in the received texture data to obtain a first rough depth map; synchronize the first rough depth map to the remaining computing nodes in the computing node cluster to obtain a rough depth map set; for the second rough depth map in the rough depth map set, use the third rough depth map for verification to obtain the unstable region in the second rough depth map; and perform a second depth calculation according to the unstable region in the second rough depth map, the texture data corresponding to the second rough depth map, and the texture data corresponding to the third rough depth map, to obtain a corresponding fine depth map, and output the obtained fine depth map so that the computing node cluster uses the obtained set of fine depth maps as the finally generated depth maps;
• wherein: the first texture data meets a preset first mapping relationship with the first computing node; the second texture data is texture data collected by a collection device that meets a preset first spatial position relationship with the collection device of the first texture data; the second rough depth map and the first computing node meet the preset second mapping relationship; the third rough depth map is the rough depth map corresponding to a collection device that satisfies the preset second spatial position relationship with the collection device corresponding to the second rough depth map; and the depth map candidate values in the second rough depth map selected by the second depth calculation do not include the depth values of the unstable region.
  • The texture data collected by the N cameras in the camera array are respectively input to the N servers in the server cluster, and the first depth calculation is performed respectively to obtain rough depth maps 1 to N.
  • each server copies the rough depth map calculated by itself to other servers in the server cluster and realizes time synchronization.
  • each server verifies the rough depth map assigned by itself and performs the second depth calculation.
  • the depth map after fine calculation is obtained as the depth map generated by the server cluster. From the above calculation process, it can be seen that each server in the server cluster can perform the first depth calculation on the texture data collected by multiple cameras in parallel, and perform verification and second depth calculation on each rough depth map in the rough depth atlas.
  • the entire depth map generation process is performed by multiple servers in parallel, which can greatly save the time of depth map calculation and improve the efficiency of depth map generation.
  • the server cluster can then store the generated depth map or output it to the terminal device according to the request, so as to further generate and display the virtual viewpoint image, which will not be repeated here.
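The cluster workflow above (first depth calculation per node, synchronization into a rough depth atlas, verification, second depth calculation) can be sketched as follows. This is a minimal single-process Python illustration, not the patent's implementation: `rough_depth`, `unstable_regions`, and `refine` are hypothetical stand-ins for the real depth computations, and the neighbour relationship is simplified to "next camera in the array".

```python
# Illustrative sketch of the two-pass depth pipeline across a cluster:
# each node computes a rough depth map, rough maps are synchronized,
# cross-checked against a neighbour's map, and refined.

def rough_depth(own_texture, neighbour_texture):
    # First depth calculation: simplified stand-in that averages textures.
    return [(a + b) / 2 for a, b in zip(own_texture, neighbour_texture)]

def unstable_regions(depth, neighbour_depth, tol=0.5):
    # Verification: mark pixels whose depth disagrees with the neighbour map.
    return [abs(a - b) > tol for a, b in zip(depth, neighbour_depth)]

def refine(depth, neighbour_depth, unstable):
    # Second depth calculation: candidates from unstable regions are
    # excluded, so unstable pixels fall back to the neighbour's value.
    return [nb if u else d for d, nb, u in zip(depth, neighbour_depth, unstable)]

def cluster_depth(textures):
    n = len(textures)
    # Pass 1: every node computes its rough map (conceptually in parallel).
    rough = [rough_depth(textures[i], textures[(i + 1) % n]) for i in range(n)]
    # Synchronize (the list plays the role of the "rough depth atlas"),
    # then pass 2: each node verifies and refines the map assigned to it.
    fine = []
    for i in range(n):
        nb = rough[(i + 1) % n]
        u = unstable_regions(rough[i], nb)
        fine.append(refine(rough[i], nb, u))
    return fine

textures = [[1.0, 2.0, 9.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
atlas = cluster_depth(textures)
```

In a real cluster each loop iteration would run on a different server, with the synchronization step implemented as network transfers.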
  • The embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored. When the computer instructions are run, the steps of the depth map generation method described in any of the foregoing embodiments can be executed. The steps of the generation method will not be repeated here.
  • DIBR (depth-image-based rendering)
  • The embodiments of the present invention provide a method for generating virtual viewpoint images through parallel processing, which can greatly accelerate the generation of multi-angle free-view virtual viewpoint images, thereby meeting the requirements of low-latency playback and real-time interaction for multi-angle free-view video and improving the user experience.
  • the virtual view point image can be generated through the following steps:
  • the multi-angle free viewing angle may refer to the spatial position and viewing angle of the virtual viewpoint that enables the scene to be switched freely.
  • the multi-angle free viewing angle range can be determined according to the needs of the application scene.
  • a collection array composed of multiple collection devices can be arranged on site. Each collection device in the collection array can be placed in a different position of the on-site collection area according to a preset multi-angle free viewing angle range.
  • The live images can be collected synchronously, obtaining texture maps synchronized across multiple angles. For example, multiple cameras, video cameras, etc. can be used to perform synchronized image acquisition of a scene from multiple angles.
  • The viewing angle of the images in the multi-angle free-view image combination may be completely free, that is, a 6-degrees-of-freedom (6DoF) viewing angle in which both the spatial position and the viewing angle of the viewpoint can be freely switched.
  • The spatial position of the viewpoint can be expressed as coordinates (x, y, z), and the viewing angle can be expressed as three rotation directions, hence the name 6DoF.
  • the image combination of multiple angles and free viewing angles and the parameter data of the image combination can be acquired first.
  • the texture map and the depth map in the image combination correspond one-to-one.
  • the texture map can adopt any type of two-dimensional image format, for example, it can be any of BMP, PNG, JPEG, webp format, and so on.
  • the depth map can represent the distance of each point in the scene relative to the shooting device, that is, each pixel value in the depth map represents the distance between a certain point in the scene and the shooting device.
  • the texture map in the image combination is a plurality of synchronized two-dimensional images.
  • the depth data of each two-dimensional image may be determined based on the plurality of two-dimensional images.
  • the depth data may include depth values corresponding to pixels of the two-dimensional image.
  • the distance from the collection device to each point in the area to be viewed can be used as the aforementioned depth value, and the depth value can directly reflect the geometric shape of the visible surface in the area to be viewed.
  • the depth value may be the distance from each point in the area to be viewed along the optical axis of the camera to the optical center, and the origin of the camera coordinate system may be used as the optical center.
  • the distance may be a relative value, and the same reference may be used for multiple images.
  • The depth data may include depth values corresponding one-to-one to the pixels of the two-dimensional image, or may be a subset of values selected from the set of depth values corresponding one-to-one to the pixels of the two-dimensional image.
  • the depth value set can be stored in the form of a depth map.
  • The depth data can be the data obtained after down-sampling the original depth map, where the original depth map is the set of depth values in one-to-one correspondence with the pixels of the two-dimensional image (texture map), stored as an image following the pixel arrangement of the texture map.
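The down-sampling idea above can be sketched minimally in Python. This is an illustration of the concept (taking every second pixel in each direction), not the patent's exact sampling scheme:

```python
# Minimal sketch: depth data as a down-sampled version of the original
# depth map, stored as nested lists in the texture map's pixel layout.

def downsample(depth_map, factor=2):
    # Keep every `factor`-th row and every `factor`-th column.
    return [row[::factor] for row in depth_map[::factor]]

original = [
    [10, 11, 12, 13],
    [14, 15, 16, 17],
    [18, 19, 20, 21],
    [22, 23, 24, 25],
]
reduced = downsample(original)  # 2x2 map: [[10, 12], [18, 20]]
```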
  • the image combination of multiple angles and free viewing angles and the parameter data of the image combination can be obtained through the following steps, which will be described below through specific application scenarios.
  • the first step is the acquisition and calculation of the depth map, including three main steps, namely: Multi-camera Video Capturing, and camera internal and external parameter calculation (Camera Parameter Estimation), and depth map calculation (Depth Map Calculation).
  • the video captured by each camera is required to be frame-level aligned.
  • Through multi-camera video capture, texture maps, that is, multiple synchronized images, can be obtained; through the calculation of the internal and external parameters of the cameras, the camera parameters, that is, the parameter data of the image combination, including internal parameter data and external parameter data, can be obtained; through depth map calculation, the depth maps can be obtained.
  • multiple sets of texture maps and depth maps with corresponding relationships can be spliced together to form a frame of spliced image.
  • the stitched image can have a variety of stitching structures.
  • Each frame of stitched image can be used as an image combination.
  • Multiple sets of texture maps and depth maps in the image combination can be spliced and combined according to a preset relationship.
  • The texture maps and depth maps of the image combination can be divided into a texture map area and a depth map area according to the position relationship: the texture map area stores the pixel values of each texture map, and the depth map area stores the pixel values of each depth map, both according to a preset position relationship.
  • the texture map area and the depth map area can be continuous or spaced apart. In the embodiment of the present invention, there is no restriction on the positional relationship between the texture map and the depth map in the image combination.
  • the parameter data of each image in the image combination can be obtained from the attribute information of the image.
  • the parameter data may include external parameter data, and may also include internal parameter data.
  • the external parameter data is used to describe the spatial coordinates and posture of the shooting device
  • the internal parameter data is used to express the property information of the shooting device such as the optical center and focal length of the shooting device.
  • the internal parameter data may also include distortion parameter data.
  • the distortion parameter data includes radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs in the process of converting the coordinate system of the shooting device to the physical coordinate system of the image.
  • the tangential distortion occurs in the manufacturing process of the shooting equipment, which is due to the fact that the plane of the photosensitive element is not parallel to the lens.
  • the combination of internal parameter data including distortion parameter data can make the determined spatial mapping relationship more accurate.
  • The virtual viewpoint path can be preset.
  • For a sports game, such as a basketball game or a football game, an arc-shaped path can be planned in advance; for example, whenever a highlight shot appears, a corresponding virtual viewpoint image is generated along the arc-shaped path.
  • In a specific application, the virtual viewpoint path can also be set based on a specific location or perspective in the scene (such as the basket, the sidelines, the referee's perspective, or the coach's perspective), or based on specific objects (such as the players on the court, the on-site host, the audience, or actors in film and television footage).
  • the path data corresponding to the virtual viewpoint path may include position data of a series of virtual viewpoints in the path.
  • The texture maps and depth maps of the corresponding groups can be selected adaptively. For example, for a virtual viewpoint located in an area with higher camera density, only the texture maps and corresponding depth maps captured by the two cameras closest to the virtual viewpoint may be selected, while for a virtual viewpoint located in an area with lower camera density, the texture maps and corresponding depth maps captured by the three or four cameras closest to the virtual viewpoint may be selected.
  • texture maps and depth maps corresponding to 2 to N acquisition devices closest to each virtual viewpoint position in the virtual viewpoint path can be selected respectively, where N is the number of all acquisition devices in the acquisition array.
  • the texture map and the depth map corresponding to the two acquisition devices closest to each virtual viewpoint position can be selected by default.
  • The user can set the number of selected collection devices closest to the virtual viewpoint position, up to the number of collection devices corresponding to the image combination.
  • There is no special requirement for the spatial position distribution of the collection devices in the collection array (for example, it can be a linear distribution, an arc-shaped array arrangement, or any irregular arrangement).
  • The actual distribution of the acquisition devices is determined from the virtual viewpoint position data and the parameter data corresponding to the image combination, and an adaptive strategy is then adopted to select the texture maps and depth maps of the corresponding groups in the image combination. This reduces the amount of data to be computed while ensuring the quality of the generated virtual viewpoint image, provides a higher degree of freedom and flexibility in selection, and also reduces the installation requirements for the acquisition devices in the acquisition array, making it easy to adapt to different sites and simple to install and operate.
  • a preset number of corresponding sets of texture maps and depth maps that are closest to the virtual viewpoint position are selected from the image combination.
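The selection step above can be sketched as a nearest-camera search. This is a hedged illustration: the function name, the use of Euclidean distance on camera positions taken from the external parameter data, and the example coordinates are all assumptions for the sketch:

```python
# Sketch of the adaptive selection step: pick the k acquisition devices
# closest to a virtual viewpoint position.
import math

def nearest_cameras(viewpoint, camera_positions, k=2):
    # Sort camera indices by Euclidean distance to the viewpoint.
    order = sorted(range(len(camera_positions)),
                   key=lambda i: math.dist(viewpoint, camera_positions[i]))
    return order[:k]  # indices of the k closest cameras

cams = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
chosen = nearest_cameras((3.0, 0.0, 0.0), cams, k=2)  # -> [1, 0]
```

The returned indices would then select the corresponding texture maps and depth maps from the image combination.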
  • S293: Input the texture maps and depth maps of the corresponding group for each virtual viewpoint into the graphics processor, and for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, combine and render, by multiple threads, the pixels in the texture maps and depth maps of the corresponding group in the selected image combination, to obtain the image corresponding to the virtual viewpoint.
  • A GPU (Graphics Processing Unit), also known as a display core, visual processor, or display chip, is a microprocessor that specializes in image- and graphics-related operations. It can be configured in electronic equipment with image-related computing requirements, such as personal computers, workstations, game consoles, and some mobile terminals (such as tablet computers and smart phones).
  • The GPU may adopt the Compute Unified Device Architecture (CUDA) parallel programming architecture to perform combined rendering on the pixels in the corresponding sets of texture maps and depth maps in the selected image combination.
  • CUDA is a new hardware and software architecture that is used to allocate and manage calculations on GPUs as data parallel computing devices without mapping them to a graphical application programming interface (API).
  • When programming with CUDA, the GPU can be regarded as a computing device capable of executing a large number of threads in parallel, running as a coprocessor of the host CPU. In other words, the data-parallel, computationally intensive parts of the application running on the host are delegated to the GPU.
  • A part of an application that is executed many times, independently on different data, can be isolated into a function that runs on the GPU device as many different threads.
  • Such a function is compiled into the instruction set of the GPU device, and the resulting program, called a kernel, is downloaded to the GPU.
  • the thread batches that execute the kernel are organized into thread blocks.
  • A thread block is a batch of threads that can effectively share data through fast shared memory and synchronize their execution to coordinate memory accesses.
  • a synchronization point can be specified in the kernel, and the threads in the thread block will be suspended until they all reach the synchronization point.
  • the maximum number of threads that can be included in a thread block is limited.
  • blocks of the same dimension and size that execute the same kernel can be batched into a grid of threads (Grid of Thread Blocks), so that the total number of threads that can be started in a single kernel call is much larger.
  • step S293 may be implemented by the following steps:
  • S2931 Perform forward mapping on the depth maps of the corresponding group in parallel, and map them to the virtual viewpoint.
  • the forward mapping of the depth map is to map the depth map of the original camera (acquisition device) to the position of the virtual camera through the conversion of the coordinate space position, so as to obtain the depth map of the virtual camera position.
  • the forward mapping of the depth map is an operation of mapping each pixel of the depth map of the original camera (acquisition device) to a virtual viewpoint according to a preset coordinate mapping relationship.
  • the first kernel (Kernel) function can be run on the GPU, and the pixels in the depth map of the corresponding group can be forward mapped in parallel to the corresponding virtual view point position.
  • A second depth map of the virtual viewpoint position may be created based on the first depth map of the virtual viewpoint position, and for each pixel in the second depth map, processed in parallel, the maximum value of the pixels in a preset area around the corresponding pixel position in the first depth map is taken.
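The forward mapping and collision resolution described above can be sketched sequentially in Python. This is an illustration, not the GPU implementation: `project` is a hypothetical stand-in for the real coordinate mapping, and the `max` update mimics the effect of CUDA's atomic-max on the first depth map:

```python
# Sketch of forward mapping: each source depth pixel is projected to a
# virtual-view pixel; collisions keep the largest depth-map pixel value
# (largest pixel value = closest point), as the atomic operation would.

def project(u, v):
    # Stand-in for the real coordinate mapping; here a simple 1-pixel shift.
    return u + 1, v

def forward_map(depth, width, height):
    out = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            u2, v2 = project(u, v)
            if 0 <= u2 < width and 0 <= v2 < height:
                # Atomic-max analogue: keep the largest mapped value.
                out[v2][u2] = max(out[v2][u2], depth[v][u])
    return out

depth = [[5, 9, 1]]
mapped = forward_map(depth, 3, 1)  # -> [[0, 5, 9]]
```

On the GPU, every (u, v) iteration would be a separate thread, which is why the collision update must be atomic.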
  • S2932 Perform post-processing in parallel on the depth map after forward mapping.
  • the virtual viewpoint depth map can be post-processed.
  • The preset second kernel function can be run on the GPU to perform, for each pixel in the second depth map obtained by forward mapping, median filtering over a preset area around the pixel position. Since the median filtering can be performed on each pixel in the second depth map in parallel, the post-processing speed can be greatly accelerated and the time-efficiency of the post-processing improved.
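The per-pixel median filtering can be sketched as follows. This sequential Python version is only illustrative (each pixel of the loop would be one GPU thread); the 3x3 window size is one of the values mentioned later in the text:

```python
# Sketch of the post-processing step: a (2k+1)x(2k+1) median filter
# applied independently per pixel, clamped at the image borders.
import statistics

def median_filter(img, k=1):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - k), min(h, y + k + 1))
                      for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = statistics.median(window)
    return out

noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
smooth = median_filter(noisy)  # the centre spike is replaced by 1
```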
  • This step (reverse mapping) calculates, according to the value of the depth map, the coordinates in the original camera texture map that correspond to each pixel of the virtual viewpoint position, and computes the corresponding value through sub-pixel interpolation.
  • The sub-pixel value can be interpolated directly using bilinear interpolation; therefore, in this step it is only necessary to fetch the value from the original camera texture according to the coordinates calculated for each pixel.
  • the preset third core function can be run on the GPU, and the pixels in the selected corresponding group of texture maps can be interpolated in parallel to generate the corresponding virtual texture map.
  • the pixels in the texture map of the selected corresponding group are interpolated in parallel to generate the corresponding virtual texture map, which can greatly accelerate the processing speed of reverse mapping and improve the time efficiency of reverse mapping performance.
  • the fourth core function can be run on the GPU to perform weighted fusion of pixels at the same position in each virtual texture map generated after reverse mapping in parallel.
  • In step S2931, for the forward mapping of the depth map, the projection mapping relationship of each pixel can first be calculated through the first kernel function of the GPU.
  • The pinhole projection model can be written as formula (1): Z·[u, v, 1]^T = K·[X, Y, Z]^T, where K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] is the camera intrinsic matrix.
  • [u, v, 1]^T is the homogeneous coordinates of the pixel (u, v).
  • [X, Y, Z]^T is the coordinates, in the camera coordinate system, of the real object point corresponding to (u, v).
  • f_x and f_y are the focal lengths in the x and y directions, and c_x and c_y are the optical center coordinates in the x and y directions, respectively.
  • Given the depth value Z of a pixel and the physical parameters of the corresponding camera lens (f_x, f_y, c_x, and c_y can be obtained from the parameter data of the aforementioned image combination), the coordinates [X, Y, Z]^T of the corresponding point in the camera coordinate system can be obtained through formula (1).
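Inverting formula (1) gives the back-projection used here. The sketch below shows the arithmetic; the intrinsic values (a 640x480 camera with its optical centre at the image centre) are made up for illustration:

```python
# Sketch of back-projection per formula (1): a pixel (u, v) with known
# depth Z maps to camera-space coordinates (X, Y, Z).

def backproject(u, v, Z, fx, fy, cx, cy):
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return X, Y, Z

# Illustrative intrinsics; in practice these come from the parameter data.
X, Y, Z = backproject(u=420, v=340, Z=2.0,
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# X = (420-320)*2/500 = 0.4 and Y = (340-240)*2/500 = 0.4
```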
  • the coordinates of the object in the current camera coordinate system can be transformed into the coordinate system of the camera where the virtual viewpoint is located according to the coordinate transformation in the three-dimensional space.
  • The following transformation formula (2) can be used: [X_1, Y_1, Z_1]^T = R_12·[X, Y, Z]^T + T_12.
  • R 12 is a 3x3 rotation matrix
  • T 12 is a translation vector
  • the transformed three-dimensional coordinates are [X 1 , Y 1 , Z 1 ] T
  • Using the transformed three-dimensional coordinates in the virtual camera coordinate system, the corresponding position in the virtual camera image coordinates can be obtained.
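The rigid transform of formula (2) can be sketched directly; the rotation and translation values below are illustrative (a 90-degree rotation about the z-axis plus a unit translation), not taken from any real calibration:

```python
# Sketch of formula (2): transform a point from the original camera's
# coordinate system into the virtual camera's: X1 = R12 * X + T12.

def transform(point, R, T):
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + T[i]
                 for i in range(3))

R12 = [[0, -1, 0],   # illustrative 3x3 rotation matrix
       [1,  0, 0],
       [0,  0, 1]]
T12 = [1.0, 0.0, 0.0]  # illustrative translation vector
p_virtual = transform((1.0, 0.0, 2.0), R12, T12)  # -> (1.0, 1.0, 2.0)
```

Projecting `p_virtual` with the virtual camera's intrinsics (formula (1) applied forward) then yields the virtual image coordinates.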
  • the projection relationship of the points from the real viewpoint image to the virtual viewpoint image is established.
  • the projection depth map in the virtual viewpoint image can be obtained.
  • The point with the smallest depth value (closest to the camera) corresponds to the largest pixel value in the depth map. Therefore, by taking the largest pixel value among the mapped values on the depth map, the first depth map corresponding to the virtual viewpoint position can be obtained.
  • The operation of taking the maximum or minimum of multiple mapped values is supported in the CUDA parallel environment, specifically by calling CUDA's atomic operation functions atomicMin or atomicMax.
  • the embodiment of the present invention may perform gap concealment processing on the obtained first depth map.
  • a 3*3 gap concealment process is performed on the first depth map.
  • the size range of the surrounding area during the gap concealment process can also take other values, such as 5*5. In order to obtain a better processing effect, it can be set according to experience.
  • a 3*3 or 5*5 median filter may be performed on the second depth map of the virtual view point position.
  • The second kernel function of the GPU may operate according to the following formula: D_2'(x, y) = median{ D_2(x+i, y+j) | -k ≤ i, j ≤ k }, that is, a median filter over the preset (2k+1)*(2k+1) area around each pixel.
  • In step S2933, the third kernel function running on the GPU calculates, according to the value of the depth map, the coordinates in the original camera texture map that correspond to each pixel of the virtual viewpoint position; this can be implemented by performing the reverse process of step S2931.
  • In step S2934, for the pixel f(x, y) at position (x, y) of the virtual viewpoint, the pixel values at the corresponding positions of the texture maps obtained from all the original camera mappings can be weighted according to the confidence conf(x, y).
  • The fourth kernel function can be calculated using the following formula: f(x, y) = Σ_i conf_i(x, y)·f_i(x, y) / Σ_i conf_i(x, y), where f_i and conf_i are the virtual texture map and confidence obtained from the i-th original camera.
  • a virtual viewpoint image can be obtained.
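The confidence-weighted fusion of one virtual pixel can be sketched as follows; the pixel values and confidences are illustrative:

```python
# Sketch of the fusion step: per-pixel confidence-weighted average of
# the virtual texture maps produced by reverse mapping.

def fuse_pixel(values, confidences):
    total = sum(confidences)
    if total == 0:
        return 0.0  # no camera saw this pixel: it stays a hole
    return sum(v * c for v, c in zip(values, confidences)) / total

# Two cameras contribute to the same virtual pixel; the closer/better
# camera has higher confidence:
pixel = fuse_pixel(values=[100.0, 120.0], confidences=[3.0, 1.0])  # -> 105.0
```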
  • the virtual texture map obtained after weighted fusion can also be further processed and optimized. For example, holes can be filled in parallel for each pixel in the texture map after the weighted fusion to obtain the image corresponding to the virtual viewpoint.
  • a separate windowing method can be used to perform parallel operations. For example, for each hole pixel, a window of size N*M can be opened, and then the value of the hole pixel is weighted according to the value of the non-hole pixel in the window.
  • the pixels f1 and f2 in the hole area F are respectively opened with rectangular windows a and b.
  • For the pixel f1, all contributing pixels are taken from the existing non-hole pixels in the rectangular window (some pixels may also be obtained by down-sampling), and the value of the pixel f1 in the hole area F is obtained by distance-based weighting (or simple averaging).
  • the same operation can be used to obtain the value of the pixel f2.
  • the fifth core function can be run on the GPU to parallelize the processing and speed up the time for filling holes.
  • The fifth kernel function can be calculated using the following formula: P(x, y) = Average(Window(x, y)), where:
  • P(x, y) is the value of a point in the hole;
  • Window(x, y) is the set of values (or down-sampled values) of all existing non-hole pixels in the window around (x, y);
  • Average is the average (or weighted average) of these pixel values.
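The hole-filling formula can be sketched for a single hole pixel as follows. This sequential version is only illustrative (each hole pixel would be one GPU thread), and `None` is used here as an arbitrary hole marker:

```python
# Sketch of hole filling: a hole pixel is set to the average of the
# non-hole pixels inside an (2n+1)x(2n+1) window around it.

def fill_hole(img, x, y, n=1):
    vals = [img[j][i]
            for j in range(max(0, y - n), min(len(img), y + n + 1))
            for i in range(max(0, x - n), min(len(img[0]), x + n + 1))
            if img[j][i] is not None]  # skip other hole pixels
    return sum(vals) / len(vals) if vals else 0.0

img = [[4, 4, 4],
       [4, None, 8],
       [8, 8, 8]]
filled = fill_hole(img, 1, 1)  # average of the 8 neighbours -> 6.0
```

A distance-weighted variant would simply replace the plain average with weights decreasing with distance from (x, y).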
  • The texture maps and depth maps of the corresponding groups for the virtual viewpoints in the virtual viewpoint path may be input to multiple GPUs in groups, and multiple virtual viewpoint images are generated in parallel.
  • each of the above-mentioned steps may be executed by different block grids.
  • The virtual viewpoint image generation system 320 may include a CPU 321 and a GPU 322, where:
  • The CPU 321 is adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, where the image combination includes multiple synchronized groups of corresponding texture maps and depth maps from multiple angles; and to select, according to the preset virtual viewpoint path data and the parameter data of the image combination, the texture maps and depth maps of the corresponding group for each virtual viewpoint in the virtual viewpoint path from the image combination.
  • The GPU 322 is adapted to call the corresponding kernel function for each virtual viewpoint in the virtual viewpoint path, and to combine and render in parallel the pixels in the texture maps and depth maps of the corresponding group in the selected image combination, to obtain the corresponding virtual viewpoint image.
  • The GPU 322 is adapted to perform forward mapping of the depth maps of the corresponding group in parallel onto the virtual viewpoint; perform post-processing in parallel on the forward-mapped depth maps; perform reverse mapping of the texture maps of the corresponding group in parallel; and fuse in parallel the pixels in the virtual texture maps generated after reverse mapping.
  • The GPU 322 may use steps S2931 to S2934 of the aforementioned virtual viewpoint image generation method, together with the hole filling step, to generate the virtual viewpoint image for each virtual viewpoint.
  • The GPU can be an independent GPU chip, a GPU core in a GPU chip, a GPU server, a GPU device packaged from multiple GPU chips or multiple GPU cores, or a GPU cluster composed of multiple GPU servers.
  • the texture map and depth map of the corresponding group of each virtual viewpoint in the virtual viewpoint path can be input into multiple GPU chips, multiple GPU cores, or multiple GPU servers, respectively, to generate multiple virtual viewpoints in parallel image.
  • the virtual viewpoint path data corresponding to a certain virtual viewpoint path contains a total of 20 virtual viewpoint position coordinates
  • the data corresponding to the 20 virtual viewpoint position coordinates can be input into multiple GPU chips in parallel, for example, there are 10 GPU chips in total.
  • The data corresponding to the 20 virtual viewpoint position coordinates can be processed in parallel in two batches, and each GPU chip can generate the virtual viewpoint image corresponding to a virtual viewpoint position in parallel with the pixel as the unit, thus greatly speeding up virtual viewpoint image generation and improving its time-efficiency.
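The batching arithmetic above (20 viewpoints over 10 GPUs, processed in 2 batches) can be sketched as follows. `render` is a hypothetical stand-in for the per-viewpoint pixel-parallel rendering that one GPU would perform:

```python
# Sketch of multi-GPU batching: viewpoints are split into batches of
# size n_gpus; within a batch, each viewpoint goes to one GPU.

def render(viewpoint):
    return f"image_{viewpoint}"  # placeholder for a rendered image

def batched_render(viewpoints, n_gpus):
    images = []
    for start in range(0, len(viewpoints), n_gpus):
        batch = viewpoints[start:start + n_gpus]  # one viewpoint per GPU
        images.extend(render(vp) for vp in batch)  # conceptually parallel
    return images

images = batched_render(list(range(20)), n_gpus=10)  # 2 batches of 10
```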
  • An embodiment of the present invention also provides an electronic device.
  • The electronic device 330 may include a memory 331, a CPU 332, and a GPU 333.
  • The memory stores computer instructions capable of running on the CPU 332 and the GPU 333; when the CPU 332 and the GPU 333 cooperatively run the computer instructions, the steps of the method for generating virtual viewpoint images according to any of the foregoing embodiments of the present invention are executed.
  • the electronic device may be one server or a server cluster composed of multiple servers.
  • Each of the above embodiments can be applied to a live broadcast scenario, and two or more of the embodiments can be used in combination as needed in the application process.
  • the solutions in the above embodiments are not limited to live broadcast scenarios.
  • The solutions in the embodiments of the present invention for video or image collection, data processing of video data streams, and server-side image generation can also be applied to the playback requirements of non-live scenes, such as recorded-and-broadcast scenes and other scenes with low-latency requirements.
  • the embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are run, the steps of the method in any of the foregoing embodiments of the present invention can be executed.
  • The computer-readable storage medium may be any suitable readable storage medium, such as an optical disk, a mechanical hard disk, or a solid-state drive.
  • An embodiment of the present invention also provides a server, including a memory and a processor, where the memory stores computer instructions that can run on the processor, and the processor, when running the computer instructions, can execute the steps of the method described in any of the foregoing embodiments of the present invention.


Abstract

A virtual viewpoint image generation method, system, electronic device, and storage medium. The method includes: acquiring a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data; selecting, according to the preset virtual viewpoint path data and the parameter data of the image combination, the texture maps and depth maps of the corresponding group for each virtual viewpoint in the virtual viewpoint path from the image combination; and inputting the texture maps and depth maps of the corresponding group for each virtual viewpoint into a graphics processor, and for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, combining and rendering, by multiple threads, the pixels in the texture maps and depth maps of the corresponding group in the selected image combination, to obtain the image corresponding to the virtual viewpoint. The above solution can increase data processing speed and meet the requirements of low-latency playback and real-time interaction for multi-angle free-view video.

Description

Virtual viewpoint image generation method, system, electronic device, and storage medium

This application claims priority to Chinese Patent Application No. 201911032857.0, entitled "Virtual viewpoint image generation method, system, electronic device and storage medium" and filed on October 28, 2019, the entire contents of which are incorporated herein by reference.

Technical Field

Embodiments of the present invention relate to the technical field of image processing, and in particular to a virtual viewpoint image generation method, system, electronic device, and storage medium.

Background

With the continuous development of Internet technology, more and more video platforms keep improving users' viewing experience by providing videos with higher definition or smoother playback.

However, for videos with a strong sense of on-site experience, such as the video of a sports game, users can usually only watch the game from a single viewpoint position and cannot freely switch viewpoint positions themselves to watch the game scenes or the course of the game from different viewing angles; they therefore cannot experience the feeling of watching the game on site while moving their viewpoint.

6 Degrees of Freedom (6DoF) technology is a technology for providing a viewing experience with a high degree of freedom: during viewing, users can adjust the viewing angle of the video through interactive means and watch from the free viewpoint they want, thereby greatly improving the viewing experience.

However, current 6DoF video viewing solutions place very large demands on storage and computation, so a large number of servers must be deployed on site for processing, resulting in excessively high implementation costs and too many constraints; data cannot be processed quickly, so the requirements of low-latency playback and real-time interaction for multi-angle free-view video cannot be met, which hinders popularization.

Through the inventor's research, it was found that current virtual viewpoint image generation methods are slow, which severely constrains low-latency playback and real-time interaction of multi-angle free-view video.

Summary

In view of this, embodiments of the present invention provide a virtual viewpoint image generation method, system, electronic device, and storage medium, so as to increase the speed of virtual viewpoint image generation and meet the requirements of low-latency playback and real-time interaction for multi-angle free-view video.
An embodiment of the present invention provides a virtual viewpoint image generation method, including:

acquiring a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, where the image combination includes multiple synchronized groups of corresponding texture maps and depth maps from multiple angles;

selecting, according to the preset virtual viewpoint path data and the parameter data of the image combination, the texture maps and depth maps of the corresponding group for each virtual viewpoint in the virtual viewpoint path from the image combination; and

inputting the texture maps and depth maps of the corresponding group for each virtual viewpoint into a graphics processor, and for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, combining and rendering, by multiple threads, the pixels in the texture maps and depth maps of the corresponding group in the selected image combination, to obtain the image corresponding to the virtual viewpoint.

Optionally, for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, combining and rendering, by multiple threads, the pixels in the texture maps and depth maps of the corresponding group in the selected image combination includes:

performing forward mapping of the depth maps of the corresponding group in parallel, mapping them onto the virtual viewpoint;

performing post-processing in parallel on the forward-mapped depth maps;

performing reverse mapping of the texture maps of the corresponding group in parallel; and

fusing in parallel the pixels in the virtual texture maps generated after reverse mapping.

Optionally, performing forward mapping of the depth maps of the corresponding group in parallel, mapping them onto the virtual viewpoint, includes: running a first kernel function on the graphics processor to forward-map the pixels in the depth maps of the corresponding group in parallel onto the corresponding virtual viewpoint position, where: for multiple depth values mapped to the same pixel of the virtual viewpoint, an atomic operation is used to take the largest pixel value, obtaining a first depth map of the corresponding virtual viewpoint position; based on the first depth map of the virtual viewpoint position, a second depth map of the virtual viewpoint position is created, and each pixel in the second depth map is processed in parallel, taking the maximum value of the pixels in a preset area around the corresponding pixel position in the first depth map.

Optionally, performing post-processing in parallel on the forward-mapped depth maps includes: running a second kernel function on the graphics processor and, for each pixel in the second depth map obtained after forward mapping, performing median filtering in a preset area around the pixel position.

Optionally, performing reverse mapping of the texture maps of the corresponding group in parallel includes: running a third kernel function on the graphics processor and performing interpolation in parallel on the pixels in the selected texture maps of the corresponding group to generate the corresponding virtual texture maps.

Optionally, fusing in parallel the pixels in the virtual texture maps generated after reverse mapping includes: running a fourth kernel function on the graphics processor and performing weighted fusion in parallel on the pixels at the same position in the virtual texture maps generated after reverse mapping.

Optionally, after fusing in parallel the pixels in the virtual texture maps generated after reverse mapping, the method further includes: performing hole filling in parallel on the pixels in the weighted-fused texture map to obtain the image corresponding to the virtual viewpoint.

Optionally, inputting the texture maps and depth maps of the corresponding group for each virtual viewpoint into a graphics processor includes: inputting the texture maps and depth maps of the corresponding group for each virtual viewpoint in the virtual viewpoint path into multiple graphics processors respectively, with the graphics processors processing in parallel to generate multiple virtual viewpoint images.
An embodiment of the present invention provides a virtual viewpoint image generation system, including:

a central processing unit, adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, where the image combination includes multiple synchronized groups of corresponding texture maps and depth maps from multiple angles; and to select, according to the preset virtual viewpoint path data and the parameter data of the image combination, the texture maps and depth maps of the corresponding group for each virtual viewpoint in the virtual viewpoint path from the image combination; and

a graphics processor, adapted to, for each virtual viewpoint in the virtual viewpoint path, with the pixel as the processing unit, combine and render, by multiple threads, the pixels in the texture maps and depth maps of the corresponding group in the selected image combination, to obtain the image corresponding to the virtual viewpoint.

Optionally, the graphics processor is adapted to perform forward mapping of the depth maps of the corresponding group in parallel, mapping them onto the virtual viewpoint; perform post-processing in parallel on the forward-mapped depth maps; perform reverse mapping of the texture maps of the corresponding group in parallel; and fuse in parallel the pixels in the virtual texture maps generated after reverse mapping.

An embodiment of the present invention further provides an electronic device, including a memory, a central processing unit, and a graphics processor, where the memory stores computer instructions capable of running on the central processing unit and the graphics processor, and when the central processing unit and the graphics processor cooperatively run the computer instructions, the steps of the virtual viewpoint image generation method according to any embodiment of the present invention are executed.

An embodiment of the present invention further provides a computer-readable storage medium on which computer instructions are stored, where, when the computer instructions are run, the steps of the virtual viewpoint image generation method according to any embodiment of the present invention are executed.
采用本发明实施例的虚拟视点图像生成方案,通过将各虚拟视点的相应组的纹理图和深度图输入至图形处理器中,针对虚拟视点路径中各虚拟视点,可以以像素点为处理单位,由多个线程分别将选择的图像组合中相应组的纹理图和深度图中的像素点进行组合渲染,得到所述虚拟视点对应的图像,整个过程中由于图形处理器可以对各虚拟视点所对应的像素点由多个线程并行地进行组合渲染,因而可以大大加快多角度自由视角的虚拟视点图像的生成速度,满足多角度自由视角视频低时延播放和实时互动的需求,提升用户体验。
进一步地,通过在所述图形处理器上运行第一核心函数,将相应组的深度图中的像素并行地进行前向映射,映射到对应的虚拟视点位置上,且对于多个映射到虚拟视点同 一个像素的深度值,采用原子操作,取像素值最大的值,得到对应的虚拟视点位置的第一深度图,可以快速地处理深度图映射中的前背景遮挡关系,而对于所创建的虚拟视点位置的第二深度图中的每一个像素,取所述第一深度图中对应像素位置周围预设区域的像素点的最大值,可以改善映射缝隙效应,由于每个像素均可以并行处理,因此可以大大加快前向映射处理速度,提升前向映射的时效性能。
进一步地,通过在所述图形处理器上运行第二核心函数,对前向映射后得到的第二深度图中的每一像素,在所述像素位置周围预设区域进行中值滤波处理,可以大大加快后处理速度,提升后处理的时效性能。
进一步地,在所述图形处理器上运行第三核心函数,将选择的相应组的纹理图中的像素并行地进行插值运算,生成对应的虚拟纹理图,可以大大加快反向映射的处理速度,提升反向映射的时效性能。
进一步地,在所述图形处理器上运行第四核心函数,将反向映射后所生成的各虚拟纹理图中的同一位置的像素,并行地进行加权融合,可以大大地加快虚拟纹理图的融合的速度,提升图像融合的时效性能。
进一步地,对加权融合后的纹理图中的各像素并行地进行空洞填补,得到所述虚拟视点对应的图像,通过空洞填补,可以提升所生成的虚拟视点图像的质量,而对于各像素并行地进行空洞填补,可以大大加快空洞填补速度,提升空洞填补的时效性能。
进一步地,将所述虚拟视点路径中的各虚拟视点的相应组的纹理图和深度图输入至多个所述图形处理器中,并行地生成多个虚拟视点图像,可以进一步加快虚拟视点图像的生成速度,提升虚拟视点图像生成的时效性能。
附图说明
图1是本发明实施例中一种数据处理系统的结构示意图;
图2是本发明实施例中一种数据处理方法的流程图;
图3是本发明实施例中一种应用场景中数据处理系统的结构示意图;
图4是本发明实施例中一种交互终端的交互界面示意图;
图5是本发明实施例中一种服务器的结构示意图;
图6是本发明实施例中一种数据交互方法的流程图;
图7是本发明实施例中另一种数据处理系统的结构示意图;
图8是本发明实施例中另一种应用场景中数据处理系统的结构示意图;
图9是本发明实施例中一种交互终端的结构示意图;
图10是本发明实施例中另一种交互终端的交互界面示意图;
图11是本发明实施例中另一种交互终端的交互界面示意图;
图12是本发明实施例中另一种交互终端的交互界面示意图;
图13是本发明实施例中另一种交互终端的交互界面示意图;
图14是本发明实施例中另一种交互终端的交互界面示意图;
图15是本发明实施例中另一种数据处理方法的流程图;
图16是本发明实施例中一种截取压缩视频数据流中同步的视频帧的方法的流程图;
图17是本发明实施例中另一种数据处理方法的流程图;
图18是本发明实施例中一种数据处理设备的结构示意图;
图19是本发明实施例中另一种数据处理系统的结构示意图;
图20是本发明实施例中一种数据同步方法的流程图;
图21是本发明实施例中一种拉流同步的时序图;
图22是本发明实施例中另一种截取压缩视频数据流中同步的视频帧的方法的流程图;
图23是本发明实施例中另一种数据处理设备的结构示意图;
图24是本发明实施例中一种数据同步系统的结构示意图;
图25是本发明实施例中一种应用场景中的数据同步系统的结构示意图;
图26是本发明实施例中一种深度图生成方法的流程图;
图27是本发明实施例中一种服务器的结构示意图;
图28是本发明实施例中一种服务器集群进行深度图处理的示意图;
图29是本发明实施例中一种虚拟视点图像生成方法的流程图;
图30是本发明实施例中一种GPU进行组合渲染的方法的流程图;
图31是本发明实施例中一种空洞填补方法的示意图;
图32是本发明实施例中一种虚拟视点图像生成系统的结构示意图;
图33是本发明实施例中一种电子设备的结构示意图。
图34是本发明实施例中另一种数据同步系统的结构示意图;
图35是本发明实施例中另一种数据同步系统的结构示意图;
图36是本发明实施例中一种采集设备的结构示意图。
图37是本发明实施例中一种应用场景中采集阵列的示意图。
图38是本发明实施例中另一种数据处理系统的结构示意图。
图39是本发明实施例中另一种交互终端的结构示意图。
图40是本发明实施例中另一种交互终端的交互界面示意图。
图41是本发明实施例中另一种交互终端的交互界面示意图。
图42是本发明实施例中另一种交互终端的交互界面示意图。
图43是本发明实施例中一种交互终端的连接示意图。
图44是本发明实施例中一种交互终端的交互操作示意图。
图45是本发明实施例中另一种交互终端的交互界面示意图。
图46是本发明实施例中另一种交互终端的交互界面示意图。
图47是本发明实施例中另一种交互终端的交互界面示意图。
图48是本发明实施例中另一种交互终端的交互界面示意图。
具体实施方式
在传统的直播、转播和录播等播放场景中,如前所述,用户在观看过程中往往只能通过一个视点位置观看比赛,无法自己自由切换视点位置,来观看不同视角位置处的比赛画面或比赛过程,也就无法体验在现场一边移动视点一边看比赛的感觉。
采用6自由度(6Degree of Freedom,6DoF)技术可以提供高自由度观看体验,用户可以在观看过程中通过交互手段,来调整视频观看的视角,从想观看的自由视点角度进行观看,从而大幅度的提升观看体验。
为实现6DoF场景,目前有Free-D回放技术、光场渲染技术及基于深度图的6DoF视频生成技术等。其中,Free-D回放技术是通过多角度拍摄获取场景的点云数据对6DoF图像进行表达,而光场渲染技术是通过密集光场的焦距和空间位置变化获得像素的景深信息和三维位置信息,进而对6DoF图像进行表达。基于深度图的6DoF视频生成方法是基于所述虚拟视点位置及对应组的纹理图和深度图对应的参数数据,将用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图进行组合渲染,进行6DoF图像或视频的重建。
例如,在现场使用Free-D回放方案时,需要采用大量相机进行原始数据采集,并通过数字分量串行接口(Serial Digital Interface,SDI)采集卡汇总到现场的计算机房,然后通过现场机房中的计算服务器对原始数据进行处理,获得对空间所有点的三维位置以及像素信息进行表达的点云数据,并重建6DoF场景。这种方案使得现场采集、传输和 计算的数据量极大,尤其对于直播和转播这类对传输网络以及计算服务器有很高要求的播放场景,6DoF重建场景实施成本过高,限制条件过多。并且,目前并没有很好的技术标准和工业级软硬件对点云数据进行支持,因此,从现场的原始数据采集到最终6DoF重建场景,需要花费较长的数据处理时间,从而不能满足多角度自由视角视频的低时延播放和实时互动的需求。
又例如,在现场使用光场渲染方案时,需要通过密集光场的焦距和空间位置变化获得像素的景深信息和三维位置信息,由于密集光场获取的光场图像分辨率过大,往往需要分解成几百张常规的二维图片,因此,这种方案也使得现场采集、传输和计算的数据量极大,对于现场的传输网络以及计算服务器有很高的要求,实施成本过高,限制条件过多,无法快速处理数据。并且,通过光场图像重建6DoF场景的技术手段仍然在实验探索中,目前无法有效地满足多角度自由视角视频的低时延播放和实时互动的需求。
综上所述,无论是Free-D回放技术还是光场渲染技术,对存储量和运算量的需求都非常大,因此需要在现场布置大量服务器进行处理,造成实施成本过高,限制条件过多,无法快速处理数据,从而不能满足观看和互动的需求,不利于推广普及。
基于深度图的6DoF视频重建方法虽然可以减小视频重建过程中的数据运算量,但由于网络传输带宽、设备解码能力等多种因素的约束,也难以满足多角度自由视角视频的低时延播放及实时互动的需求。
针对上述问题,本发明一些实施例提出了多角度自由视角图像生成方案,采用分布式系统架构,其中,在现场采集区域设置多个采集设备组成的采集阵列进行多个角度的帧图像的同步采集,通过数据处理设备对采集设备采集到的帧图像根据帧截取指令进行视频帧的截取,并由服务器将数据处理设备上传的多个同步视频帧的帧图像作为图像组合,可以确定所述图像组合相应的参数数据和所述图像组合中各帧图像的深度数据,基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据,并将所述多角度自由视角视频数据插入至播放控制设备的待播放数据流以用于传输至播放终端进行播放。
参照本发明实施例中一种应用场景的数据处理系统的结构示意图,数据处理系统10包括:数据处理设备11、服务器12、播放控制设备13和播放终端14,其中,数据处理设备11可以对现场采集区域中采集阵列采集到的帧图像进行视频帧的截取,通过对待生成多角度自由视角图像的视频帧进行截取,可以避免大量的数据传输及数据处理,之后, 由服务器12进行多角度自由视角图像的生成,可以充分利用服务器强大的计算能力,即可快速地生成多角度自由视角视频数据,从而可以及时地插入播放控制设备的待播放数据流中,以低廉的成本实现多角度自由视角的播放,满足用户对多角度自由视角视频低时延播放和实时互动的需求。
为使本领域技术人员更加清楚地了解及实施本说明书实施例,下面将结合本说明书实施例中的附图,对本说明书实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本说明书的一部分实施例,而不是全部实施例。基于本说明书中的实施例,本领域普通技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都应属于本说明书保护的范围。
参照图2所示的数据处理方法的流程图,在本发明实施例中,具体可以包括如下步骤:
S21,接收数据处理设备上传的多个同步视频帧的帧图像作为图像组合。
其中,所述多个同步视频帧为所述数据处理设备基于视频帧截取指令,在从现场采集区域不同位置实时同步采集并上传的多路视频数据流中对指定帧时刻的视频帧截取得到,所述多个同步视频帧的拍摄视角不同。
在具体实施中,所述视频帧截取指令可以包括指定帧时刻的信息,所述数据处理设备根据所述视频帧截取指令中的指定帧时刻的信息,从多路视频数据流中截取相应帧时刻的视频帧。其中,所述指定帧时刻可以以帧为单位,将第N至M帧作为指定帧时刻,N和M均为不小于1的整数,且N≤M;或者,所述指定帧时刻也可以以时间为单位,将第X至Y秒作为指定帧时刻,X和Y均为正数,且X≤Y。因此,多个同步视频帧可以包括指定帧时刻对应的所有帧级同步的视频帧,各视频帧的像素数据形成对应的帧图像。
例如,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第2帧,则数据处理设备分别截取各路视频数据流中第2帧的视频帧,且截取的各路视频数据流的第2帧的视频帧之间帧级同步,作为获取得到的多个同步视频帧。
又例如,假设采集帧率设置为25fps,即1秒采集25帧,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第1秒内的视频帧,则数据处理设备可以分别截取各路视频数据流中第1秒内的25个视频帧,且截取的各路视频数据流中第1秒内的第1个视频帧之间帧级同步,截取的各路视频数据流中第1秒内的第2个视频帧之间帧级同步,直至截取的各路视频数据流中第1秒内的第25个视频帧之间帧级同步,作为获取得到的多个同步视频帧。
还例如,数据处理设备根据接收到的视频帧截取指令,可以获得指定帧时刻为多路视频数据流中的第2帧和第3帧,则数据处理设备可以分别截取各路视频数据流中第2帧的视频帧和第3帧的视频帧,且截取的各路视频数据流的第2帧的视频帧之间和第3帧的视频帧之间分别帧级同步,作为多个同步视频帧。
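上述以帧为单位或以时间为单位的指定帧时刻,可以统一换算为帧序号区间,以下为一个示意性实现(函数命名为示意性假设;帧序号从1开始、fps 默认取25,与正文示例一致):

```python
def spec_to_frame_range(spec, fps=25):
    """把指定帧时刻换算为 (起始帧, 结束帧) 闭区间,帧序号从 1 开始。
    spec 形如 ("frame", N, M) 或 ("second", X, Y)。"""
    kind, a, b = spec
    if kind == "frame":            # 以帧为单位:第 N 至 M 帧
        assert 1 <= a <= b
        return a, b
    if kind == "second":           # 以时间为单位:第 X 至 Y 秒
        assert 0 < a <= b
        return int((a - 1) * fps) + 1, int(b * fps)
    raise ValueError(kind)
```

例如正文中"第1秒内的视频帧"在 25fps 下即对应第1至第25帧。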
S22,确定所述图像组合相应的参数数据。
在具体实施中,可以通过参数矩阵来获得所述图像组合相应的参数数据,所述参数矩阵可以包括内参矩阵,外参矩阵、旋转矩阵和平移矩阵等。由此,可以确定空间物体表面指定点的三维几何位置与其在图像组合中对应点之间的相互关系。
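参数矩阵确定空间点与图像点对应关系的过程,可以用针孔相机模型的投影公式示意(内参 K、外参 R、t 的具体数值均为示意性假设):

```python
import numpy as np

def project_point(P, K, R, t):
    """p = K(RP + t),再做齐次归一化,得到三维点 P 在图像上的像素坐标。"""
    p_cam = R @ P + t                 # 外参(旋转矩阵、平移矩阵):世界坐标 -> 相机坐标
    p_img = K @ p_cam                 # 内参矩阵:相机坐标 -> 图像齐次坐标
    return p_img[:2] / p_img[2]       # 归一化得到像素坐标 (u, v)

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # 示意内参
R, t = np.eye(3), np.zeros(3)                                 # 示意外参
```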
在本发明的实施例中,可以采用运动重构(Structure From Motion,SFM)算法,基于参数矩阵,对获取到的图像组合进行特征提取、特征匹配和全局优化,获得的参数估计值作为图像组合相应的参数数据。其中,特征提取采用的算法可以包括以下任意一种:尺度不变特征变换(Scale-Invariant Feature Transform,SIFT)算法、加速稳健特征(Speeded-Up Robust Features,SURF)算法、加速段测试的特征(Features from Accelerated Segment Test,FAST)算法。特征匹配采用的算法可以包括:欧式距离计算方法、随机样本一致性(Random Sample Consensus,RANSC)算法等。全局优化的算法可以包括:光束法平差(Bundle Adjustment,BA)等。
S23,确定所述图像组合中各帧图像的深度数据。
在具体实施中,可以基于所述图像组合中多个帧图像,确定各帧图像的深度数据。其中,深度数据可以包括与图像组合中各帧图像的像素对应的深度值。采集点到现场中各个点的距离可以作为上述深度值,深度值可以直接反映待观看区域中可见表面的几何形状。例如,以拍摄坐标系的原点作为光心,深度值可以是现场中各个点沿着拍摄光轴到光心的距离。本领域技术人员可以理解的是,上述距离可以是相对数值,多个帧图像可以采用相同的基准。
在本发明一实施例中,可以采用双目立体视觉的算法,计算各帧图像的深度数据。除此之外,深度数据还可以通过对帧图像的光度特征、明暗特征等特征进行分析间接估算得到。
在本发明另一实施例中,可以采用多视点三维重建(Multi-View Stereo,MVS)算法进行帧图像重建。重建过程中可以采用所有像素进行重建,也可以对像素进行降采样仅用部分像素重建。具体而言,可以对每个帧图像的像素点都进行匹配,重建每个像素点的三维坐标,获得具有图像一致性的点,然后计算各个帧图像的深度数据。或者,可以对选取的帧图像的像素点进行匹配,重建各选取的帧图像的像素点的三维坐标,获得具有图像一致性的点,然后计算相应帧图像的深度数据。其中,帧图像的像素数据与计算得到的深度数据对应,选取帧图像的方式可以根据具体情景来设定,比如,可以根据需要计算深度数据的帧图像与其他帧图像之间的距离,选择部分帧图像。
S24,基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据。
所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
其中,帧图像的像素数据可以为YUV数据或RGB数据中任意一种,或者也可以是其它能够对帧图像进行表达的数据;深度数据可以包括与帧图像的像素数据一一对应的深度值,或者,可以是对与帧图像的像素数据一一对应的深度值集合中选取的部分数值,具体的选取方式根据具体的情景而定;所述虚拟视点选自多角度自由视角范围,所述多角度自由视角范围为支持对待观看区域进行视点的切换观看的范围。
在具体实施中,预设帧图像可以是图像组合中所有的帧图像,也可以是选择的部分帧图像。其中,选取的方式可以根据具体情景来设定,例如,可以根据采集点之间的位置关系,选择图像组合中相应位置的部分帧图像;又例如,可以根据想要获取的帧时刻或帧时段,选择图像组合中相应帧时刻的部分帧图像。
由于所述预设的帧图像可以对应不同的帧时刻,因此,可以对虚拟视点路径中各虚拟视点与各帧时刻进行对应,根据各虚拟视点相对应的帧时刻获取相应的帧图像,然后基于所述图像组合相应的参数数据、各虚拟视点的帧时刻对应的帧图像的深度数据和像素数据,对各虚拟视点进行帧图像重建,获得相应的多角度自由视角视频数据,此时,多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。换言之,在具体实施中,除了可以实现某一个时刻的多角度自由视角图像,还可以实现时序上连续的或非连续的多角度自由视角视频。
在本发明一实施例中,所述图像组合包括A个同步视频帧,其中,a1个同步视频帧对应第一帧时刻,a2个同步视频帧对应第二帧时刻,a1+a2=A;并且,预设有B个虚拟视点组成的虚拟视点路径,其中b1个虚拟视点与第一帧时刻相对应,b2个虚拟视点与第二帧时刻相对应,b1+b2=B,则基于所述图像组合相应的参数数据、第一帧时刻的a1个同步视频帧的帧图像的像素数据和深度数据,对b1个虚拟视点组成的路径进行第一帧图像重建,基于所述图像组合相应的参数数据、第二帧时刻的a2个同步视频帧的帧图像的像素数据和深度数据,对b2个虚拟视点组成的路径进行第二帧图像重建,最终获得相应的多角度自由视角视频数据,其中,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
可以理解的是,可以将指定帧时刻和虚拟视点进行更细的划分,由此得到更多的不同帧时刻对应的同步视频帧和虚拟视点,上述实施例仅为举例说明,并非对具体实施方式的限制。
在本发明实施例中,可以采用基于深度图的图像绘制(Depth Image Based Rendering,DIBR)算法,根据所述图像组合相应的参数数据和预设的虚拟视点路径,对预设的帧图像的像素数据和深度数据进行组合渲染,从而实现基于预设的虚拟视点路径的帧图像重建,获得相应的多角度自由视角视频数据。
S25,将所述多角度自由视角视频数据插入至播放控制设备的待播放数据流并通过播放终端进行播放。
播放控制设备可以将多路视频数据流作为输入,其中,视频数据流可以来自采集阵列中各采集设备,也可以来自其他采集设备。播放控制设备可以根据需要选择一路输入的视频数据流作为待播放数据流,其中,可以选择前述步骤S24获得的多角度自由视角视频数据插入待播放数据流,或者由其他输入接口的视频数据流切换至所述多角度自由视角视频数据的输入接口,播放控制设备将选择的待播放数据流输出至播放终端,即可通过播放终端进行播放,因此用户可以通过播放终端观看到多角度自由视角的视频图像。播放终端可以是电视、手机、平板、电脑等视频播放设备或包含显示屏的电子设备。
在具体实施中,插入播放控制设备的待播放数据流的多角度自由视角视频数据可以保留在播放终端中,以便于用户进行时移观看,其中,时移可以是用户观看时进行的暂停、后退、快进到当前时刻等操作。
采用上述数据处理方法,可以使用分布式系统架构中的数据处理设备处理指定视频帧的截取以及服务器对预设帧图像进行截取后的多角度自由视角视频的重建,可以避免在现场布置大量服务器进行处理,也可以避免将采集阵列的采集设备采集到的视频数据流直接上传,因此可以节省大量的传输资源及服务器处理资源,且在网络传输带宽有限的情况下,使得指定视频帧的多角度自由视角视频可以实时重建,实现多角度自由视角 视频的低时延播放,减小网络传输带宽的限制,因而可以降低实施成本,减少限制条件,易于实现,满足多角度自由视角视频低时延播放和实时互动的需求。
在具体实施中,可以根据所述预设的虚拟视点路径中各虚拟视点的虚拟参数数据以及所述图像组合相应的参数数据之间的关系,将所述图像组合中预设的帧图像的深度数据分别映射至相应的虚拟视点;根据分别映射至相应的虚拟视点的预设帧图像的像素数据和深度数据,以及预设的虚拟视点路径,进行帧图像重建,获得相应的多角度自由视角视频数据。
其中,所述虚拟视点的虚拟参数数据可以包括:虚拟观看位置数据和虚拟观看角度数据;所述图像组合相应的参数数据可以包括:采集位置数据和拍摄角度数据等。可以先采用前向映射,进而进行反向映射的方法,得到重建后的图像。
在具体实施中,采集位置数据和拍摄角度数据可以称作外部参数数据,参数数据还可以包括内部参数数据,所述内部参数数据可以包括采集设备的属性数据,从而可以更加准确地确定映射关系。例如,内部参数数据可以包括畸变数据,由于考虑到畸变因素,可以从空间上进一步准确地确定映射关系。
在一具体实施例中,为了后续能够方便获取数据,可以基于所述图像组合的像素数据及深度数据,生成所述图像组合相应的拼接图像,所述拼接图像可以包括第一字段和第二字段,其中,所述第一字段包括所述图像组合的像素数据,所述第二字段包括所述图像组合的深度数据,然后,存储所述图像组合相应的拼接图像及相应的参数数据。
在另一具体实施例中,为了节约存储空间,可以基于所述图像组合中预设帧图像的像素数据及深度数据,生成所述图像组合中预设帧图像相应的拼接图像,所述预设帧图像相应的拼接图像可以包括第一字段和第二字段,其中,所述第一字段包括所述预设帧图像的像素数据,所述第二字段包括所述预设帧图像的深度数据,然后,仅存储所述预设帧图像相应的拼接图像及相应的参数数据即可。
其中,所述第一字段与所述第二字段相对应,所述拼接图像可以分为图像区域以及深度图区域,图像区域的像素字段存储所述多个帧图像的像素数据,深度图区域的像素字段存储所述多个帧图像的深度数据;所述图像区域中存储帧图像的像素数据的像素字段作为所述第一字段,所述深度图区域中存储帧图像的深度数据的像素字段作为所述第二字段;获取的图像组合的拼接图像和及所述图像组合相应的参数数据可以存入数据文件中,当需要获取拼接图像或相应的参数数据时,可以根据数据文件的头文件中包含的存储地址,从相应的存储空间中读取。
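拼接图像的布局可以用如下代码示意:图像区域在上、深度图区域在下,像素字段分别对应第一字段与第二字段(上下排布的具体布局方式为示意性假设):

```python
import numpy as np

def build_stitched(textures, depths):
    """把多个帧图像的像素数据横向拼成图像区域,深度数据拼成深度图区域,
    再上下拼接为一张拼接图像;第一字段=上半部,第二字段=下半部。"""
    tex_region = np.concatenate(textures, axis=1)   # 图像区域(第一字段)
    dep_region = np.concatenate(depths, axis=1)     # 深度图区域(第二字段)
    return np.concatenate([tex_region, dep_region], axis=0)

def split_stitched(stitched):
    """按相同布局取回第一字段(像素数据)与第二字段(深度数据)。"""
    half = stitched.shape[0] // 2
    return stitched[:half], stitched[half:]
```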
此外,图像组合的存储格式可以为视频格式,图像组合的数量可以是多个,每个图像组合可以是对视频进行解封装和解码后,对应不同帧时刻的图像组合。
在具体实施中,可以基于收到的来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,将存储的对应交互帧时刻的相应的图像组合预设帧图像的拼接图像及相应图像组合对应的参数数据发送至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据与所述参数数据进行组合渲染,重建得到所述待交互的虚拟视点位置对应的多角度自由视角视频数据并进行播放。
其中,所述预设规则可以根据具体情景来设定,比如,可以基于交互操作确定的虚拟视点位置信息,选择按距离排序最靠近交互时刻的虚拟视点的W个临近的虚拟视点的位置信息,并在拼接图像中获取包括交互时刻的虚拟视点的上述共W+1个虚拟视点对应的满足交互帧时刻信息的像素数据和深度数据。
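上述按距离选择最靠近交互虚拟视点的W个临近视点的预设规则,可示意如下(视点位置用二维坐标表示,属示意性简化):

```python
def nearest_viewpoints(target, viewpoints, w):
    """返回与交互虚拟视点 target 距离最近的 w 个视点的下标,按距离升序排列。"""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    order = sorted(range(len(viewpoints)), key=lambda i: dist2(viewpoints[i], target))
    return order[:w]
```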
其中,所述交互帧时刻信息基于来自交互终端的触发操作确定,所述触发操作可以是用户输入的触发操作,也可以是交互终端自动生成的触发操作,例如,交互终端在检测到存在多角度自由视点数据帧的标识时可以自动发起触发操作。在用户手动触发时,可以是交互终端显示交互提示信息后用户选择触发交互的时刻信息,也可以是交互终端接收到用户操作触发交互的历史时刻信息,所述历史时刻信息可以为位于当前播放时刻之前的时刻信息。
在具体实施中,所述交互终端可以基于获取的交互帧时刻的图像组合中预设帧图像的拼接图像及对应的参数数据,交互帧时刻信息以及交互帧时刻的虚拟视点位置信息,采用与上述步骤S24相同的方法对获取的交互帧时刻的图像组合中预设帧图像的拼接图像的像素数据和深度数据进行组合渲染,获得所述交互的虚拟视点位置对应的多角度自由视角视频数据,并在所述交互的虚拟视点位置开始播放多角度自由视角视频。
采用上述方案,可以基于来自交互终端的图像重建指令随时生成交互的虚拟视点位置对应的多角度自由视角视频数据,可以进一步提升用户互动体验。
参照图1所示的数据处理系统的结构示意图,在本发明实施例中,如图1所示,数据处理系统10可以包括:数据处理设备11、服务器12、播放控制设备13以及播放终端14,其中:
所述数据处理设备11,适于基于视频帧截取指令,从所述现场采集区域不同位置实时同步采集的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧,将获 得的所述指定帧时刻的多个同步视频帧上传至所述服务器,其中,多路视频数据流可以是采用压缩格式的视频数据流,也可以是采用非压缩格式的视频数据流;
所述服务器12,适于将接收到的所述数据处理设备11上传的指定帧时刻的多个同步视频帧的帧图像作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据,所述多角度自由视角视频数据包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据;
所述播放控制设备13,适于将所述多角度自由视角视频数据插入待播放数据流;
所述播放终端14,适于接收来自所述播放控制设备13的待播放数据流并进行实时播放。
在具体实施中,播放控制设备13可以基于控制指令输出待播放数据流。作为可选示例,播放控制设备13可以从多路数据流中选择一路作为待播放数据流,或者在多路数据流中不断地切换选择以持续地输出所述待播放数据流。导播控制设备可以作为本发明实施例中的一种播放控制设备。其中导播控制设备可以为基于外部输入控制指令进行播放控制的人工或半人工导播控制设备,也可以为基于人工智能或大数据学习或预设算法能够自动进行导播控制的虚拟导播控制设备。
采用上述数据处理系统,可以采用分布式系统架构中的数据处理设备处理指定视频帧的截取以及服务器对预设帧图像进行截取后的多角度自由视角视频的重建,可以避免在现场布置大量服务器进行处理,也可以避免将采集阵列的采集设备采集到的视频数据流直接上传,因此可以节省大量的传输资源及服务器处理资源,且在网络传输带宽有限的情况下,使得指定视频帧的多角度自由视角视频可以实时重建,实现多角度自由视角视频的低时延播放,减小网络传输带宽的限制,因而可以降低实施成本,减少限制条件,易于实现,满足多角度自由视角视频低时延播放和实时互动的需求。
在具体实施中,所述服务器12还适于基于所述图像组合中预设帧图像的像素数据和深度数据,生成所述图像组合中预设帧时刻的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据,并存储所述图像组合的拼接图像及所述图像组合相应的参数数据。
在具体实施中,所述数据处理系统10还可以包括交互终端15,适于基于触发操作, 确定交互帧时刻信息,向服务器发送包含交互帧时刻信息的图像重建指令,接收从所述服务器返回的对应交互帧时刻的图像组合中预设帧图像的拼接图像及对应的参数数据,并基于交互操作确定虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据,基于选择的像素数据和深度数据与所述参数数据进行组合渲染,重建得到交互帧时刻虚拟视点位置对应的多角度自由视角视频数据并进行播放。
其中,所述播放终端14的数量可以是一个或多个,所述交互终端15的数量可以是一个或多个,所述播放终端14与所述交互终端15可以为同一终端设备。并且,可以采用服务器、播放控制设备或者交互终端中的至少一种作为视频帧截取指令的发射端,也可以采用其他能够发射视频帧截取指令的设备。
需要说明的是,在具体实施中,根据用户需求,所述数据处理设备和所述服务器的位置可以灵活部署。例如,所述数据处理设备可以置于现场非采集区域或云端。又如,所述服务器可以置于现场非采集区域,云端或者终端接入侧,比如,在终端接入侧,基站、机顶盒、路由器、家庭数据中心服务器、热点设备等边缘节点设备均可以作为所述服务器,用以获得多角度自由视角数据。或者,所述数据处理设备和所述服务器也可以集中设置在一起,作为一个服务器集群进行协同工作,实现多角度自由视角数据的快速生成,以实现多角度自由视角视频的低时延播放及实时互动。
采用上述方案,可以基于来自交互终端的图像重建指令随时生成待交互的虚拟视点位置对应的多角度自由视角视频数据,可以进一步提升用户互动体验。
为使本领域技术人员更好地理解和实现本发明实施例,以下通过具体的应用场景详细说明数据处理系统。
如图3所示,为一种应用场景中数据处理系统的结构示意图,其中,示出了一场篮球赛的数据处理系统的布置场景,所述数据处理系统包括由多个采集设备组成的采集阵列31、数据处理设备32、云端的服务器集群33、播放控制设备34,播放终端35和交互终端36。
参照图3,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列31中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
在具体实施中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据 具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
而为了不影响采集设备工作,所述数据处理设备32可以置于现场非采集区域,可视为现场服务器。所述数据处理设备32可以通过无线局域网向所述采集阵列31中各采集设备分别发送拉流指令,所述采集阵列31中各采集设备基于所述数据处理设备32发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备32。其中,所述采集阵列31中各采集设备可以通过交换机37将获得的视频数据流实时传输至所述数据处理设备32。
当所述数据处理设备32接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧的帧图像,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群33。
相应地,云端的服务器集群33将接收的多个同步视频帧的帧图像作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群33。
例如,所述云端的服务器集群33可以包括:第一云端服务器331,第二云端服务器332,第三云端服务器333,第四云端服务器334。其中,第一云端服务器331可以用于确定所述图像组合相应的参数数据;第二云端服务器332可以用于确定所述图像组合中各帧图像的深度数据;第三云端服务器333可以基于所述图像组合相应的参数数据、所述图像组合的像素数据和深度数据,使用基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)算法,对预设的虚拟视点路径进行帧图像重建;所述第四云端服务器334可以用于生成多角度自由视角视频,其中,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
可以理解的是,所述第一云端服务器331、第二云端服务器332、第三云端服务器333、第四云端服务器334也可以为服务器阵列或服务器子集群组成的服务器组,本发明实施例不做限制。
在具体实施中,云端的服务器集群33可以采用如下方式存储所述图像组合的像素数据及深度数据:
基于所述图像组合的像素数据及深度数据,生成对应帧时刻的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据。获取的拼接图像和相应的参数数据可以存入数据文件中,当需要获取拼接图像或参数数据时,可以根据数据文件的头文件中相应的存储地址,从相应的存储空间中读取。
然后,播放控制设备34可以将接收到的所述多角度自由视角视频数据插入待播放数据流中,播放终端35接收来自所述播放控制设备34的待播放数据流并进行实时播放。其中,播放控制设备34可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备。
可以理解的是,所述数据处理设备32可以根据具体情景置于现场非采集区域或云端,所述服务器(集群)和播放控制设备可以根据具体情景置于现场非采集区域,云端或者终端接入侧,本实施例并不用于限制本发明的具体实现和保护范围。
如图4所示的交互终端的交互界面示意图,交互终端40的交互界面上具有进度条41,结合图3和图4,交互终端40可以将来自所述数据处理设备32接收的指定帧时刻与进度条相关联,可以在进度条41上生成数个互动标识,例如互动标识42和43。其中,进度条41黑色段为已播放部分41a,进度条41空白段为未播放部分41b。
当交互终端的系统读取到进度条41上相应的互动标识43时,交互终端40的界面可以显示交互提示信息。例如,当用户选择操作触发当前的互动标识43时,交互终端40接收到反馈后生成互动标识43相对应的交互帧时刻的图像重建指令,并发送包含交互帧时刻信息的图像重建指令至所述云端的服务器集群33。当用户未选择触发时,交互终端40可以继续读取后续视频数据,所述进度条上已播放部分41a继续前进。用户也可以在观看时选择触发历史互动标识,例如触发进度条上已播放部分41a展示的互动标识42,交互终端40接收到反馈后生成互动标识42对应的交互帧时刻的图像重建指令。
当云端的服务器集群33收到的来自交互终端40的图像重建指令时,可以提取所述相应图像组合中预设帧图像的拼接图像及相应图像组合相应的参数数据并传输至所述交互终端40。
交互终端40基于触发操作,确定交互帧时刻信息,向服务器发送包含交互帧时刻信息的图像重建指令,接收从云端的服务器集群33返回的对应交互帧时刻的图像组合中预设帧图像的拼接图像及对应的参数数据,并基于交互操作确定虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频数据并进行播放。
可以理解的是,所述采集阵列中各采集设备与所述数据处理设备之间可以通过交换机和/或局域网进行连接,播放终端、交互终端数量均可以是一个或多个,所述播放终端与所述交互终端可以为同一终端设备,所述数据处理设备可以根据具体情景置于现场非采集区域或云端,所述服务器可以根据具体情景置于现场非采集区域,云端或者终端接入侧,本实施例并不用于限制本发明的具体实现和保护范围。
本发明实施例还提供了与上述数据处理方法相应的服务器,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图5所示的服务器的结构示意图,在本发明实施例中,如图5所示,服务器50可以包括:
数据接收单元51,适于接收所述数据处理设备上传的多个同步视频帧的帧图像作为图像组合;
参数数据计算单元52,适于确定所述图像组合相应的参数数据;
深度数据计算单元53,适于确定所述图像组合中各帧图像的深度数据;
视频数据获取单元54,适于基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据,其中,所述多角度自由视角视频数据包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
第一数据传输单元55,适于将所述多角度自由视角视频数据插入至播放控制设备的待播放数据流并通过播放终端进行播放。
其中,所述多个同步视频帧可以为所述数据处理设备基于视频帧截取指令,在从现场采集区域不同位置实时同步采集并上传的多路视频数据流中对指定帧时刻的视频帧截取得到,所述多个同步视频帧的拍摄角度不同。
所述服务器可以根据具体情景置于现场非采集区域,云端或者终端接入侧。
在具体实施中,插入播放控制设备的待播放数据流的多角度自由视角视频数据可以 保留在播放终端中,以便于用户进行时移观看,其中,时移可以是用户观看时进行的暂停、后退、快进到当前时刻等操作。
在具体实施中,如图5所示,所述视频数据获取单元54可以包括:
数据映射子单元541,适于根据所述预设的虚拟视点路径中各虚拟视点的虚拟参数数据以及所述图像组合相应的参数数据之间的关系,将所述图像组合中预设的帧图像的深度数据分别映射至相应的虚拟视点;
数据重建子单元542,适于根据分别映射至相应的虚拟视点的预设帧图像的像素数据和深度数据,以及预设的虚拟视点路径,进行帧图像重建,获得相应的多角度自由视角视频数据。
在具体实施中,如图5所示,所述服务器50还可以包括:
拼接图像生成单元56,适于基于所述图像组合中预设帧图像的像素数据和深度数据,生成所述图像组合相应的拼接图像,所述拼接图像可以包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据;
数据存储单元57,适于存储所述图像组合的拼接图像及所述图像组合相应的参数数据。
在具体实施中,如图5所示,所述服务器50还可以包括:
数据提取单元58,适于基于收到的来自交互终端的图像重建指令,确定交互时刻的交互帧时刻信息,提取对应交互帧时刻的相应图像组合中预设帧图像的拼接图像及相应图像组合相应的参数数据;
第二数据传输单元59,适于将所述数据提取单元58提取的所述拼接图像以及相应的参数数据传输至所述交互终端,使得所述交互终端基于交互操作所确定的虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到交互帧时刻的虚拟视点位置对应的多角度自由视角视频数据并进行播放。
采用上述方案,可以基于来自交互终端的图像重建指令随时生成待交互的虚拟视点位置对应的多角度自由视角视频数据,可以进一步提升用户互动体验。
本发明实施例还提供了一种数据交互方法和数据处理系统,可以通过从播放控制设备实时获取待播放数据流并进行实时播放展示,待播放数据流中各互动标识与所述视频数据的指定帧时刻关联,然后,可以响应于对一互动标识的触发操作,获取对应于所述 互动标识的指定帧时刻的交互数据,由于所述交互数据可以包括多角度自由视角数据,因此可以基于所述交互数据,对所述指定帧时刻进行多角度自由视角展示。
采用本发明实施例中的数据交互方案,在播放过程中,可以根据互动标识的触发操作,获取交互数据,进而进行多角度自由视角展示,以提升用户交互体验。以下参照附图,特别针对数据交互方法和数据处理系统通过具体实施例进行详细说明。
参照图6所示的数据交互方法的流程图,以下通过具体步骤说明本发明实施例所采用的数据交互方法。
S61,从播放控制设备实时获取待播放数据流并进行实时播放展示,所述待播放数据流包括视频数据及互动标识,各互动标识与所述待播放数据流的指定帧时刻关联。
其中,所述指定帧时刻可以以帧为单位,将第N至M帧作为指定帧时刻,N和M为不小于1的整数,且N≤M;或者,所述指定帧时刻也可以以时间为单位,将第X至Y秒作为指定帧时刻,X和Y为正数,且X≤Y。
在具体实施中,所述待播放数据流可以与数个指定帧时刻进行关联,所述播放控制设备可以基于各指定帧时刻的信息,生成各指定帧时刻相应的互动标识,从而在实时播放展示待播放数据流时,可以在指定帧时刻显示相应的互动标识。其中,各互动标识与视频数据可以根据实际情况采用不同的方式进行关联。
在本发明一实施例中,所述待播放数据流可以包括数个与所述视频数据相应的帧时刻,由于各互动标识也有相应的指定帧时刻,因此,可以匹配各互动标识相应的指定帧时刻的信息和所述待播放数据流中的各帧时刻的信息,可以将相同信息的帧时刻与互动标识进行关联,从而在实时播放展示待播放数据流并进行到相应帧时刻的时候,可以显示相应的互动标识。
例如,所述待播放数据流包括N个帧时刻,所述播放控制设备基于M个指定帧时刻的信息,生成相应的M个互动标识。若第i个帧时刻的信息与第j个指定帧时刻的信息相同,则可以将第i个帧时刻与第j个互动标识进行关联,在实时播放展示进行到第i个帧时刻的时候,可以显示第j个互动标识,其中,i为不大于N的自然数,j为不大于M的自然数。
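上述帧时刻信息与互动标识的匹配关联过程可示意如下(以字符串形式的帧时刻信息作为匹配依据,属示意性假设):

```python
def associate_markers(stream_moments, marker_moments):
    """把待播放数据流的 N 个帧时刻与 M 个指定帧时刻匹配:
    信息相同的第 i 个帧时刻关联到第 j 个互动标识,返回 {i: j}。"""
    index = {m: j for j, m in enumerate(marker_moments)}
    return {i: index[m] for i, m in enumerate(stream_moments) if m in index}
```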
S62,响应于对一互动标识的触发操作,获取对应于所述互动标识的指定帧时刻的交互数据,所述交互数据包括多角度自由视角数据。
在具体实施中,分别对应于各指定帧时刻的各交互数据可以存储于预设的存储设备中,由于互动标识和指定帧时刻有对应关系,因此,通过对交互终端执行触发操作,从 而可以触发交互终端展示的互动标识,根据对互动标识的触发操作,可以获得与触发的互动标识相对应的指定帧时刻。由此,可以获取与触发的互动标识相对应的指定帧时刻的交互数据。
例如,预设的存储设备可以存储有M份交互数据,其中,M份交互数据分别对应于M个指定帧时刻,并且,M个指定帧时刻对应于M个互动标识。假设触发的互动标识为Pi,则根据被触发的互动标识Pi可以获得互动标识Pi相对应的指定帧时刻Ti。由此,可以获取与互动标识Pi相对应的指定帧时刻Ti的交互数据。其中,i为自然数。
其中,所述触发操作可以是用户输入的触发操作,也可以是交互终端自动生成的触发操作。
并且,预设的存储设备可以置于现场非采集区域,云端或终端接入侧。具体而言,预设的存储设备可以是本发明实施例中的数据处理设备、服务器或交互终端,或者为位于交互终端侧的边缘节点设备,如基站、机顶盒、路由器、家庭数据中心服务器、热点设备等。
S63,基于所述交互数据,进行所述指定帧时刻的多角度自由视角的图像展示。
在具体实施中,可以采用图像重建算法对所述交互数据的多角度自由视角数据进行图像重建,然后进行所述指定帧时刻的多角度自由视角的图像展示。
并且,若所述指定帧时刻为一个帧时刻,则可以展示多角度自由视角的静态图像;若所述指定帧时刻对应多个帧时刻,则可以展示多角度自由视角的动态图像。
采用上述方案,在视频播放的过程中,可以根据对互动标识的触发操作,获取交互数据,进而进行多角度自由视角展示,以提升用户交互体验。
在具体实施中,所述多角度自由视角数据可以基于接收的所述指定帧时刻对应的多个帧图像生成,所述多个帧图像由数据处理设备对采集阵列中多个采集设备同步采集的多路视频数据流在所述指定帧时刻进行截取得到,所述多角度自由视角数据可以包括所述多个帧图像的像素数据、深度数据,以及参数数据,其中每个帧图像的像素数据以及深度数据之间存在关联关系。
其中,帧图像的像素数据可以为YUV数据或RGB数据中任意一种,或者也可以是其它能够对帧图像进行表达的数据。深度数据可以包括与帧图像的像素数据一一对应的深度值,或者,可以是对与帧图像的像素数据一一对应的深度值集合中选取的部分数值。深度数据的具体选取方式根据具体的情景而定。
在具体实施中,可以通过参数矩阵来获得所述多个帧图像相应的参数数据,所述参 数矩阵可以包括内参矩阵,外参矩阵、旋转矩阵和平移矩阵等。由此,可以确定空间物体表面指定点的三维几何位置与其在多个帧图像中对应点之间的相互关系。
在本发明的实施例中,可以采用SFM算法,基于参数矩阵,对获取到的多个帧图像进行特征提取、特征匹配和全局优化,获得的参数估计值作为多个帧图像相应的参数数据。特征提取、特征匹配和全局优化过程中所使用的具体算法可以参见前文介绍。
在具体实施中,可以基于所述多个帧图像,确定各帧图像的深度数据。其中,深度数据可以包括与各帧图像的像素对应的深度值。采集点到现场中各个点的距离可以作为上述深度值,深度值可以直接反映待观看区域中可见表面的几何形状。例如,以拍摄坐标系的原点作为光心,深度值可以是现场中各个点沿着拍摄光轴到光心的距离。本领域技术人员可以理解的是,上述距离可以是相对数值,多个帧图像可以采用相同的基准。
在本发明一实施例中,可以采用双目立体视觉的算法,计算各帧图像的深度数据。除此之外,深度数据还可以通过对帧图像的光度特征、明暗特征等特征进行分析间接估算得到。
在本发明另一实施例中,可以采用MVS算法进行帧图像重建,可以对每个帧图像的像素点都进行匹配,重建每个像素点的三维坐标,获得具有图像一致性的点,然后计算各个帧图像的深度数据。或者,可以对选取的帧图像的像素点进行匹配,重建各选取的帧图像的像素点的三维坐标,获得具有图像一致性的点,然后计算相应帧图像的深度数据。其中,帧图像的像素数据与计算得到的深度数据对应,选取帧图像的方式可以根据具体情景来设定,比如,可以根据需要计算深度数据的帧图像与其他帧图像之间的距离,选择部分帧图像。
在具体实施中,所述数据处理设备可以基于接收到的视频帧截取指令,截取所述多路视频数据流中所述指定帧时刻的帧级同步的视频帧。
在具体实施中,所述视频帧截取指令可以包括用于截取视频帧的帧时刻信息,所述数据处理设备根据所述视频帧截取指令中的帧时刻信息,从多路视频数据流中截取相应帧时刻的视频帧。并且,数据处理设备将所述视频帧截取指令中的帧时刻信息发送至所述播放控制设备,所述播放控制设备根据接收的帧时刻信息可以获得对应的指定帧时刻,并根据接收的帧时刻信息生成相应的互动标识。
在具体实施中,所述采集阵列中多个采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,所述数据处理设备可以置于现场非采集区域或云端。
在具体实施中,所述多角度自由视角可以是指使得场景能够自由切换的虚拟视点的空间位置以及视角。例如,多角度自由视角可以是6自由度(6DoF)的视角,其中,虚拟视点的空间位置可以表示为(x,y,z),视角可以表示为三个旋转方向(θ,φ,ψ),共6个自由度方向,作为6自由度(6DoF)的视角。
并且,多角度自由视角范围可以根据应用场景的需要确定。
在具体实施中,所述播放控制设备可以基于来自数据处理设备的截取视频帧的帧时刻的信息,生成与所述待播放数据流中对应时刻的视频帧关联的互动标识。例如,所述数据处理设备接收到视频帧截取指令后,将所述视频帧截取指令中的帧时刻信息发送至所述播放控制设备。然后,所述播放控制设备可以基于各帧时刻信息,生成相应的互动标识。
在具体实施中,根据现场展示的对象,以及展示对象的关联信息等,可以生成相应的交互数据。例如,所述交互数据还可以包括以下至少一种:现场分析数据、采集对象的信息数据、与采集对象关联的装备的信息数据、现场部署的物品的信息数据、现场展示的徽标的信息数据。然后,基于所述交互数据,进行多角度自由视角展示,可以向用户通过多角度自由视角展示更加丰富的交互信息,从而可以进一步增强用户交互体验。
例如,在进行篮球比赛播放时,交互数据除了可以包括多角度自由视角数据外,还可以包括球赛的分析数据、某一球员的信息数据、球员所穿的鞋子的信息数据、篮球的信息数据、现场赞助商的徽标的信息数据等其中一种或多种。
在具体实施中,为了在图像展示结束后可以便捷地返回待播放数据流,继续参照图6,在所述步骤63之后,还可以包括:
S64,在检测到交互结束信号时,切换至从所述播放控制设备实时获取待播放数据流并进行实时播放展示。
例如,在接收到交互结束操作指示时,切换至从所述播放控制设备实时获取的待播放数据流并进行实时播放展示。
又例如,在检测到所述指定帧时刻的多角度自由视角的图像展示至最后一幅图像时,切换至从所述播放控制设备实时获取的待播放数据流并进行实时播放展示。
在一具体实施例中,步骤63所述的基于所述交互数据,进行多角度自由视角图像展示,具体可以包括如下步骤:
根据所述交互操作确定虚拟视点,所述虚拟视点选自多角度自由视角范围,所述多角度自由视角范围为支持对待观看区域进行虚拟视点的切换观看的范围,然后,展示基于所述虚拟视点对所述待观看区域进行观看的图像,所述图像基于所述交互数据以及所 述虚拟视点生成。
在具体实施时,可以预设虚拟视点路径,所述虚拟视点路径可以包括数个虚拟视点。由于所述虚拟视点选自多角度自由视角范围,因此,根据交互操作时播放展示的图像视角可以确定相应的第一虚拟视点,然后,可以从所述第一虚拟视点开始按照预设的虚拟视点的顺序,依次展示各虚拟视点相应的图像。
在本发明实施例中,可以采用DIBR算法,根据所述多角度自由视角数据中的参数数据和预设的虚拟视点路径,对触发的互动标识的指定帧时刻对应的像素数据和深度数据进行组合渲染,从而实现基于预设的虚拟视点路径的图像重建,获得相应的多角度自由视角视频数据,进而可以从所述第一虚拟视点开始按照预设的虚拟视点的顺序,依次展示相应的图像。
并且,若所述指定帧时刻对应同一帧时刻,获得的多角度自由视角视频数据可以包括按照帧时刻排序的图像的多角度自由视角空间数据,可以展示多角度自由视角的静态图像;若所述指定帧时刻对应不同的帧时刻,获得的多角度自由视角视频数据可以包括按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据,可以展示多角度自由视角的动态图像,即展示的是多角度自由视角的视频帧的帧图像。
本发明实施例还提供了与上述数据交互方法相应的系统,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图7所示的数据处理系统的结构示意图,数据处理系统70可以包括:采集阵列71、数据处理设备72、服务器73、播放控制设备74、以及交互终端75,其中:
所述采集阵列71可以包括多个采集设备,所述多个采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,适于实时同步采集多路视频数据流,并实时上传视频数据流至所述数据处理设备72;
所述数据处理设备72,对于上传的多路视频数据流,适于根据接收到的视频帧截取指令,在指定帧时刻对所述多路视频数据流进行截取,得到对应所述指定帧时刻的多个帧图像以及对应所述指定帧时刻的帧时刻信息,并将所述指定帧时刻的多个帧图像及对应所述指定帧时刻的帧时刻信息上传至所述服务器73,将所述指定帧时刻的帧时刻信息发送至所述播放控制设备74;
所述服务器73,适于接收所述数据处理设备72上传的所述多个帧图像以及所述帧时刻信息,并基于所述多个帧图像,生成用于进行交互的交互数据,所述交互数据包括多角度自由视角数据,所述交互数据与所述帧时刻信息关联;
所述播放控制设备74,适于确定待播放数据流中与所述数据处理设备72上传的所述帧时刻信息对应的指定帧时刻,生成关联所述指定帧时刻的互动标识,并将包含所述互动标识的待播放数据流传输至所述交互终端75;
所述交互终端75,适于基于接收到的待播放数据流,实时播放展示包含所述互动标识的视频,并基于对所述互动标识的触发操作,获取存储于所述服务器73且对应所述指定帧时刻的交互数据,以进行多角度自由视角图像展示。
需要说明的是,在具体实施中,根据用户需求,所述数据处理设备和所述服务器的位置可以灵活部署。例如,所述数据处理设备可以置于现场非采集区域或云端。又如,所述服务器可以置于现场非采集区域,云端或者终端接入侧,比如,在终端接入侧,基站、机顶盒、路由器、家庭数据中心服务器、热点设备等边缘节点设备均可以作为所述服务器,用以获得多角度自由视角数据。或者,所述数据处理设备和所述服务器也可以集中设置在一起,作为一个服务器集群进行协同工作,实现多角度自由视角数据的快速生成,以实现多角度自由视角视频的低时延播放及实时互动。
采用上述方案,在播放过程中,可以根据互动标识的触发操作,获取交互数据,进而进行多角度自由视角展示,以提升用户交互体验。
在具体实施中,所述多角度自由视角可以是指使得场景能够自由切换的虚拟视点的空间位置以及视角。并且,多角度自由视角范围可以根据应用场景的需要确定。多角度自由视角可以是6自由度(6DoF)的视角。
在具体实施中,采集设备本身可以具备编码和封装的功能,从而可以将从相应角度实时同步采集到的原始视频数据进行编码和封装。并且,采集设备可以具备压缩功能。
在具体实施中,所述服务器73适于基于接收到的对应所述指定帧时刻的多个帧图像生成所述多角度自由视角数据,所述多角度自由视角数据包括所述多个帧图像的像素数据、深度数据,以及参数数据,其中每个帧图像的像素数据以及深度数据之间存在关联关系。
在具体实施中,所述采集阵列71中多个采集设备根据预设的多角度自由视角范围可以置于现场采集区域不同位置,所述数据处理设备72可以置于现场非采集区域或云端,所述服务器73可以置于现场非采集区域、云端或者终端接入侧。
在具体实施中,所述播放控制设备74适于基于数据处理设备72截取得到的视频帧的帧时刻信息,生成与所述待播放数据流中对应视频帧关联的互动标识。
在具体实施中,所述交互终端75还适于在检测到交互结束信号时,切换至从所述播 放控制设备74实时获取的待播放数据流并进行实时播放展示。
为使本领域技术人员更好地理解和实现本发明实施例,以下通过具体的应用场景详细说明数据处理系统,如图8所示,为本发明实施例中另一种应用场景中数据处理系统的结构示意图,示出了一场篮球赛播放应用场景,其中现场为左侧的篮球赛场区域,所述数据处理系统80可以包括:由各采集设备组成的采集阵列81、数据处理设备82、云端的服务器集群83、播放控制设备84和交互终端85。
以篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域可以作为预设的多角度自由视角范围。相应地,所述采集阵列81中各采集设备可以根据预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
在具体实施中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
而为了不影响采集设备工作,所述数据处理设备82可以置于现场非采集区域。所述数据处理设备82可以通过无线局域网向所述采集阵列81中各采集设备分别发送拉流指令。所述采集阵列81中各采集设备基于所述数据处理设备82发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备82。其中,所述采集阵列81中各采集设备可以通过交换机87将获得的视频数据流实时传输至所述数据处理设备82。各采集设备可以将采集到的原始视频数据实时压缩并实时传输至所述数据处理设备,以进一步节约局域网传输资源。
当所述数据处理设备82接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个视频帧对应的帧图像以及对应所述指定帧时刻的帧时刻信息,并将所述指定帧时刻的多个帧图像及对应所述指定帧时刻的帧时刻信息上传至所述云端的服务器集群83,将所述指定帧时刻的帧时刻信息发送至所述播放控制设备84。其中,视频帧截取指令可以为用户手动发出,也可以是数据处理设备自动生成。
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群83。
例如所述云端的服务器集群83可以包括:第一云端服务器831、第二云端服务器832、 第三云端服务器833和第四云端服务器834。其中,第一云端服务器831可以用于确定所述多个帧图像相应的参数数据;第二云端服务器832可以用于确定所述多个帧图像中各帧图像的深度数据;第三云端服务器833可以基于所述多个帧图像相应的参数数据、所述多个帧图像中预设帧图像的深度数据和像素数据,使用DIBR算法,对预设的虚拟视点路径进行帧图像重建;所述第四云端服务器834可以用于生成多角度自由视角视频。
可以理解的是,所述第一云端服务器831、第二云端服务器832、第三云端服务器833、第四云端服务器834也可以为服务器阵列或服务器子集群组成的服务器组,本发明实施例不做限制。
在具体实施中,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。所述交互数据可以包括多角度自由视角数据,所述多角度自由视角数据可以包括多个帧图像的像素数据和深度数据以及参数数据,每个帧图像的像素数据以及深度数据之间存在关联关系。
云端的服务器集群83可以按照所述指定帧时刻信息对交互数据进行存储。
播放控制设备84可以根据数据处理设备上传的帧时刻信息,生成关联所述指定帧时刻的互动标识并将包含所述互动标识的待播放数据流传输至所述交互终端85。
交互终端85可以基于接收到的待播放数据流,实时播放展示视频并在相应视频帧时刻显示互动标识。当一互动标识被触发,交互终端85可以获取存储于所述云端的服务器集群83且对应所述指定帧时刻的交互数据,以进行多角度自由视角图像展示。交互终端85在检测到交互结束信号时,可以切换至从所述播放控制设备84实时获取待播放数据流并进行实时播放展示。
参照图38所示的另一种数据处理系统的结构示意图,数据处理系统380可以包括:采集阵列381、数据处理设备382、播放控制设备383、以及交互终端384;其中:
所述采集阵列381包括多个采集设备,所述多个采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,适于实时同步采集多路视频数据流,并实时上传视频数据流至所述数据处理设备;
所述数据处理设备382,对于上传的多路视频数据流,适于根据接收到的视频帧截取指令,在指定帧时刻对所述多路视频数据流进行截取,得到对应所述指定帧时刻的多个帧图像以及对应所述指定帧时刻的帧时刻信息,并将所述指定帧时刻的帧时刻信息发送至所述播放控制设备383;
所述播放控制设备383,适于确定待播放数据流中与所述数据处理设备382上传的 所述帧时刻信息对应的指定帧时刻,生成关联所述指定帧时刻的互动标识,并将包含所述互动标识的待播放数据流传输至所述交互终端384;
所述交互终端384,适于基于接收到的待播放数据流,实时播放展示包含所述互动标识的视频,并基于对所述互动标识的触发操作,从所述数据处理设备382获取对应于所述互动标识的指定帧时刻的多个帧图像,并基于所述多个帧图像,生成用于进行交互的交互数据,再进行多角度自由视角图像展示,其中,所述交互数据包括多角度自由视角数据。
在具体实施中,根据用户需求,所述数据处理设备可以灵活部署,例如,所述数据处理设备可以置于现场非采集区域或云端。
采用上述数据处理系统,在播放过程中,可以根据互动标识的触发操作,获取交互数据,进而进行多角度自由视角展示,以提升用户交互体验。
本发明实施例还提供了与上述数据交互方法相应的终端,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图9示出的交互终端的结构示意图,交互终端90可以包括:
数据流获取单元91,适于从播放控制设备实时获取待播放数据流,所述待播放数据流包括视频数据及互动标识,所述互动标识与所述待播放数据流的指定帧时刻关联;
播放展示单元92,适于实时播放展示所述待播放数据流的视频及互动标识;
交互数据获取单元93,适于响应于对所述互动标识的触发操作,获取对应于所述指定帧时刻的交互数据,所述交互数据包括多角度自由视角数据;
交互展示单元94,适于基于所述交互数据,进行所述指定帧时刻的多角度自由视角的图像展示;
切换单元95,适于在检测到交互结束信号时,触发切换至由所述数据流获取单元91从所述播放控制设备实时获取的待播放数据流并由所述播放展示单元92进行实时播放展示。
其中,所述交互数据可以由服务器生成并传输给交互终端,也可以由交互终端生成。
交互终端在播放视频的过程中,可以从播放控制设备实时获取待播放数据流,在相应的帧时刻的时候,可以显示相应的互动标识。在具体实施中,如图4所示,为本发明实施例中一种交互终端的交互界面示意图。
交互终端40从播放控制设备实时获取待播放数据流,在实时播放展示进行到第1个帧时刻T1的时候,可以在进度条41上显示第一个互动标识42,在实时播放展示进行到 第二个帧时刻T2的时候,可以在进度条上显示第二个互动标识43。其中,进度条黑色部分为已播放部分,白色为未播放部分。
所述触发操作可以是用户输入的触发操作,也可以是交互终端自动生成的触发操作,例如,交互终端在检测到存在多角度自由视点数据帧的标识时可以自动发起触发操作。在用户手动触发时,可以是交互终端显示交互提示信息后用户选择触发交互的时刻信息,也可以是交互终端接收到用户操作触发交互的历史时刻信息,所述历史时刻信息可以为位于当前播放时刻之前的时刻信息。
结合图4、图7和图9,当交互终端的系统读取到进度条41上相应的互动标识43时,可以显示交互提示信息,当用户未选择触发时,交互终端40可以继续读取后续视频数据,进度条41的已播放部分继续前进。当用户选择触发时,交互终端40接收到反馈后生成相应互动标识的指定帧时刻的图像重建指令,并发送至所述服务器73。
例如,当用户选择触发当前的互动标识43时,交互终端40接收到反馈后生成互动标识43相应指定帧时刻T2的图像重建指令,并发送至所述服务器73。所述服务器根据图像重建指令可以发送指定帧时刻T2相应的交互数据。
用户也可以在观看时选择触发历史互动标识,例如触发进度条上已播放部分41a展示的互动标识42,交互终端40接收到反馈后生成互动标识42相应指定帧时刻T1的图像重建指令,并发送至所述服务器73。所述服务器根据图像重建指令可以发送指定帧时刻T1相应的交互数据。交互终端40可以采用图像重建算法对所述交互数据的多角度自由视角数据进行图像处理,然后进行所述指定帧时刻的多角度自由视角的图像展示。若所述指定帧时刻为一个帧时刻,则展示的是多角度自由视角的静态图像;若所述指定帧时刻对应多个帧时刻,则展示的是多角度自由视角的动态图像。
结合图4、图38和图9,当交互终端的系统读取到进度条41上相应的互动标识43时,可以显示交互提示信息,当用户未选择触发时,交互终端40可以继续读取后续视频数据,进度条41的已播放部分继续前进。当用户选择触发时,交互终端40接收到反馈后生成相应互动标识的指定帧时刻的图像重建指令,并发送至所述数据处理设备382。
例如,当用户选择触发当前的互动标识43时,交互终端40接收到反馈后生成互动标识43相应指定帧时刻T2的图像重建指令,并发送至所述数据处理设备。所述数据处理设备382根据图像重建指令可以发送指定帧时刻T2相应的多个帧图像。
用户也可以在观看时选择触发历史互动标识,例如触发进度条上已播放部分41a展示的互动标识42,交互终端40接收到反馈后生成互动标识42相应指定帧时刻T1的图 像重建指令,并发送至所述数据处理设备。所述数据处理设备根据图像重建指令可以发送指定帧时刻T1相应的多个帧图像。
交互终端40可以基于所述多个帧图像,生成用于进行交互的交互数据,并可以采用图像重建算法对所述交互数据的多角度自由视角数据进行图像处理,然后进行所述指定帧时刻的多角度自由视角的图像展示。若所述指定帧时刻为一个帧时刻,则展示的是多角度自由视角的静态图像;若所述指定帧时刻对应多个帧时刻,则展示的是多角度自由视角的动态图像。
在具体实施中,本发明实施例的交互终端可以是具有触屏功能的电子设备、头戴式虚拟现实(Virtual Reality,VR)终端、与显示器连接的边缘节点设备、具有显示功能的IoT(The Internet of Things,物联网)设备。
如图40所示,为本发明实施例中另一种交互终端的交互界面示意图,交互终端为具有触屏功能的电子设备400,当读取到进度条401上相应的互动标识402时,电子设备400的界面可以显示交互提示信息框403。用户可以根据交互提示信息框403的内容进行选择,当用户做出选择“是”的触发操作时,电子设备400接收到反馈后可以生成互动标识402相对应的交互帧时刻的图像重建指令,当用户做出选择“否”的不触发操作时,电子设备400可以继续读取后续视频数据。
如图41所示,为本发明实施例中另一种交互终端的交互界面示意图,交互终端为头戴式VR终端410,当读取到进度条411上相应的互动标识412时,头戴式VR终端410的界面可以显示交互提示信息框413。用户可以根据交互提示信息框413的内容进行选择,当用户做出选择“是”的触发操作(例如点头)时,头戴式VR终端410接收到反馈后可以生成互动标识412相对应的交互帧时刻的图像重建指令,当用户做出选择“否”的不触发操作(例如摇头)时,头戴式VR终端410可以继续读取后续视频数据。
如图42所示,为本发明实施例中另一种交互终端的交互界面示意图,交互终端为与显示器420连接的边缘节点设备421,当边缘节点设备421读取到进度条422上相应的互动标识423时,显示器420可以显示交互提示信息框424。用户可以根据交互提示信息框424的内容进行选择,当用户做出选择“是”的触发操作时,边缘节点设备421接收到反馈后可以生成互动标识423相对应的交互帧时刻的图像重建指令,当用户做出选择“否”的不触发操作时,边缘节点设备421可以继续读取后续视频数据。
在具体实施中,交互终端可以与上述的数据处理设备、服务器中至少一种建立通信连接,可以采用有线连接或无线连接。
如图43所示,为本发明实施例中一种交互终端的连接示意图。边缘节点设备430通过物联网与交互设备431、432和433建立无线连接。
在具体实施中,交互终端在触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并基于交互操作确定虚拟视点位置信息,如图44所示,为本发明实施例中一种交互终端的交互操作示意图,用户可以在交互操作界面上水平操作或垂直操作,并且操作轨迹可以是直线或曲线。
在具体实施中,如图45所示,为本发明实施例中另一种交互终端的交互界面示意图。当所述用户点击互动标识后,交互终端获取所述互动标识的指定帧时刻的交互数据。
若用户并未采取新的操作,则触发操作即为交互操作,可以根据交互操作时播放展示的图像视角确定相应的第一虚拟视点。若用户采取新的操作,则新的操作即为交互操作,可以根据交互操作时播放展示的图像视角确定相应的第一虚拟视点。
然后,可以从所述第一虚拟视点开始按照预设的虚拟视点的顺序,依次展示各虚拟视点相应的图像。若所述指定帧时刻对应同一帧时刻,获得的多角度自由视角视频数据可以包括按照帧时刻排序的图像的多角度自由视角空间数据,可以展示多角度自由视角的静态图像;若所述指定帧时刻对应不同的帧时刻,获得的多角度自由视角视频数据可以包括按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据,可以展示多角度自由视角的动态图像,即展示的是多角度自由视角的视频帧的帧图像。
在本发明一实施例中,参考图45及46。所述交互终端获得的多角度自由视角视频数据可以包括按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据,用户向右水平滑动产生交互操作,确定相应的第一虚拟视点,并且由于不同的虚拟视点可以对应不同的多角度自由视角空间数据和多角度自由视角时间数据,如图46所示,交互界面中展示的帧图像随着交互操作在时间和空间上发生了变化,帧图像展示的内容从图45的运动员奔向终点线变化为图46的运动员即将越过终点线,并且以运动员作为目标对象而言,帧图像展示的视角从左视图变成了正视图。
同理可得图45及47,帧图像展示的内容从图45的运动员奔向终点线变化为图47的运动员已经越过终点线,并且以运动员作为目标对象而言,帧图像展示的视角从左视图变成了右视图。
同理可得图45及48,用户向上垂直滑动产生交互操作,帧图像展示的内容从图45的运动员奔向终点线变化为图48的运动员已经越过终点线,并且以运动员作为目标对象 而言,帧图像展示的视角从左视图变成了俯视图。
可以理解的是,根据用户的操作可以获得不同的交互操作,并根据交互操作时播放展示的图像视角可以确定相应的第一虚拟视点;根据获得的多角度自由视角视频数据,可以展示多角度自由视角的静态图像或动态图像,本发明实施例不做限制。
在具体实施中,所述交互数据还可以包括以下至少一种:现场分析数据、采集对象的信息数据、与采集对象关联的装备的信息数据、现场部署的物品的信息数据、现场展示的徽标的信息数据。
在本发明一实施例中,如图10所示的本发明实施例中另一种交互终端的交互界面示意图。交互终端100在触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并且,可以在图像(未示出)上叠加现场分析数据,如图10中的现场分析数据101所示。
在本发明一实施例中,如图11所示的本发明实施例中另一种所述交互终端的交互界面示意图。交互终端110在用户触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并且,可以在图像(未示出)上叠加采集对象的信息数据,如图11中的采集对象的信息数据111所示。
在本发明一实施例中,如图12所示的本发明实施例中另一种所述交互终端的交互界面示意图。交互终端120在用户触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并且,可以在图像(未示出)上叠加采集对象的信息数据,如图12中的采集对象的信息数据121-123所示。
在本发明一实施例中,如图13所示的本发明实施例中另一种所述终端的交互界面示意图。交互终端130在用户触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并且,可以在图像(未示出)上叠加现场部署的物品的信息数据,如图13中的文件包的信息数据131所示。
在本发明一实施例中,如图14所示的本发明实施例中另一种所述终端的交互界面示意图。交互终端140在触发交互标识后,可以进行触发的交互标识对应的指定帧时刻的多角度自由视角的图像展示,并且,可以在图像(未示出)上叠加现场展示的徽标的信息数据,如图14中的徽标信息数据141所示。
由此,用户可以通过交互数据获取更多关联的交互信息,更加深入、全面、专业地了解所观看的内容,从而可以进一步增强用户交互体验。
参照图39示出的另一种交互终端的结构示意图,所述交互终端390可以包括:处理 器391,网络组件392,存储器393和显示部件394;其中:
所述处理器391,适于通过网络组件392实时获取待播放数据流,以及响应于对一互动标识的触发操作,获取对应于所述互动标识的指定帧时刻的交互数据,其中,所述待播放数据流包括视频数据及互动标识,所述互动标识与所述待播放数据流的指定帧时刻关联,所述交互数据包括多角度自由视角数据;
所述存储器393,适于存储实时获取的待播放数据流;
所述显示部件394,适于基于实时获取的待播放数据流,实时播放展示所述待播放数据流的视频及互动标识,以及基于所述交互数据,进行所述指定帧时刻的多角度自由视角的图像展示。
其中,交互终端390可以从存储交互数据的服务器处获取得到所述指定帧时刻的交互数据,也可以从存储帧图像的数据处理设备处获取指定帧时刻相应的多个帧图像,然后生成相应的交互数据。
为使本领域技术人员更好地理解和实现本发明实施例,以下对多角度自由视角视频图像在现场侧的处理方案作进一步详细的描述。
参照图15所示的数据处理方法的流程图,在本发明实施例中,具体可以包括如下步骤:
S151,在确定采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值时,分别向所述采集阵列中各采集设备发送拉流指令,其中,所述采集阵列中各采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置。
所述多角度自由视角可以是指使得场景能够自由切换的虚拟视点的空间位置以及视角。并且,多角度自由视角范围可以根据应用场景的需要确定。
在具体实施中,预设的带宽阈值可以根据采集阵列中各采集设备所在传输网络的传输能力决定。例如,传输网络的上行带宽为1000Mbps,则预设的带宽值可以为1000Mbps。
S152,接收所述采集阵列中各采集设备基于所述拉流指令实时传输的压缩视频数据流,所述压缩视频数据流为所述采集阵列中各采集设备分别从相应角度实时同步采集和数据压缩获得。
在具体实施中,采集设备本身可以具备编码和封装的功能,从而可以将从相应角度实时同步采集到的原始视频数据进行编码和封装,其中,采集设备采用的封装格式可以是AVI、QuickTime File Format、MPEG、WMV、Real Video、Flash Video、Matroska等格式中的任一种,或者也可以是其他封装格式,采集设备采用的编码格式可以是H.261、 H.263、H.264、H.265、MPEG、AVS等编码格式,或者也可以是其它编码格式。并且,采集设备可以具备压缩功能,压缩率越高,在压缩前数据量相同的情况下可以使得压缩后的数据量更小,可以缓解实时同步传输的带宽压力,因此,采集设备可以采用预测编码、变换编码和熵编码等技术提高视频的压缩率。
采用上述数据处理方法,在拉流前确定了传输带宽是否匹配,可以避免拉流过程中数据传输拥堵,使得各采集设备采集和数据压缩得到的数据能够实时同步传输,加快多角度自由视角视频数据的处理速度,在带宽资源及数据处理资源有限的情况下实现多角度自由视角视频的低时延播放,降低实施成本。
在具体实施中,可以通过获取各采集设备的参数的数值,计算得出采集阵列中各采集设备预传输的压缩视频数据流的码率之和,并判断其是否大于预设的带宽阈值。例如,采集阵列中可以包含40个采集设备,各采集设备的压缩视频数据流的码率均为15Mbps,则采集阵列整体的码率为15*40=600Mbps,若预设的带宽阈值为1000Mbps,则可以确定采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值。然后,可以根据采集阵列中40个采集设备的IP地址,分别向各采集设备发送拉流指令。
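拉流前的带宽匹配判断可示意如下(码率与带宽单位均为Mbps,数值沿用正文示例):

```python
def can_pull(bitrates_mbps, bandwidth_mbps):
    """各采集设备预传输的压缩视频数据流的码率之和
    不大于预设的带宽阈值时,才允许发送拉流指令。"""
    return sum(bitrates_mbps) <= bandwidth_mbps

# 正文示例:40 路采集设备、每路 15Mbps,带宽阈值 1000Mbps
ok = can_pull([15] * 40, 1000)
```

同样的判断逻辑也可用于预设的写入速度阈值:只需把 bandwidth_mbps 换成存储介质的写入速度上限即可。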
在具体实施中,为了确保采集阵列中各采集设备的参数的数值统一,使得各采集设备能够实时同步采集和数据压缩,在分别向所述采集阵列中各采集设备发送拉流指令之前,可以设置所述采集阵列中各采集设备的参数的数值。其中,所述采集设备的参数可以包括:采集参数和压缩参数,且所述采集阵列中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于预设的带宽阈值。
由于采集参数和压缩参数相辅相成,在压缩参数的数值不变的情况下,可以通过设置采集参数的数值来减小原始视频数据的数据量大小,使得数据压缩处理的时间缩短;在采集参数的数值不变的情况下,设置压缩参数的数值可以相应减小压缩后的数据量,使得数据传输的时间减短。又如,设置较高的压缩率可以节省传输带宽,设置较低的采样率也可以节省传输带宽。因此,可以根据实际情况,设置采集参数和/或压缩参数。
由此,在开始拉流之前,可以对采集阵列中各采集设备的参数的数值进行设置,确保采集阵列中各采集设备的参数的数值统一,各采集设备可以从相应角度实时同步采集和数据压缩,并且获得的压缩视频数据流的码率之和不大于预设的带宽阈值,从而可以避免网络拥塞,在带宽资源有限的情况下也可以实现多角度自由视角视频的低时延播放。
在具体实施例中,采集参数可以包括焦距参数,曝光参数,分辨率参数、编码码率 参数和编码格式参数等,压缩参数可以包括压缩率参数,压缩格式参数等,通过设置不同的参数的数值,获得最适合各采集设备所在传输网络的数值。
为了简化设置流程,节约设置时间,在设置所述采集阵列中各采集设备的参数的数值之前,可以先确定所述采集阵列中各采集设备按照已设置的参数的数值进行采集和数据压缩获得的压缩视频数据流的码率之和是否大于预设的带宽阈值,当获得的压缩视频数据流的码率之和大于预设的带宽阈值时,在分别向所述采集阵列中各采集设备发送拉流指令之前,可以设置所述采集阵列中各采集设备的参数的数值。可以理解的是,在具体实施中,也可以根据需要展示的多角度自由视角图像的分辨率等成像质量要求设置所述采集参数的数值和压缩参数的数值。
在具体实施中,各采集设备获得的压缩视频数据流从传输到写入的过程是连续发生的,因此,在分别向所述采集阵列中各采集设备发送拉流指令之前,还可以确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和是否大于预设的写入速度阈值,并在所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和大于预设的写入速度阈值时,可以设置所述采集阵列中各采集设备的参数的数值,使得所述采集阵列中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值。
在具体实施中,预设的写入速度阈值可以根据存储介质的数据存储写入速度决定。例如,数据处理设备的固态硬盘(Solid State Disk或Solid State Drive,SSD)的数据存储写入速度上限为100Mbps,则预设的写入速度阈值可以为100Mbps。
采用上述方案,在开始拉流之前,可以确保各采集设备从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值,从而可以避免数据写入拥塞,确保压缩视频数据流在采集、传输和写入的过程中链路畅通,使得各采集设备上传的压缩视频流可以得到实时的处理,进而实现多角度自由视角视频的播放。
在具体实施中,可以对各采集设备获得的压缩视频数据流进行存储。当接收到视频帧截取指令时,可以根据接收到的视频帧截取指令,截取各压缩视频数据流中帧级同步的视频帧,将截取得到的视频帧同步上传至所述指定目标端。
其中,所述指定目标端可以是预先设置的目标端,也可以是视频帧截取指令指定的目标端。并且,可以先将截取得到的视频帧进行封装,并通过网络传输协议上传至所述指定目标端,再进行解析,获得相应的压缩视频数据流中帧级同步的视频帧。
由此,将压缩视频数据流截取的视频帧的后续处理交由所述指定目标端进行,可以节约网络传输资源,降低现场部署大量服务器资源部署的压力和难度,也可以极大地降低数据处理负荷,缩短多角度自由视角视频帧的传输时延。
在具体实施中,为了确保截取各压缩视频数据流中帧级同步的视频帧,如图16所示,可以包括以下步骤:
S161,确定实时接收的所述采集阵列中各采集设备的压缩视频数据流中其中一路压缩视频数据流作为基准数据流;
S162,基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各压缩视频数据流中的视频帧,作为其余各压缩视频数据流的待截取的视频帧;
S163,截取各压缩视频数据流中待截取的视频帧。
为使本领域技术人员更好地理解和实现本发明实施例,以下通过一具体的应用场景详细说明如何确定各压缩视频数据流中待截取的视频帧。
在本发明一实施例中,采集阵列中可以包含40个采集设备,因此,可以实时接收40路压缩视频数据流,假设在实时接收的所述采集阵列中各采集设备的压缩视频数据流中,确定采集设备A1’对应的压缩视频数据流A1作为基准数据流,然后,基于接收到的视频帧截取指令中指示截取的视频帧中对象的特征信息X,确定所述基准数据流中与所述对象的特征信息X一致的视频帧a1作为待截取的视频帧,然后根据所述基准数据流中的待截取的视频帧a1中对象的特征信息x1,选取其余各压缩视频数据流A2-A40中与对象的特征信息x1一致的视频帧a2-a40,作为其余各压缩视频数据流的待截取的视频帧。
其中,对象的特征信息可以包括形状特征信息、颜色特征信息和位置特征信息等其中至少一种。所述视频帧截取指令中指示截取的视频帧中对象的特征信息X,与所述基准数据流中的待截取的视频帧a1中对象的特征信息x1可以是对相同的对象的特征信息的同一表示方式,例如,对象的特征信息X和x1均是二维特征信息;对象的特征信息X和对象的特征信息x1也可以是对相同的对象的特征信息的不同表示方式,例如,对象的特征信息X可以是二维特征信息,而对象的特征信息x1可以是三维特征信息。并且,可以预设一个相似阈值,当满足相似阈值时,可以认为对象的特征信息X与x1一致,或者对象的特征信息x1与其余各压缩视频数据流A2-A40中对象的特征信息x2-x40一致。
对象的特征信息的具体表示方式以及相似阈值可以根据预设的多角度自由视角范围 和现场的场景决定,本发明实施例不做任何限定。
在本发明另一实施例中,采集阵列中可以包含40个采集设备,因此,可以实时接收40路压缩视频数据流,假设在实时接收的所述采集阵列中各采集设备的压缩视频数据流中,确定采集设备B1’对应的压缩视频数据流B1作为基准数据流,然后,基于接收到的视频帧截取指令中指示截取的视频帧的时间戳信息Y,确定所述基准数据流中与所述时间戳信息Y对应的视频帧b1作为待截取的视频帧,然后根据所述基准数据流中的待截取的视频帧b1中的时间戳信息y1,选取其余各压缩视频数据流B2-B40中与时间戳信息y1一致的视频帧b2-b40,作为其余各压缩视频数据流的待截取的视频帧。
其中,所述视频帧截取指令中指示截取的视频帧的时间戳信息Y,与所述基准数据流中的待截取的视频帧b1中的时间戳信息y1可以有一定的误差,例如,所述基准数据流中视频帧对应的时间戳信息均与时间戳信息Y不一致,存在0.1ms的误差,则可以预设一个误差范围,例如,误差范围为±1ms,则0.1ms的误差在误差范围内,因此,可以选取与时间戳信息Y相差0.1ms的时间戳信息y1对应的视频帧b1作为基准数据流中的待截取的视频帧。具体的误差范围以及基准数据流中的时间戳信息y1的选取规则可以根据现场的采集设备和传输网络决定,本实施例不做限定。
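上述按时间戳信息及预设误差范围选取待截取视频帧的逻辑,可以用如下Python草图示意(帧的数据结构与函数名均为本示例的假设):

```python
def pick_frame_by_timestamp(frames, target_ts_ms, tolerance_ms=1.0):
    """在一路数据流的帧列表中,选取时间戳与目标时间戳的误差在容差范围内
    且最接近的帧。frames为[(时间戳ms, 帧标识), ...];找不到则返回None。"""
    best = None
    for ts, fid in frames:
        err = abs(ts - target_ts_ms)
        if err <= tolerance_ms and (best is None or err < best[0]):
            best = (err, fid)
    return None if best is None else best[1]

stream = [(0.0, "b0"), (40.1, "b1"), (80.2, "b2")]  # 时间戳存在0.1ms误差
assert pick_frame_by_timestamp(stream, 40.0) == "b1"   # 误差0.1ms在±1ms范围内
assert pick_frame_by_timestamp(stream, 200.0) is None  # 超出误差范围,无可选帧
```

在基准数据流中按指令给出的时间戳Y选出帧b1后,再以b1的时间戳y1为目标,对其余各路数据流重复同样的选取即可。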
可以理解的是,上述实施例中确定各压缩视频数据流中待截取的视频帧的方法可以单独使用,也可以同时使用,本发明实施例不做限定。
利用上述数据处理方法,使得数据处理设备能够顺利、流畅地拉取各采集设备采集和数据压缩到的数据。
下面将结合本说明书实施例中的附图,对本说明书实施例中采集阵列进行数据处理的技术方案进行清楚、完整地描述。
参照图17所示的数据处理方法的流程图,在本发明实施例中,具体可以包括如下步骤:
S171,采集阵列中根据预设的多角度自由视角范围置于现场采集区域不同位置的各采集设备分别从相应角度实时同步采集原始视频数据,并分别对采集到的原始视频数据进行实时数据压缩,获得相应的压缩视频数据流。
S172,与所述采集阵列链路连接的数据处理设备在确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值时,分别向所述采集阵列中各采集设备发送拉流指令。
在具体实施中,预设的带宽阈值可以根据采集阵列中各采集设备所在传输网络的传输能力决定,例如,传输网络的上行带宽为1000Mbps,则预设的带宽阈值可以为1000Mbps。
S173,所述采集阵列中各采集设备基于所述拉流指令,将获得的压缩视频数据流实时传输至所述数据处理设备。
在具体实施中,所述数据处理设备可以根据实际情景设置。例如,当现场有适合空间时,数据处理设备可以置于现场非采集区域,作为现场服务器;当现场无适合空间时,所述数据处理设备可以置于云端,作为云端服务器。
采用上述方案,与所述采集阵列链路连接的数据处理设备在确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值时,分别向所述采集阵列中各采集设备发送拉流指令,使得各采集设备采集和数据压缩得到的数据能够实时同步传输,从而可以通过所在传输网络进行实时拉流,并且可以避免拉流过程中数据传输拥堵;然后,采集阵列中各采集设备基于所述拉流指令,将获得的压缩视频数据流实时传输至所述数据处理设备,由于各采集设备传输的数据经过压缩,因而可以缓解实时同步传输的带宽压力,加快了多角度自由视角视频数据的处理速度。
由此,可以避免在现场布置大量服务器进行数据处理,也无需通过SDI采集卡汇总采集的原始数据,再通过现场机房中的计算服务器对原始数据进行处理,可以免于采用昂贵的SDI视频传输线缆和SDI接口,而是通过普通传输网络进行数据传输,在带宽资源及数据处理资源有限的情况下可以实现多角度自由视角视频的低时延播放,降低实施成本。
在具体实施中,为了简化设置流程,节约设置时间,在设置所述采集阵列中各采集设备的参数的数值之前,数据处理设备可以先确定所述采集阵列中各采集设备按照已设置的参数的数值进行采集和数据压缩获得的压缩视频数据流的码率之和是否大于预设的带宽阈值,当获得的压缩视频数据流的码率之和大于预设的带宽阈值时,数据处理设备可以设置所述采集阵列中各采集设备的参数的数值,再分别向所述采集阵列中各采集设备发送拉流指令。
在具体实施中,各采集设备获得的压缩视频数据流从传输到写入的过程是连续发生的,还需要确保数据处理设备写入各采集设备获得的压缩视频数据流时畅通,因此,在分别向所述采集阵列中各采集设备发送拉流指令之前,数据处理设备还可以确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和是否大于预设的写入速度阈值,并在所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和大于预设的写入速度阈值时,数据处理设备可以设置所述采集阵列中各采集设备的参数的数值,使得 所述采集阵列中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值。
在具体实施中,预设的写入速度阈值可以根据数据处理设备的数据存储写入速度决定。
在具体实施中,所述采集阵列中各采集设备和所述数据处理设备之间,可以通过以下至少一种方式进行数据传输:
1、通过交换机进行数据传输;
通过交换机将所述采集阵列中各采集设备与数据处理设备进行连接,所述交换机可以将更多的采集设备的压缩视频数据流进行汇总统一传输给数据处理设备,可以减少数据处理设备支持的端口数量。例如,交换机支持40个输入,因此数据处理设备通过所述交换机,则最多可以同时接收40台采集设备组成的采集阵列的视频流,进而可以减少数据处理设备的数量。
2、通过局域网进行数据传输。
通过局域网将所述采集阵列中各采集设备与数据处理设备进行连接,所述局域网可以实时将采集设备的压缩视频数据流传输给数据处理设备,减少数据处理设备支持的端口数量,进而可以减少数据处理设备的数量。
在具体实施中,所述数据处理设备可以对各采集设备获得的压缩视频数据流进行存储(可以是缓存),并在接收到的视频帧截取指令时,所述数据处理设备可以根据接收到的视频帧截取指令截取各压缩视频数据流中帧级同步的视频帧,将截取得到的视频帧同步上传至所述指定目标端。
其中,所述数据处理设备可以预先通过端口或IP地址与一目标端建立连接,也可以将截取得到的视频帧同步上传至所述视频帧截取指令指定的端口或IP地址。并且,所述数据处理设备可以先将截取得到的视频帧进行封装,并通过网络传输协议上传至所述指定目标端,再进行解析,获得相应的压缩视频数据流中帧级同步的视频帧。
采用上述方案,可以将采集阵列中各采集设备实时同步采集和数据压缩获得的压缩视频数据流统一传输至数据处理设备,所述数据处理设备在接收到的视频帧截取指令后,经过打点截帧的初步处理,可以将截取到的各压缩视频数据流中帧级同步的视频帧同步上传至所述指定目标端,将压缩视频数据流截取的视频帧的后续处理交由所述指定目标端,从而可以节约网络传输资源,降低现场部署的压力和难度,也可以极大地降低数据 处理负荷,缩短多角度自由视角视频帧的传输时延。
在具体实施中,为了截取各压缩视频数据流中帧级同步的视频帧,所述数据处理设备可以先确定实时接收的所述采集阵列中各采集设备的压缩视频数据流中其中一路压缩视频数据流作为基准数据流,然后,所述数据处理设备可以基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各压缩视频数据流中的视频帧,作为其余各压缩视频数据流的待截取的视频帧,最后,所述数据处理设备截取各压缩视频数据流中待截取的视频帧。具体截帧方法可以参见前述实施例的示例,此处不再赘述。
本发明实施例还提供了与上述实施例中数据处理方法相应的数据处理设备,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图18所示的数据处理设备的结构示意图,在本发明实施例中,如图18所示,数据处理设备180可以包括:
第一传输匹配单元181,适于确定采集阵列中各采集设备预传输的压缩视频数据流的码率之和是否不大于预设的带宽阈值,其中,所述采集阵列中各采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置。
指令发送单元182,适于在确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值时,分别向所述采集阵列中各采集设备发送拉流指令。
数据流接收单元183,适于接收所述采集阵列中各采集设备基于所述拉流指令实时传输的压缩视频数据流,所述压缩视频数据流为所述采集阵列中各采集设备分别从相应角度实时同步采集和数据压缩获得。
采用上述数据处理设备,向所述采集阵列中各采集设备发送拉流指令之前,确定了传输带宽是否匹配,可以避免拉流过程中数据传输拥堵,使得各采集设备采集和数据压缩得到的数据能够实时同步传输,加快多角度自由视角视频数据的处理速度,在带宽资源及数据处理资源有限的情况下实现多角度自由视角视频的低时延播放,降低实施成本。
在本发明一实施例中,如图18所示,所述数据处理设备180还可以包括:
第一参数设置单元184,适于在分别向所述采集阵列中各采集设备发送拉流指令之前,设置所述采集阵列中各采集设备的参数的数值;
其中,所述采集设备的参数可以包括:采集参数和压缩参数,且所述采集阵列中各 采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于预设的带宽阈值。
在本发明一实施例中,为了简化设置流程,节约设置时间,如图18所示,所述数据处理设备180还可以包括:
第二传输匹配单元185,适于在设置所述采集阵列中各采集设备的参数的数值之前,确定所述采集阵列中各采集设备按照已设置的参数的数值进行采集和数据压缩获得的压缩视频数据流的码率之和是否不大于预设的带宽阈值。
在本发明一实施例中,如图18所示,所述数据处理设备180还可以包括:
写入匹配单元186,适于确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和是否大于预设的写入速度阈值;
第二参数设置单元187,适于在所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和大于预设的写入速度阈值时,设置所述采集阵列中各采集设备的参数的数值,使得所述采集阵列中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值。
因此,在开始拉流之前,可以确保各采集设备从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值,从而可以避免数据写入拥塞,确保压缩视频数据流在采集、传输和写入的过程中链路畅通,使得各采集设备上传的压缩视频流可以得到实时的处理,进而实现多角度自由视角视频的播放。
在本发明一实施例中,如图18所示,所述数据处理设备180还可以包括:
截帧处理单元188,适于根据接收到的视频帧截取指令,截取各压缩视频数据流中帧级同步的视频帧;
上传单元189,适于将截取得到的视频帧同步上传至所述指定目标端。
其中,所述指定目标端可以是预先设置的目标端,也可以是视频帧截取指令指定的目标端。
由此,将压缩视频数据流截取的视频帧的后续处理交由所述指定目标端进行,从而可以节约网络传输资源,降低现场部署的压力和难度,也可以极大地降低数据处理负荷,缩短多角度自由视角视频帧的传输时延。
在本发明一实施例中,如图18所示,所述截帧处理单元188,可以包括:
基准数据流选取子单元1881,适于确定实时接收的所述采集阵列中各采集设备的压 缩视频数据流中其中一路压缩视频数据流作为基准数据流;
视频帧选取子单元1882,适于基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各压缩视频数据流中的视频帧,作为其余各压缩视频数据流的待截取的视频帧;
视频帧截取子单元1883,适于截取各压缩视频数据流中待截取的视频帧。
在本发明一实施例中,如图18所示,所述视频帧选取子单元1882,可以包括以下至少一种:
第一视频帧选取模块18821,适于根据所述基准数据流中的待截取的视频帧中对象的特征信息,选取其余各压缩视频数据流中与所述对象的特征信息一致的视频帧,作为其余各压缩视频数据流的待截取的视频帧;
第二视频帧选取模块18822,适于根据所述基准数据流中的待截取的视频帧的时间戳信息,选取其余各压缩视频数据流中与所述时间戳信息一致的视频帧,作为其余各压缩视频数据流的待截取的视频帧。
本发明实施例还提供了与上述数据处理方法相应的数据处理系统,采用上述数据处理设备实现实时接收多路压缩视频数据流,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图19所示的数据处理系统的结构示意图,在本发明实施例中,数据处理系统190可以包括:采集阵列191和数据处理设备192,所述采集阵列191包括根据预设的多角度自由视角范围置于现场采集区域不同位置的多个采集设备,其中:
所述采集阵列191中各采集设备,适于分别从相应角度实时同步采集原始视频数据,并分别对采集到的原始图像数据进行实时数据压缩,获得从相应角度实时同步采集的压缩视频数据流,且基于所述数据处理设备192发送的拉流指令,将获得的压缩视频数据流实时传输至所述数据处理设备192;
所述数据处理设备192,适于在确定所述采集阵列中各采集设备预传输的压缩视频数据流的码率之和不大于预设的带宽阈值时,分别向所述采集阵列191中各采集设备发送拉流指令,并接收所述采集阵列191中各采集设备实时传输的压缩视频数据流。
采用上述方案,可以避免在现场布置大量服务器进行数据处理,也无需通过SDI采集卡汇总采集的原始数据,再通过现场机房中的计算服务器对原始数据进行处理,可以免于采用昂贵的SDI视频传输线缆和SDI接口,而是通过普通传输网络进行数据传输和拉流,在带宽资源及数据处理资源有限的情况下实现多角度自由视角视频的低时延播放, 降低实施成本。
在本发明一实施例中,所述数据处理设备192还适于在分别向所述采集阵列191中各采集设备发送拉流指令之前,设置所述采集阵列中各采集设备的参数的数值;
其中,所述采集设备的参数包括:采集参数和压缩参数,且所述采集阵列中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于预设的带宽阈值。
由此,在开始拉流之前,数据处理设备可以对采集阵列中各采集设备的参数的数值进行设置,确保采集阵列中各采集设备的参数的数值统一,各采集设备可以从相应角度实时同步采集和数据压缩,并且获得的压缩视频数据流的码率之和不大于预设的带宽阈值,从而可以避免网络拥塞,在带宽资源有限的情况下也可以实现多角度自由视角视频的低时延播放。
在本发明一实施例中,所述数据处理设备192在分别向所述采集阵列191中各采集设备发送拉流指令之前,确定所述采集阵列191中各采集设备预传输的压缩视频数据流的码率之和是否大于预设的写入速度阈值,并在所述采集阵列191中各采集设备预传输的压缩视频数据流的码率之和大于预设的写入速度阈值时,设置所述采集阵列191中各采集设备的参数的数值,使得所述采集阵列191中各采集设备按照设置的所述各采集设备的参数的数值,从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值。
因此,在开始拉流之前,可以确保各采集设备从相应角度实时同步采集和数据压缩获得的压缩视频数据流的码率之和不大于所述预设的写入速度阈值,从而可以避免数据处理设备的数据写入拥塞,确保压缩视频数据流在采集、传输和写入的过程中链路畅通,使得各采集设备上传的压缩视频流可以得到实时的处理,进而实现多角度自由视角视频的播放。
在具体实施中,所述采集阵列中各采集设备与所述数据处理设备适于通过交换机和/或局域网进行连接。
在本发明一实施例中,所述数据处理系统190还可以包括指定目标端193。
所述数据处理设备192,适于根据接收到的视频帧截取指令,截取各压缩视频流中帧级同步的视频帧,将截取得到的视频帧同步上传至所述指定目标端193;
所述指定目标端193,适于接收所述数据处理设备192基于视频帧截取指令截取得到的视频帧。
其中,所述数据处理设备可以预先通过端口或IP地址与一目标端建立连接,也可以将截取得到的视频帧同步上传至所述视频帧截取指令指定的端口或IP地址。
采用上述方案,可以将采集阵列中各采集设备实时同步采集和数据压缩获得的压缩视频数据流统一传输至数据处理设备,所述数据处理设备在接收到的视频帧截取指令后,经过打点截帧的初步处理,可以将截取到的各压缩视频数据流中帧级同步的视频帧同步上传至所述指定目标端,将压缩视频数据流截取的视频帧的后续处理交由指定目标端进行,从而可以节约网络传输资源,降低现场部署的压力和难度,也可以极大地降低数据处理负荷,缩短多角度自由视角视频帧的传输时延。
在本发明一实施例中,所述数据处理设备192适于确定实时接收的所述采集阵列191中各采集设备的压缩视频数据流中其中一路压缩视频数据流作为基准数据流;并基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各压缩视频数据流中的视频帧,作为其余各压缩视频数据流的待截取的视频帧;最后,截取各压缩视频数据流中待截取的视频帧。
为使本领域技术人员更好地理解和实现本发明实施例,以下对数据处理设备与采集设备之间的帧同步方案通过具体实施例进行详细的描述。
参照图20所示的数据同步方法的流程图,在本发明实施例中,具体可以包括如下步骤:
S201,向采集阵列中各采集设备分别发送拉流指令,其中,所述采集阵列中各采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,并且所述采集阵列中各采集设备分别从相应角度实时同步采集视频数据流。
在具体实施中,为实现拉流同步,可以有多种实现方式。例如可以同时向采集阵列中各采集设备同时发出拉流指令;或者,也可以仅向采集阵列中的主采集设备发送拉流指令,触发主采集设备的拉流,之后,由主采集设备将所述拉流指令同步至所有从采集设备,触发所有从采集设备拉流。
S202,实时接收所述采集阵列中各采集设备基于所述拉流指令分别传输的视频数据流,并确定所述采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步。
在具体实施中,采集设备本身可以具备编码和封装的功能,从而可以将从相应角度实时同步采集到的原始视频数据进行编码和封装。并且,各采集设备还可以具备压缩功能,压缩率越高,在压缩前数据量相同的情况下可以使得压缩后的数据量更小,可以缓解实时同步传输的带宽压力,因此,采集设备可以采用预测编码、变换编码和熵编码等 技术提高视频的压缩率。
S203,在所述采集阵列中各采集设备分别传输的视频数据流之间未帧级同步时,重新向所述采集阵列中各采集设备分别发送拉流指令,直至所述采集阵列中各采集设备分别传输的视频数据流之间帧级同步。
采用上述数据同步方法,通过确定采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步,可以确保多路数据同步传输,从而可以避免漏帧、多帧的传输问题,提升数据处理速度,进而满足多角度自由视角视频低时延播放的需求。
在具体实施中,所述采集阵列中各采集设备通过人工启动时,存在启动时间误差,有可能不在同一时刻开始采集视频数据流。因此,可以采用以下至少一种方式,确保所述采集阵列中各采集设备分别从相应角度实时同步采集视频数据流:
1、在至少一个采集设备获取到采集起始指令时,获取到所述采集起始指令的采集设备将所述采集起始指令同步到其他采集设备,使得所述采集阵列中各采集设备分别基于所述采集起始指令开始从相应角度实时同步采集视频数据流。
例如,采集阵列中可以包含40个采集设备,当其中采集设备A1获取到所述采集起始指令时,采集设备A1同步向其他采集设备A2-A40发送获取到的所述采集起始指令,在所有采集设备均收到所述采集起始指令后,各采集设备分别基于所述采集起始指令开始从相应角度实时同步采集视频数据流。由于各采集设备之间的数据传输速度远远快于人工启动的速度,因此,可以减少人工启动产生的启动时间误差。
2、所述采集阵列中各采集设备分别基于预设的时钟同步信号从相应角度实时同步采集视频数据流。
例如,可以设置一时钟信号同步装置,各采集设备可以分别连接所述时钟信号同步装置,所述时钟信号同步装置在接收到触发信号(如同步采集起始指令),所述时钟信号同步装置可以向各采集设备发射时钟同步信号,各采集设备分别基于所述时钟同步信号开始从相应角度实时同步采集视频数据流。由于时钟信号发射装置可以基于预设的触发信号,向各采集设备发射时钟同步信号,使得各采集设备可以同步采集,不易受到外部条件和人工操作的干扰,因此,可以提高各采集设备的同步精度及同步效率。
在具体实施中,由于网络传输环境的影响,所述采集阵列中各采集设备有可能无法在同一时刻收到拉流指令,各采集设备之间可能有几毫秒或者更少的时间差,导致各采集设备实时传输的视频数据流不同步。如图21所示,采集阵列中包含采集设备1和2,采集设备1和2的采集参数设置相同,其中采集帧率均为X fps,并且采集设备1和2采集的视频帧帧级同步。
采集设备1和2中每一帧的采集间隔T均为T=1/X秒。假设在t0时刻数据处理设备发送拉流指令r,采集设备1在t1时刻接收到拉流指令r,采集设备2在t2时刻接收到拉流指令r,若采集设备1和2均在同一采集间隔T内收到,则可以认为采集设备1和2在同一时刻收到拉流指令,采集设备1和2可以分别传输帧级同步的视频数据流;若采集设备1和2不在同一采集间隔内收到,则可以认为采集设备1和2未在同一时刻收到拉流指令,采集设备1和2无法实现帧级视频数据流的同步传输。视频数据流传输的帧级同步也可以称为拉流同步。拉流同步一旦实现,会自动持续到停止拉流。
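上述"两台采集设备是否在同一采集间隔内收到拉流指令"的判断,可以用如下Python草图示意(函数名与以秒为单位的时间表示均为本示例的假设):

```python
def same_capture_interval(t1, t2, fps):
    """若两台采集设备收到拉流指令的时刻落在同一采集间隔T=1/fps内,
    则认为二者可以传输帧级同步的视频数据流(即拉流同步)。"""
    interval = 1.0 / fps
    return int(t1 // interval) == int(t2 // interval)

# 假设采集帧率为25fps,则采集间隔T=0.04s
assert same_capture_interval(1.001, 1.003, 25)      # 同一间隔内,可拉流同步
assert not same_capture_interval(1.039, 1.041, 25)  # 跨越间隔边界,需重新发送拉流指令
```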
无法传输帧级同步的视频数据流的原因可以是:
1)需要分别向各采集设备发送拉流指令;
2)局域网在传输拉流指令时存在延时。
因此,可以采用以下至少一种方式确定所述采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步:
1、可以在获取所述采集阵列中各采集设备分别传输的视频数据流的第N帧时,匹配各视频数据流的第N帧的对象的特征信息,当各视频数据流的第N帧的对象的特征信息满足预设的相似阈值时,确定所述采集阵列中各采集设备分别传输的视频数据流的第N帧的对象的特征信息一致,进而各采集设备分别传输的视频数据流之间帧级同步。
其中,N为不小于1的整数,各视频数据流的第N帧的对象的特征信息可以包括形状特征信息、颜色特征信息和位置特征信息等其中至少一种。
2、可以在获取所述采集阵列中各采集设备分别传输的视频数据流的第N帧时,匹配各视频数据流的第N帧的时间戳信息,其中,N为不小于1的整数。当各视频数据流的第N帧的时间戳信息一致时,确定各采集设备分别传输的视频数据流之间帧级同步。
在所述采集阵列中各采集设备分别传输的视频数据流之间未帧级同步时,重新向所述采集阵列中各采集设备分别发送拉流指令,可以采用以上至少一种方式确定是否帧级同步,直至所述采集阵列中各采集设备分别传输的视频数据流之间帧级同步。
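上述帧级同步的判断方式(以匹配第N帧的时间戳信息为例),可以用如下Python草图示意(数据结构与容差取值均为本示例的假设):

```python
def streams_frame_synced(streams, n, tolerance_ms=1.0):
    """取各路视频数据流的第N帧(N从1计),比较其时间戳是否在容差范围内一致,
    以此判断各采集设备分别传输的视频数据流之间是否帧级同步。
    streams为若干路数据流,每路为时间戳(ms)列表。"""
    stamps = [s[n - 1] for s in streams]
    return max(stamps) - min(stamps) <= tolerance_ms

synced = [[0.0, 40.0], [0.2, 40.2], [0.1, 40.1]]
assert streams_frame_synced(synced, 1)        # 第1帧时间戳一致,帧级同步
drifted = [[0.0, 40.0], [5.0, 45.0]]
assert not streams_frame_synced(drifted, 1)   # 相差5ms,需重新发送拉流指令
```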
在具体实施中,还可以截取各采集设备的视频数据流中的视频帧并传输至指定目标端,为了确保截取的视频帧的帧级同步,如图22所示,可以包括以下步骤:
S221,确定实时接收的所述采集阵列中各采集设备的视频数据流中其中一路视频数据流作为基准数据流。
S222,基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各视频数据流中的视频帧,作为其余各视频数据流的待截取的视频帧。
S223,截取各视频数据流中待截取的视频帧。
S224,将截取得到的视频帧同步上传至所述指定目标端。
其中,所述指定目标端可以是预先设置的目标端,也可以是视频帧截取指令指定的目标端。
采用上述方案,可以实现截帧同步,提高截帧效率,进一步提高所生成的多角度自由视角视频的显示效果,增强用户体验。并且,可以降低视频帧的选取和截取的过程与生成多角度自由视角视频的过程的耦合性,增强各流程之间的独立性,便于后期维护,将截取得到的视频帧同步上传至所述指定目标端,可以节约网络传输资源和降低数据处理负载,提升数据处理生成多角度自由视角视频的速度。
为使本领域技术人员更好地理解和实现本发明实施例,以下通过具体的应用示例详细说明如何确定待各视频数据流中待截取的视频帧。
一种方式是根据所述基准数据流中的待截取的视频帧中对象的特征信息,选取其余各视频数据流中与所述对象的特征信息一致的视频帧,作为其余各视频数据流的待截取的视频帧。
例如,采集阵列中包含40个采集设备,因此,可以实时接收40路视频数据流,假设在实时接收的所述采集阵列中各采集设备的视频数据流中,确定采集设备A1’对应的视频数据流A1作为基准数据流,然后,基于接收到的视频帧截取指令中指示截取的视频帧中对象的特征信息X,确定所述基准数据流中与所述对象的特征信息X一致的视频帧a1作为待截取的视频帧,然后根据所述基准数据流中的待截取的视频帧a1中对象的特征信息x1,选取其余各视频数据流A2-A40中与对象的特征信息x1一致的视频帧a2-a40,作为其余各视频数据流的待截取的视频帧。
其中,对象的特征信息可以包括形状特征信息、颜色特征信息和位置特征信息等;所述视频帧截取指令中指示截取的视频帧中对象的特征信息X,与所述基准数据流中的待截取的视频帧a1中对象的特征信息x1可以是对相同的对象的特征信息的同一表示方式,例如,对象的特征信息X和x1均是二维特征信息;对象的特征信息X和对象的特征信息x1也可以是对相同的对象的特征信息的不同表示方式,例如,对象的特征信息X可以是二维特征信息,而对象的特征信息x1可以是三维特征信息。并且,可以预设一个 相似阈值,当满足相似阈值时,可以认为对象的特征信息X与x1一致,或者对象的特征信息x1与其余各视频数据流A2-A40中对象的特征信息x2-x40一致。
对象的特征信息的具体表示方式以及相似阈值可以根据预设的多角度自由视角范围和现场的场景决定,本实施例不做限定。
另一种方式是,根据所述基准数据流中的视频帧的时间戳信息,选取其余各视频数据流中与所述时间戳信息一致的视频帧,作为其余各视频数据流的待截取的视频帧。
例如,采集阵列中可以包含40个采集设备,因此,可以实时接收40路视频数据流,假设在实时接收的所述采集阵列中各采集设备的视频数据流中,确定采集设备B1对应的视频数据流B1作为基准数据流,然后,基于接收到的视频帧截取指令中指示截取的视频帧的时间戳信息Y,确定所述基准数据流中与所述时间戳信息Y对应的视频帧b1作为待截取的视频帧,然后根据所述基准数据流中的待截取的视频帧b1中的时间戳信息y1,选取其余各视频数据流B2-B40中与时间戳信息y1一致的视频帧b2-b40,作为其余各视频数据流的待截取的视频帧。
其中,所述视频帧截取指令中指示截取的视频帧的时间戳信息Y,与所述基准数据流中的待截取的视频帧b1中的时间戳信息y1可以有一定的误差,例如,所述基准数据流中视频帧对应的时间戳信息均与时间戳信息Y不一致,存在0.1ms的误差,则可以预设一个误差范围,例如,误差范围为±1ms,则0.1ms的误差在误差范围内,因此,可以选取与时间戳信息Y相差0.1ms的时间戳信息y1对应的视频帧b1作为基准数据流中的待截取的视频帧。具体的误差范围以及基准数据流中的时间戳信息y1的选取规则可以根据现场的采集设备和传输网络决定,本实施例不做限定。
可以理解的是,上述实施例中确定待各视频数据流中待截取的视频帧的方法可以单独使用,也可以同时使用,本发明实施例不做限定。
采用上述方案,可以提高视频帧的同步选取和同步截取的效率和结果准确率,从而可以提升传输数据的完整性和同步性。
本发明实施例还提供了与上述数据处理方法相应的数据处理设备,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图23所示的数据处理设备的结构示意图,在本发明实施例中,如图23所示,数据处理设备230可以包括:
指令发送单元231,适于向采集阵列中各采集设备分别发送拉流指令,其中,所述采集阵列中各采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,并且所述采集阵列中各采集设备分别从相应角度实时同步采集视频数据流;
数据流接收单元232,适于实时接收所述采集阵列中各采集设备基于所述拉流指令分别传输的视频数据流;
第一同步判断单元233,适于确定所述采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步,并在所述采集阵列中各采集设备分别传输的视频数据流之间未帧级同步时,重新触发所述指令发送单元231,直至所述采集阵列中各采集设备分别传输的视频数据流之间帧级同步。
其中,所述数据处理设备可以根据实际情景设置。例如,当现场有空余空间时,所述数据处理设备可以置于现场非采集区域,作为现场服务器;当现场没有空余空间时,所述数据处理设备可以置于云端,作为云端服务器。
采用上述数据处理设备,通过确定采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步,可以确保多路数据同步传输,从而可以避免漏帧、多帧的传输问题,提升数据处理速度,进而满足多角度自由视角视频低时延播放的需求。
在本发明一实施例中,如图23所示,数据处理设备230还可以包括:
基准视频流确定单元234,适于确定实时接收到的所述采集阵列中各采集设备的视频数据流中其中一路视频数据流作为基准数据流;
视频帧选取单元235,适于基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各视频数据流中的视频帧,作为其余各视频数据流的待截取的视频帧;
视频帧截取单元236,适于截取各视频数据流中待截取的视频帧;
上传单元237,适于将截取得到的视频帧同步上传至所述指定目标端。
其中,所述数据处理设备230可以预先通过端口或IP地址与一目标端建立连接,也可以将截取得到的视频帧同步上传至所述视频帧截取指令指定的端口或IP地址。
采用上述方案,可以实现截帧同步,提高截帧效率,进一步提高所生成的多角度自由视角视频的显示效果,增强用户体验。并且,降低视频帧的选取和截取的过程与生成多角度自由视角视频的过程的耦合性,增强各流程之间的独立性,便于后期维护,将截取得到的视频帧同步上传至所述指定目标端,可以节约网络传输资源和降低数据处理负载,提升数据处理生成多角度自由视角视频的速度。
在本发明一实施例中,如图23所示,所述视频帧选取单元235包括以下至少一种:
第一视频帧选取模块2351,适于根据所述基准数据流中的待截取的视频帧中对象的 特征信息,选取其余各视频数据流中与所述对象的特征信息一致的视频帧,作为其余各视频数据流的待截取的视频帧;
第二视频帧选取模块2352,适于根据所述基准数据流中的视频帧的时间戳信息,选取其余各视频数据流中与所述时间戳信息一致的视频帧,作为其余各视频数据流的待截取的视频帧。
采用上述方案,可以提高视频帧的同步选取和同步截取的效率和结果准确率,从而可以提升传输数据的完整性和同步性。
本发明实施例还提供了与上述数据处理方法相应的数据同步系统,采用上述数据处理设备实现实时接收多路视频数据流,为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图24所示的数据同步系统的结构示意图,在本发明实施例中,所述数据同步系统240可以包括:置于现场采集区域的采集阵列241和置于与所述采集阵列链路连接的数据处理设备242,所述采集阵列241包括多个采集设备,所述采集阵列241中各采集设备根据预设的多角度自由视角范围置于现场采集区域不同位置,其中:
所述采集阵列241中各采集设备,适于分别从相应角度实时同步采集视频数据流,并基于所述数据处理设备242发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备242;
所述数据处理设备242,适于向所述采集阵列241中各采集设备分别发送拉流指令,且实时接收所述采集阵列241中各采集设备基于所述拉流指令分别传输的视频数据流,并在所述采集阵列241中各采集设备分别传输的视频数据流之间未帧级同步时,重新向所述采集阵列241中各采集设备分别发送拉流指令,直至所述采集阵列241中各采集设备传输的视频数据流之间帧级同步。
采用本发明实施例中的数据同步系统,通过确定采集阵列中各采集设备分别传输的视频数据流之间是否帧级同步,可以确保多路数据同步传输,从而可以避免漏帧、多帧的传输问题,提升数据处理速度,进而满足多角度自由视角视频低时延播放的需求。
在具体实施中,所述数据处理设备242,还适于确定实时接收的所述采集阵列241中各采集设备的视频数据流中其中一路视频数据流作为基准数据流;基于接收到的视频帧截取指令,确定所述基准数据流中的待截取的视频帧,并选取与所述基准数据流中的待截取的视频帧同步的其余各视频数据流中的视频帧,作为其余各视频数据流的待截取的视频帧;截取各视频数据流中待截取的视频帧并将截取得到的视频帧同步上传至指定 目标端243。
其中,所述数据处理设备242可以预先通过端口或IP地址与一目标端建立连接,也可以将截取得到的视频帧同步上传至所述视频帧截取指令指定的端口或IP地址。
在本发明一实施例中,所述数据同步系统240还可以包括云端服务器,适于作为指定目标端243。
在本发明另一实施例中,如图34所示,所述数据同步系统240还可以包括播放控制设备341,适于作为指定目标端243。
在本发明又一实施例中,如图35所示,所述数据同步系统240还可以包括交互终端351,适于作为指定目标端243。
在本发明一实施例中,可以采用以下至少一种方式,确保所述采集阵列241中各采集设备分别从相应角度实时同步采集视频数据流:
1、所述采集阵列中各采集设备之间通过同步线进行连接,其中,在至少一个采集设备获取到采集起始指令时,获取到所述采集起始指令的采集设备通过同步线将所述采集起始指令同步到其他采集设备,使得所述采集阵列中各采集设备分别基于所述采集起始指令开始从相应角度实时同步采集视频数据流;
2、所述采集阵列中各采集设备分别基于预设的时钟同步信号从相应角度实时同步采集视频数据流。
为使本领域技术人员更好地理解和实现本发明实施例,以下通过具体的应用场景详细说明数据同步系统,如图25所示应用场景中的数据同步系统的结构示意图,其中,所述数据同步系统包括由各采集设备组成的采集阵列251、数据处理设备252、云端的服务器集群253。
所述采集阵列251中各采集设备中的至少一个采集设备获取到采集起始指令,并通过同步线254将获取到的所述采集起始指令同步到其他采集设备,使得所述采集阵列中各采集设备分别基于所述采集起始指令开始从相应角度实时同步采集视频数据流。
所述数据处理设备252可以通过无线局域网向所述采集阵列251中各采集设备分别发送拉流指令。所述采集阵列251中各采集设备基于所述数据处理设备252发送的拉流指令,通过交换机255将获得的视频数据流实时传输至所述数据处理设备252。
所述数据处理设备252确定在所述采集阵列251中各采集设备分别传输的视频数据流之间是否帧级同步,并在所述采集阵列251中各采集设备分别传输的视频数据流之间未帧级同步时,重新向所述采集阵列251中各采集设备分别发送拉流指令,直至所述采 集阵列251中各采集设备传输的视频数据流之间帧级同步。
所述数据处理设备252确定所述采集阵列251中各采集设备传输的视频数据流之间帧级同步后,确定实时接收的所述采集阵列251中各采集设备的视频数据流中其中一路视频数据流作为基准数据流,并且,在接收到的视频帧截取指令之后,根据所述视频帧截取指令确定所述基准数据流中的待截取的视频帧,然后,所述数据处理设备252选取与所述基准数据流中的待截取的视频帧同步的其余各视频数据流中的视频帧,作为其余各视频数据流的待截取的视频帧,再截取各视频数据流中待截取的视频帧并将截取得到的视频帧同步上传至云端。
云端的服务器集群253会对截取得到的视频帧做后续处理,获得用于播放的多角度自由视角视频。
在具体实施中,所述云端的服务器集群253可以包括:第一云端服务器2531、第二云端服务器2532、第三云端服务器2533、第四云端服务器2534。其中,所述第一云端服务器2531可以用于参数计算;所述第二云端服务器2532可以用于深度计算,生成深度图;所述第三云端服务器2533可以用于基于DIBR对预设的虚拟视点路径进行帧图像重建;所述第四云端服务器2534可以用于生成多角度自由视角视频。
可以理解的是,数据处理设备可以根据实际情景置于现场非采集区域,或者置于云端,所述数据同步系统在实际应用中可以采用云端服务器、播放控制设备或者交互终端中的至少一种作为视频帧截取指令的发射端,也可以采用其他能够发射视频帧截取指令的设备,本发明实施例不做限制。
需要说明的是,前述实施例中的数据处理系统等均可以应用本发明实施例中的数据同步系统。
本发明实施例还提供了与上述数据处理方法相应的采集设备,所述采集设备适于在获取到采集起始指令时,将所述采集起始指令同步到其他采集设备,并开始从相应角度实时同步采集视频数据流,以及在接收到数据处理设备发送的拉流指令时,将获得的视频数据流实时传输至所述数据处理设备。为使本领域技术人员更好地理解和实现本发明实施例,以下参照附图,通过具体实施例进行详细介绍。
参照图36所示的采集设备的结构示意图,在本发明实施例中,所述采集设备360包括:光电转换摄像组件361、处理器362、编码器363、传输部件365,其中:
光电转换摄像组件361,适于采集图像;
所述处理器362,适于在获取到采集起始指令时,将所述采集起始指令通过传输部件365同步到其他采集设备,并开始将所述光电转换摄像组件361采集到的图像进行实时处理,得到图像数据序列,以及在获取到拉流指令时,通过传输部件365将获得的视频数据流实时传输至数据处理设备;
所述编码器363,适于将所述图像数据序列进行编码,获得相应的视频数据流。
作为一种可选方案,如图36所示,所述采集设备360还可以包括录音组件364,适于采集声音信号,获得音频数据。
通过处理器362可以将采集到的图像数据序列和音频数据进行处理,然后可以通过编码器363将所述采集到的图像数据序列和音频数据进行编码,获得相应的视频数据流。且所述处理器362在获取到采集起始指令时,可以通过传输部件365将所述采集起始指令同步到其他采集设备;在接收到拉流指令时,通过传输部件365将获得的视频数据流实时传输至所述数据处理设备。
在具体实施中,所述采集设备可以根据预设的多角度自由视角范围置于现场采集区域不同位置,所述采集设备可以固定设置于现场采集区域的某一点,也可以在现场采集区域内移动从而组成采集阵列。因此,所述采集设备可以是固定的设备,也可以是移动的设备,由此可以多角度灵活采集视频数据流。
如图37所示,为本发明实施例中一种应用场景中采集阵列的示意图,以舞台中心作为核心看点,以核心看点为圆心,核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列中采集设备371-375根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置。采集设备376为可移动设备,可以根据指令移动到指定位置,进行灵活采集。并且,采集设备可以是手持设备,用以在采集设备发生故障时或者在空间狭小区域增补采集数据,例如,图37中位于舞台观众区域的手持设备377可以加入采集阵列中,用以提供舞台观众区域的视频数据流。
如前所述,为生成多角度自由视角数据,需要进行深度图计算,但目前深度图计算的时间较长,如何减少深度图生成的时间,提升深度图生成速率成为亟待解决的问题。
针对上述问题,本发明实施例提供计算节点集群,多个计算节点可以同时对同一采集阵列同步采集到的纹理数据并行地、批处理式地生成深度图。具体而言,深度图计算过程可以分为通过第一深度计算得到粗略深度图,以及确定粗略深度图中的不稳定区域及之后的第二深度计算等多个步骤。在各步骤中,计算节点集群中的多个计算节点可以并行地对多个采集设备采集到的纹理数据进行第一深度计算,得到粗略深度图,以及并行地对得到的粗略深度图进行验证及进行第二深度计算,从而可以节约深度图计算的时间,提升深度图生成速率。以下参照附图,通过具体实施例进一步详细说明。
参照图26所示的一种深度图生成方法的流程图,在本发明实施例中,采用计算节点集群中多个计算节点分别进行深度图生成,为描述方便,将所述计算节点集群中任一计算节点称为第一计算节点。以下通过具体步骤对所述计算节点集群的深度图生成方法进行详细说明:
S261,接收纹理数据,所述纹理数据为同一采集阵列中的多个采集设备同步采集。
在具体实施中,所述多个采集设备可以根据预设的多角度自由视角范围置于现场采集区域不同位置,所述采集设备可以固定设置于现场采集区域的某一点,也可以在现场采集区域内移动从而组成采集阵列。其中,所述多角度自由视角可以是指使得场景能够自由切换的虚拟视点的空间位置以及视角。例如,多角度自由视角可以是6自由度(6DoF)的视角,采集阵列中所采用的采集设备可以为通用的相机、摄像头、录像机、手持设备如手机等,具体实现可以参见本发明其他实施例,此处不再赘述。
所述纹理数据即前述采集设备采集到的二维图像帧的像素数据,可以为一个帧时刻的图像,也可以为连续或非连续的帧图像形成的视频流对应的帧图像的像素数据。
S262,所述第一计算节点根据第一纹理数据和第二纹理数据,进行第一深度计算,得到第一粗略深度图。
这里,为描述更加清楚简洁,将纹理数据中与所述第一计算节点满足预设的第一映射关系的纹理数据称为第一纹理数据;将与所述第一纹理数据的采集设备满足预设的第一空间位置关系的采集设备采集的纹理数据称为第二纹理数据。
在具体实施中,可以基于预先设置的第一映射关系表或通过随机映射,得到所述第一映射关系。例如,可以根据计算节点集群中计算节点的数量以及纹理数据对应的采集阵列中采集设备的数量预先分配各计算节点所处理的纹理数据。可以设置专门的分配节点对计算节点集群中各计算节点的计算任务进行分配,分配节点可以基于预先设置的第一映射关系表或通过随机映射,得到所述第一映射关系。例如,若采集阵列中共有40台采集设备,为达到最高的并发处理效率,可以配置40个计算节点,每台采集设备对应一个计算节点。而若仅有20个计算节点,在各计算节点处理能力相同或大致相当的情况下,则为达到最高的并发处理效率及均衡负载的需求,可以设置每个计算节点对应两台采集设备采集到的纹理数据。具体可以设置纹理数据对应的采集设备标识与每个计算节点的标识之间的映射关系,作为所述第一映射关系,并基于所述第一映射关系直接将采集阵列中相应的采集设备采集到的纹理数据分发至对应的计算节点。在具体实施中,也可以 随机分配计算任务,将采集阵列中各采集设备采集到的纹理数据随机分配到计算节点集群中的各计算节点上,为此,为提高处理效率,可以提前将采集阵列采集到的所有纹理数据在计算节点集群中的每一个计算节点上均复制一份。
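作为第一映射关系的一种可能实现方式,以下Python草图按轮询方式将采集设备标识分配至计算节点(标识命名与分配策略均为本示例的假设,并非对第一映射关系的限定):

```python
def build_first_mapping(camera_ids, node_ids):
    """按轮询方式为各采集设备分配计算节点,得到一种可能的第一映射关系。
    例如40台采集设备配20个计算节点时,每个节点对应2台采集设备,
    以达到较高的并发处理效率并均衡负载。"""
    mapping = {}
    for i, cam in enumerate(camera_ids):
        mapping.setdefault(node_ids[i % len(node_ids)], []).append(cam)
    return mapping

m = build_first_mapping([f"cam{i}" for i in range(40)],
                        [f"node{i}" for i in range(20)])
assert len(m) == 20
assert all(len(cams) == 2 for cams in m.values())
assert m["node0"] == ["cam0", "cam20"]
```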
作为一个示例,服务器集群中任一服务器均可以根据所述第一纹理数据以及所述第二纹理数据,进行第一深度计算。
其中,对于所述第一纹理数据和所述第二纹理数据的预设的第一空间位置关系,例如所述第二纹理数据可以为与所述第一纹理数据的采集设备满足预设的第一距离关系的采集设备采集到的纹理数据,或者为与所述第一纹理数据的采集设备满足预设的第一数量关系的采集设备采集到的纹理数据,也可以为与所述第一纹理数据的采集设备满足预设的第一距离关系且满足预设的第一数量关系的采集设备采集到的纹理数据。
其中,所述第一预设数量可以取1至N-1的任意整数值,N为所述采集阵列中采集设备的总量。在本发明一实施例中,所述第一预设数量取2,从而可以以尽量少的运算量得到尽可能高的图像质量。例如,假设预设的第一映射关系中计算节点9与相机9对应,则可以利用相机9的纹理数据,以及与相机9相邻的相机5、6、7、10、11、12的纹理数据,计算得到相机9的粗略深度图。
可以理解的是,在具体实施中,所述第二纹理数据也可以是与所述第一纹理数据的采集设备满足其他类型的第一空间位置关系的采集设备采集的数据,例如所述第一空间位置关系还可以是满足预设角度、满足预设相对位置等。
S263,所述第一计算节点将所述第一粗略深度图同步至所述计算节点集群中的其余计算节点,得到粗略深度图集。
在经过了深度图粗略计算后所得到的粗略深度图,需要进行交叉验证,来确定每个粗略深度图中的不稳定区域,以在下一步骤中进行精细化的求解。其中,对于粗略深度图集中的任一粗略深度图,需要通过该粗略深度图对应的采集设备周围多个采集设备对应的粗略深度图来进行交叉验证(典型情况下,由待验证的粗略深度图和所有其他采集设备对应的粗略深度图一起交叉验证),因此需要将各个计算节点计算得到的粗略深度图分别同步至所述计算节点集群中的其余计算节点。经步骤S263同步后,计算节点集群中各个计算节点均得到其余计算节点计算得到的粗略深度图,每个计算节点得到完全相同的粗略深度图集。
S264,所述第一计算节点对于所述粗略深度图集中的第二粗略深度图,采用第三粗略深度图进行验证,得到所述第二粗略深度图中的不稳定区域。
其中,所述第二粗略深度图与所述第一计算节点可以满足预设的第二映射关系;所述第三粗略深度图可以为与所述第二粗略深度图对应的采集设备满足预设的第二空间位置关系的采集设备对应的粗略深度图。
可以基于预先设置的第二映射关系表或通过随机映射,得到所述第二映射关系。例如,可以根据计算节点集群中计算节点的数量以及纹理数据对应的采集阵列中采集设备的数量预先分配各计算节点所处理的纹理数据。在具体实施中,可以设置专门的分配节点对计算节点集群中各计算节点的计算任务进行分配,分配节点可以基于预先设置的第二映射关系表或通过随机映射,得到所述第二映射关系。设置第二映射关系的具体示例可以参见前述第一映射关系的实现示例。
可以理解的是,在具体实施中,所述第二映射关系可以与所述第一映射关系完全对应,也可以与所述第一映射关系不对应。例如在相机数量与计算节点数量相等的情况下,可以按照硬件标识将数据(包括纹理数据、粗略深度图)对应的采集设备与处理数据的计算节点的标识,建立一一对应的第二映射关系。
可以理解的是,这里第一粗略深度图、第二粗略深度图与第三粗略深度图的描述,仅为描述清楚及简洁。在具体实施中,所述第一粗略深度图可以与所述第二粗略深度图相同,也可以不同;所述第三粗略深度图对应的采集设备与所述第二粗略深度图对应的采集设备满足预设的第二空间位置关系即可。
对于所述第二空间位置关系,作为具体示例,所述第三粗略深度图对应的纹理数据可以为与所述第二粗略深度图对应的采集设备满足预设的第二距离关系的采集设备采集到的纹理数据,或者所述第三纹理深度图对应的纹理数据可以为与所述第二粗略深度图对应的采集设备满足预设的第二数量关系的采集设备采集到的纹理数据,又或者,所述第三粗略深度图对应的纹理数据为与所述第二粗略深度图对应的采集设备满足预设的第二距离关系及第二数量关系的采集设备采集到的纹理数据。
其中,所述第二预设数量可以取1至N-1的任意整数值,N为所述采集阵列中采集设备的总量。在具体实施中,所述第二预设数量可以和所述第一预设数量相等,也可以不等。在本发明一实施例中,所述第二预设数量取2,从而可以以尽量少的运算量得到尽可能高的图像质量。
在具体实施中,所述第二空间位置关系也可以为其他类型的空间位置关系,例如满足预设角度、满足预设相对位置等。
S265,所述第一计算节点根据所述第二粗略深度图中的不稳定区域、所述第二粗略深度图对应的纹理数据以及所述第三粗略深度图对应的纹理数据,进行第二深度计算,得到对应的精细深度图。
这里需要说明的是,第二深度计算与第一深度计算的不同之处在于,第二深度计算所选取的第二粗略深度图中的深度图候选值不包含所述不稳定区域的深度值,从而可以排除生成的深度图中的不稳定区域,使得所生成的深度图更加精确,进而可以提升生成的多角度自由视角图像的质量。
以一个应用场景示例说明:
可以由服务器S基于分配的相机M的纹理数据以及与所述相机M满足预设的第一空间位置关系的相机的纹理数据,进行第一轮深度计算(第一深度计算),得到粗略深度图。
在步骤S264的交叉验证后,可以继续在同一台服务器上,连贯地进行深度图的精细化求解。具体而言,所述服务器S可以将分配的相机M对应的粗略深度图与所有其他粗略深度图的结果进行交叉验证,可以得到相机M对应的粗略深度图中的不稳定区域,之后,服务器S可以将分配的相机M对应的粗略深度图中的不稳定区域、相机M采集到的纹理数据以及相机M周围N个相机的纹理信息,再进行一轮深度图计算(第二深度计算),即可以得到第一纹理数据(相机M采集到的纹理数据)对应的精细化深度图。
这里相机M对应的粗略深度图为基于相机M采集到的纹理数据以及与所述相机M满足预设的第一空间位置关系的采集设备采集到的纹理数据,计算得到的粗略深度图。
S266,将所述各计算节点得到的精细深度图的精细深度图集作为最终生成的深度图。
采用上述实施例,多个计算节点可以同时对同一采集阵列同步采集到的纹理数据并行地、批处理式地进行深度图生成,因而可以极大地提高深度图生成效率。
此外,采用上述方案通过二次深度计算,排除生成的深度图中的不稳定区域,因此所得到的精细深度图更加精确,进而可以提升生成的多角度自由视角图像的质量。
在具体实施中,根据待处理的纹理数据的数据量的大小及对深度图生成速度的需求,可以选用适当的计算节点集群中计算节点的配置及计算节点的数量。例如,所述计算节点集群可以为由多台服务器组成的服务器集群,所述服务器集群中多台服务器可以集中部署,也可以位于分布式部署。在本发明一些实施例中,所述计算节点集群中部分或全部计算节点设备可以作为本地服务器,或者可以作为边缘节点设备,或者作为云端计算设备。
又如,所述计算节点集群还可以为多个CPU或GPU形成的计算设备。本发明实施 例还提供了一种计算节点,适于与至少另一个计算节点形成计算节点集群,用以生成深度图,参照图27所示的计算节点的结构示意图,计算节点270可以包括:
输入单元271,适于接收纹理数据,所述纹理数据源自同一采集阵列中的多个采集设备同步采集;
第一深度计算单元272,适于根据第一纹理数据和第二纹理数据,进行第一深度计算,得到第一粗略深度图,其中:所述第一纹理数据与所述计算节点满足预设的第一映射关系;所述第二纹理数据为与所述第一纹理数据的采集设备满足预设的第一空间位置关系的采集设备采集的纹理数据;
同步单元273,适于将所述第一粗略深度图同步至所述计算节点集群中的其余计算节点,得到粗略深度图集;
验证单元274,对于所述粗略深度图集中的第二粗略深度图,适于采用第三粗略深度图进行验证,得到所述第二粗略深度图中的不稳定区域,其中:所述第二粗略深度图与所述计算节点满足预设的第二映射关系;所述第三粗略深度图为与所述第二粗略深度图对应的采集设备满足预设的第二空间位置关系的采集设备对应的粗略深度图;
第二深度计算单元275,适于根据所述第二粗略深度图中的不稳定区域、所述第二粗略深度图对应的纹理数据以及所述第三粗略深度图对应的纹理数据,进行第二深度计算,得到对应的精细深度图,其中:第二深度计算所选取的第二粗略深度图中的深度图候选值不包含所述不稳定区域的深度值;
输出单元276,适于将所述精细深度图输出,以使所述计算节点集群得到精细深度图集作为最终生成的深度图。
采用上述计算节点,深度图计算过程可以包括通过第一深度计算得到粗略深度图,以及确定粗略深度图中的不稳定区及之后的第二深度计算等多个步骤,通过上述步骤进行深度图计算,利于多个计算节点分别计算,从而可以提升深度图的生成效率。
本发明实施例还提供了一种计算节点集群,所述计算节点集群可以包括多个计算节点,所述计算节点集群中多个计算节点可以同时对同一采集阵列同步采集到的纹理数据并行地、批处理式地进行深度图生成。为描述方便,将所述计算节点集群中任一计算节点称为第一计算节点。
在本发明一些实施例中,所述第一计算节点,适于根据接收到的纹理数据中的第一纹理数据和第二纹理数据,进行第一深度计算,得到第一粗略深度图;将所述第一粗略深度图同步至所述计算节点集群中的其余计算节点,得到粗略深度图集;对于所述粗略 深度图集中的第二粗略深度图,采用第三粗略深度图进行验证,得到所述第二粗略深度图中的不稳定区域;及根据所述第二粗略深度图中的不稳定区域、所述第二粗略深度图对应的纹理数据以及所述第三粗略深度图对应的纹理数据,进行第二深度计算,得到对应的精细深度图,及将获得的精细深度图输出以使得所述计算节点集群将得到的精细深度图集作为最终生成的深度图;
其中,所述第一纹理数据与所述第一计算节点满足预设的第一映射关系;所述第二纹理数据为与所述第一纹理数据的采集设备满足预设的第一空间位置关系的采集设备采集的纹理数据;所述第二粗略深度图与所述第一计算节点满足预设的第二映射关系;所述第三粗略深度图为与所述第二粗略深度图对应的采集设备满足预设的第二空间位置关系的采集设备对应的粗略深度图;且第二深度计算所选取的第二粗略深度图中的深度图候选值不包含所述不稳定区域的深度值。
参照图28所示的服务器集群进行深度图处理的示意图,其中,相机阵列中N个相机采集的纹理数据分别输入服务器集群中N台服务器,首先分别进行第一深度计算,得到粗略深度图1~N,之后,各服务器将自身计算得到的粗略深度图分别复制至服务器集群其他的服务器上并实现时间同步,之后,各服务器分别对自身分配的粗略深度图进行验证,并进行第二深度计算,得到精细计算后的深度图,作为服务器集群生成的深度图。由上述计算过程可以看出,服务器集群中各服务器可以并行地对多个相机采集到的纹理数据进行第一深度计算、以及对粗略深度图集中的各粗略深度图进行验证以及第二深度计算,整个深度图生成过程有多台服务器并行进行,因而可以极大地节约深度图计算的时间,提升深度图生成效率。
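上述"并行粗算—同步—并行精算"的集群流程可以用如下Python草图示意,其中first_pass与second_pass仅为占位函数,真实的第一/第二深度计算算法并非本草图所涵盖(此处以线程池模拟多计算节点的并行):

```python
from concurrent.futures import ThreadPoolExecutor

def first_pass(cam, textures):
    # 占位:基于该相机及其邻近相机的纹理数据做第一深度计算,得到粗略深度图
    return ("coarse", cam)

def second_pass(cam, coarse_set):
    # 占位:交叉验证确定不稳定区域后做第二深度计算,得到精细深度图
    assert cam in coarse_set  # 同步步骤保证每个节点都持有完整的粗略深度图集
    return ("fine", cam)

def cluster_generate_depth(textures):
    cams = sorted(textures)
    with ThreadPoolExecutor() as pool:
        # 对应步骤S262:各节点并行进行第一深度计算
        coarse_set = dict(zip(cams, pool.map(lambda c: first_pass(c, textures), cams)))
        # coarse_set的汇总对应步骤S263的同步;再并行执行步骤S264~S265
        fine = list(pool.map(lambda c: second_pass(c, coarse_set), cams))
    return fine

fine_maps = cluster_generate_depth({"cam1": b"t1", "cam2": b"t2"})
assert fine_maps == [("fine", "cam1"), ("fine", "cam2")]
```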
本发明实施例中的计算节点和计算节点集群的具体实现方式和有益效果,可以参见本发明前述实施例中的深度图生成方法,在此不再赘述。
服务器集群进而可以将生成的深度图存储,或者根据请求输出至终端设备,以进一步进行虚拟视点图像的生成及展示,此处不再赘述。
本发明实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时可以执行前述任一实施例所述深度图生成方法的步骤,具体可以参见前述深度图生成方法的步骤,此处不再赘述。
另外,目前已知的基于深度图像绘制(Depth-Image-Based Rendering,DIBR)的虚拟视点图像生成方法,难以满足播放中多角度自由视角应用的需求。
发明人经研究发现,目前的DIBR虚拟视点图像生成方法并发度不高,通常由CPU进行处理。然而,对于每一虚拟视点图像,由于生成方法涉及较多步骤,且每一步骤均较为复杂,所以比较难以通过并行处理方法来实现。
为解决上述问题,本发明实施例提供一种可以通过并行处理生成虚拟视点图像的方法,可以使多角度自由视角的虚拟视点图像生成的时效性能大大加速,从而可以满足多角度自由视角视频低时延播放和实时互动的需求,提升用户体验。
为使本领域技术人员对本发明实施例的目的、特征及优点更加明显易懂,以下结合附图对本发明的具体实施例进行详细的说明。
参照图29所示的虚拟视点图像生成方法的流程图,在具体实施中,可以通过如下步骤生成虚拟视点图像:
S291,获取多角度自由视角的图像组合、所述图像组合的参数数据以及预设的虚拟视点路径数据,其中,所述图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图。
其中,所述多角度自由视角可以是指使得场景能够自由切换的虚拟视点的空间位置及视角。多角度自由视角范围可以根据应用场景的需要确定。
在具体实施中,可以通过在现场布置由多个采集设备组成的采集阵列,所述采集阵列中各采集设备可以根据预设的多角度自由视角范围置于现场采集区域的不同位置,各采集设备可以同步采集现场图像,获得多个角度同步的纹理图。例如,可以通过多个相机、摄像机等对某一场景进行多个角度的同步的图像采集。
所述多角度自由视角的图像组合中的图像,可以为完全的自由视角的图像。在具体实施中,可以为6自由度(Degree of Freedom,DoF)的视角,也即可以自由切换视点的空间位置以及视角。如前所述,视点的空间位置可以表示为坐标(x,y,z),视角可以表示为三个旋转方向的旋转角,故可以称为6DoF。
在虚拟视点图像生成过程中,可以先获取多角度自由视角的图像组合,以及所述图像组合的参数数据。
在具体实施中,图像组合中的纹理图与深度图一一对应。其中,纹理图可以采用任意类型的二维图像格式,例如可以是BMP、PNG、JPEG、webp格式等其中任意一种格式。深度图可以表示场景中各点相对于拍摄设备的距离,即深度图中每一个像素值表示场景中某一点与拍摄设备之间的距离。
图像组合中的纹理图即同步的多个二维图像。可以基于所述多个二维图像,确定每个二维图像的深度数据。
其中,深度数据可以包括与二维图像的像素对应的深度值。采集设备到待观看区域中各个点的距离可以作为上述深度值,深度值可以直接反映待观看区域中可见表面的几何形状。例如,深度值可以是待观看区域中各个点沿着相机光轴到光心的距离,相机坐标系的原点可以作为光心。本领域技术人员可以理解的是,该距离,可以是相对数值,多个图像采用同样的基准即可。
深度数据可以包括与二维图像的像素一一对应的深度值,或者,可以是对与二维图像的像素一一对应的深度值集合中选取的部分数值。本领域技术人员可以理解的是,深度值集合可以存储为深度图的形式,在具体实施中,深度数据可以是对原始深度图进行降采样后得到的数据,与二维图像(纹理图)的像素一一对应的深度值集合按照二维图像(纹理图)的像素点排布存储的图像形式为原始深度图。
在本发明实施例中,可以通过如下步骤获得多角度自由视角的图像组合,以及所述图像组合的参数数据,以下通过具体应用场景进行说明。
作为本发明一具体实施例,可以包括如下步骤:第一步是采集和深度图的计算,包括三个主要步骤,分别为:多摄像机的视频采集(Multi-camera Video Capturing)、摄像机内外参计算(Camera Parameter Estimation),以及深度图计算(Depth Map Calculation)。对于多摄像机采集来说,要求各个摄像机采集的视频可以帧级对齐。
通过多摄像机的视频采集可以得到纹理图(Texture Image),也即同步的多个图像;通过摄像机内外参计算,可以得到摄像机参数(Camera Parameter),也即图像组合的参数数据,包括内部参数数据和外部参数数据;通过深度图计算,可以得到深度图(Depth Map)。
图像组合中同步的多组存在对应关系的纹理图和深度图可以拼接在一起,形成一帧拼接图像。拼接图像可以有多种拼接结构。每一帧拼接图像均可以作为一个图像组合。图像组合中多组纹理图及深度图可以按照预设的关系进行拼接及组合排列。具体而言,图像组合的纹理图和深度图根据位置关系可以区分为纹理图区域和深度图区域,纹理图区域分别存储各个纹理图的像素值,深度图区域按照预设的位置关系分别存储各纹理图对应的深度值。纹理图区域和深度图区域可以是连续的,也可以是间隔分布的。本发明实施例中对图像组合中纹理图和深度图的位置关系不做任何限制。
在具体实施中,可以从图像的属性信息中获取到图像组合中各图像的参数数据。其中,所述参数数据可以包括外部参数数据,还可以包括内部参数数据。外部参数数据用于描述拍摄设备的空间坐标及姿态等,内部参数数据用于表述拍摄设备的光心、焦距等 拍摄设备的属性信息。内部参数数据还可以包括畸变参数数据。畸变参数数据包括径向畸变参数数据和切向畸变参数数据。径向畸变发生在拍摄设备坐标系转图像物理坐标系的过程中。而切向畸变是发生在拍摄设备制作过程,其是由于感光元平面跟透镜不平行。基于外部参数数据可以确定图像的拍摄位置、拍摄角度等信息。在虚拟视点图像生成中,结合包括畸变参数数据在内的内部参数数据可以使所确定的空间映射关系更加准确。
在具体实施中,虚拟视点路径可以预先设置。例如,对于一场体育比赛,如篮球赛或足球赛,可以预先规划好一个弧形路径,例如每当出现一个精彩的镜头,都按照这个弧形路径生成相应的虚拟视点图像。
在具体应用过程中,可以基于现场中特定的位置或视角(如篮下、场边、裁判视角、教练视角等等),或者基于特定对象(例如球场上的球员、现场的主持人、观众,以及影视图像中的演员等)设置虚拟视点路径。
所述虚拟视点路径对应的路径数据可以包括路径中一系列的虚拟视点的位置数据。
S292,根据所述预设的虚拟视点路径数据及所述图像组合的参数数据,从图像组合中选择虚拟视点路径中各虚拟视点的相应组的纹理图和深度图。
在具体实施中,可以根据所述虚拟视点路径数据中各虚拟视点的位置数据及所述图像组合的参数数据,从所述图像组合中选择与各虚拟视点位置满足预设位置关系和/或数量关系的相应组的纹理图和深度图。例如,对于在相机密度较大的虚拟视点位置区域,可以仅选择离所述虚拟视点最近的两个相机拍摄的纹理图及对应的深度图,而在相机密度较小的虚拟视点位置区域,可以选择离所述虚拟视点最近的三个或四个相机拍摄的纹理图及对应的深度图。
在本发明一实施例中,可以分别选择离虚拟视点路径中的各虚拟视点位置最近的2至N个采集设备对应的纹理图和深度图,其中,N为采集阵列中所有采集设备的数量。例如,可以默认选择离各虚拟视点位置最近的两个采集设备对应的纹理图和深度图。在具体实施中,用户可以自己设置所选择的离所述虚拟视点位置最近的采集设备的数量,最大不超过所述图像组合所对应的采集设备的数量。
采用这一方式,对采集阵列中采集设备的空间位置分布没有特别的要求(例如可以为线状分布、弧形阵列排布,或者是任何不规则的排布形式),根据获取到的所述虚拟视点位置数据及图像组合对应的参数数据,确定采集设备的实际分布状况,进而采用适应性的策略选择图像组合中相应组的纹理图和深度图的选择,从而可以在减小数据运算量、保证生成的虚拟视点图像质量的情况下,提供较高的选择自由度及灵活性,此外也 降低了对采集阵列中采集设备的安装要求,便于适应不同的场地需求及安装易操作性。
在本发明一实施例中,根据所述虚拟视点位置数据及所述图像组合的参数数据,从图像组合中选择离所述虚拟视点位置最近的预设数量的相应组的纹理图和深度图。
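上述按距离选取相应组纹理图和深度图的策略,可以用如下Python草图示意(以欧氏距离为例,数据结构与函数名均为本示例的假设):

```python
def select_nearest_cameras(virtual_pos, camera_positions, count=2):
    """根据虚拟视点位置与各采集设备的空间位置,选取距离最近的count台采集设备,
    其对应的纹理图和深度图即为参与组合渲染的相应组。
    camera_positions为{采集设备标识: 空间坐标}字典。"""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    ranked = sorted(camera_positions.items(),
                    key=lambda kv: dist2(kv[1], virtual_pos))
    return [cam for cam, _ in ranked[:count]]

cams = {"cam1": (0, 0, 0), "cam2": (1, 0, 0), "cam3": (5, 0, 0)}
assert select_nearest_cameras((0.4, 0, 0), cams) == ["cam1", "cam2"]
```

在采集设备密度较大的区域可将count取2,密度较小的区域可取3或4,与前述适应性选择策略相对应。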
可以理解的是,在具体实施中,也可以采用预设的其他规则从所述图像组合中选择相应组的纹理图和深度图。例如还可以根据虚拟视点图像生成设备的处理能力、或者可以根据用户对生成速度的要求,对生成图像的清晰度要求(如普清、高清或超清,等等)从所述图像组合中选择相应组的纹理图和深度图。
S293,将各虚拟视点的相应组的纹理图和深度图输入至图形处理器中,针对虚拟视点路径中各虚拟视点,以像素点为处理单位,由多个线程分别将选择的图像组合中相应组的纹理图和深度图中的像素点进行组合渲染,得到所述虚拟视点对应的图像。
图形处理器(Graphics Processing Unit,GPU),又称显示核心、视觉处理器,显示芯片等,是一种专门做图像和图形相关运算工作的微处理器,可以配置于个人电脑、工作站、电子游戏机和一些移动终端(如平板电脑、智能手机等)有图像相关运算需求的电子设备中。
为使本领域技术人员更好地理解和实现本发明实施例,以下对本发明一些实施例中采用的一种GPU的架构进行简要介绍。需要说明的是,所述GPU架构仅为具体示例,并不构成对本发明实施例所适用的GPU的限制。
在本发明一些实施例中,GPU可以采用统一计算设备架构(Compute Unified Device Architecture,CUDA)这一并行编程架构,对选择的图像组合中相应组的纹理图和深度图中的像素点进行组合渲染。CUDA是一种新的硬件和软件体系结构,用于将GPU作为数据并行计算设备,对其上的计算进行分配和管理,而无须将它们映射至图形应用程序编程接口(Application Programming Interface,API)。
通过CUDA编程时,GPU可以被视为能够并行执行大量线程的计算设备。它作为主CPU或者主机的协处理器运行,换言之,在主机上运行的应用程序中的数据并行、计算密集型的部分被下放到GPU上。
更确切地说,应用程序中多次执行、且独立作用于不同数据的部分,可以隔离到一个函数中,该函数在GPU设备上运行,就像许多不同的线程一样。为此,可以将此类函数编译为GPU设备的指令集,生成的程序(称为内核(Kernel))下载到GPU上。执行内核的线程批处理被组织为线程块(Thread Block)。
线程块是一批线程,可以通过一些快速共享内存有效地共享数据,并同步其执行以协调内存访问。在具体实施中,可以在内核中指定同步点,其线程块中的线程将挂起,直至它们都到达同步点。
在具体实施中,一个线程块可以包含的线程的最大数量是有限的。但是,执行同一内核的相同维度和大小的块可以批处理到一个块网格(Grid of Thread Blocks)中,以便单个内核调用中可以启动的线程总数要大得多。
由上可知,采用CUDA结构,GPU上可以同时有大量线程并行地进行数据处理,因此可以极大地提高虚拟视点图像生成速度。
为使本领域技术人员更好地理解和实现,以下对组合渲染的每一步骤进行以像素点为单位进行处理的过程进行详细介绍。
在具体实施中,参照图30所示的GPU进行组合渲染的方法的流程图,步骤S293可以通过如下步骤实现:
S2931,将相应组的深度图并行地进行前向映射,映射至所述虚拟视点上。
深度图的前向映射是将原始相机(采集设备)的深度图通过坐标空间位置的转换映射到虚拟相机的位置,从而得到虚拟相机位置的深度图。具体而言,深度图的前向映射是将原始相机(采集设备)的深度图的每一个像素,按照预设的坐标映射关系,映射到虚拟视点的操作。
在具体实施中,可以在GPU上运行第一核心(Kernel)函数,将相应组的深度图中的像素并行地进行前向映射,映射至对应的虚拟视点位置上。
发明人在研究和实践过程中发现,在前向映射过程中,可能存在前背景的遮挡问题,以及映射缝隙效应,影响生成的图像质量。首先,针对前背景遮挡问题,在本发明实施例中,对于多个映射到虚拟视点同一个像素的深度值,可以采用原子操作,取像素值最大的值,得到对应的虚拟视点位置的第一深度图。之后,为改善映射缝隙效应带来的影响,可以基于所述虚拟视点位置的第一深度图,创建所述虚拟视点位置的第二深度图,对于所述第二深度图中的每一个像素并行处理,取所述第一深度图中对应像素位置周围预设区域的像素点的最大值。
在前向映射过程中,由于每个像素均可以并行处理,因此可以大大加快前向映射处理速度,提升前向映射的时效性能。
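上述"多个源像素映射到同一虚拟视点像素时取最小深度值"的操作,可以用如下NumPy草图在CPU上模拟(np.minimum.at起到与GPU原子操作类似的作用;坐标映射关系假设已预先算好,仅为示意):

```python
import numpy as np

def forward_map_depth(src_depth, map_uv, out_shape):
    """深度图前向映射的示意实现:map_uv给出每个源像素在虚拟视点图像中的
    目标坐标(ys, xs);多个源像素落在同一目标像素时保留最小深度(前景遮挡背景)。"""
    out = np.full(out_shape, np.inf)
    ys, xs = map_uv
    np.minimum.at(out, (ys.ravel(), xs.ravel()), src_depth.ravel())
    return out

src = np.array([[2.0, 5.0]])
ys = np.array([[0, 0]]); xs = np.array([[0, 0]])  # 两个源像素落在同一目标像素
mapped = forward_map_depth(src, (ys, xs), (1, 1))
assert mapped[0, 0] == 2.0  # 深度取最小值,对应公式(3)
```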
S2932,对前向映射后的深度图并行地进行后处理。
在前向映射结束后,可以对虚拟视点深度图进行后处理,具体而言,可以在GPU上运行预设的第二核心函数,对前向映射得到的第二深度图中的每一个像素,在所述像素 位置周围预设区域进行中值滤波处理。由于可以对第二深度图中的各像素并行地进行中值滤波处理,因而可以大大加快后处理速度,提升后处理的时效性能。
S2933,将相应组的纹理图并行地进行反向映射。
本步骤是将虚拟视点位置根据深度图的值计算出在原始相机纹理图中的坐标,并且通过分像素插值计算出相应的值。在GPU中,分像素取值可以直接按照双线性进行插值,因此在本步骤中,只需要根据每个像素计算出的坐标直接在原始相机纹理中取值就可以实现。在具体实施中,可以在GPU上运行预设的第三核心函数,将选择的相应组的纹理图中的像素并行地进行插值运算,即可生成对应的虚拟纹理图。
通过在GPU上运行第三核心函数,将选择的相应组的纹理图中的像素并行地进行插值运算,生成对应的虚拟纹理图,可以大大加快反向映射的处理速度,提升反向映射的时效性能。
S2934,将反向映射后所生成的各虚拟纹理图中的像素并行地进行融合。
在具体实施中,可以在GPU上运行第四核心函数,将反向映射后所生成的各虚拟纹理图中的同一位置的像素,并行地进行加权融合。
在GPU上运行第四核心函数,将反向映射后所生成的各虚拟纹理图中的同一位置的像素,并行地进行加权融合,可以大大地加快虚拟纹理图的融合的速度,提升图像融合的时效性能。
以下通过一个具体示例进行详细说明。
在步骤S2931中,对于深度图的前向映射,首先,可以通过GPU的第一Kernel函数来计算每个像素点的投影映射关系。
假设真实相机的图像中某一个像素点(u,v),首先通过对应相机的透视投影模型,将图像坐标(u,v)变换到相机坐标系下的坐标[X,Y,Z]^T。可以理解的是,针对不同相机的透视投影模型,有不同的转换方法。
例如,对于透视投影模型:
Z·[u,v,1]^T = K·[X,Y,Z]^T,其中K = [f_x 0 c_x; 0 f_y c_y; 0 0 1]        (1)
其中,[u,v,1]^T是像素(u,v)的齐次坐标,[X,Y,Z]^T为(u,v)对应真实物体在相机坐标系中的坐标,f_x、f_y分别是x,y方向的焦距,c_x、c_y分别是x,y方向的光心坐标。
所以,对图像中某一像素点(u,v),已知像素的深度值Z、对应相机镜头的物理参数(f_x、f_y、c_x、c_y可以从前述图像组合的参数数据中获得),可以通过上述公式(1),得到相机坐标系下的对应点的坐标[X,Y,Z]^T。
在图像坐标系到相机坐标系的转换之后,可以根据三维空间中的坐标变换,将物体在当前相机坐标系下的坐标,变换到虚拟视点所在相机的坐标系中。具体可以采用如下的变换公式:
[X_1,Y_1,Z_1]^T = R_12·[X,Y,Z]^T + T_12        (2)
其中,R_12为3x3的旋转矩阵,T_12为平移向量。
假设变换之后的三维坐标为[X_1,Y_1,Z_1]^T,通过之前从图像坐标系到相机坐标系的描述,应用其反变换,即可得到变换后的虚拟相机三维坐标与虚拟相机图像坐标的对应关系位置。由此,便建立了从真实视点图像到虚拟视点图像之间的点的投影关系。通过对真实视点中每一个像素点进行变换,并做坐标点的取整操作,可以得到虚拟视点图像中的投影深度图。
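公式(1)的逆运算与公式(2)的坐标变换,可以用如下NumPy草图示意(参数取值仅为示例):

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """公式(1)的逆用:由像素坐标(u,v)与深度值Z恢复相机坐标系坐标[X,Y,Z]^T。"""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def camera_to_camera(p, r12, t12):
    """公式(2):将当前相机坐标系下的点变换到虚拟视点所在相机的坐标系。"""
    return r12 @ p + t12

# 光心处像素、深度2.0:对应相机坐标系中光轴上的点[0,0,2]
p = pixel_to_camera(320, 240, 2.0, fx=500, fy=500, cx=320, cy=240)
assert np.allclose(p, [0.0, 0.0, 2.0])
# 单位旋转+沿z轴平移1的变换
q = camera_to_camera(p, np.eye(3), np.array([0.0, 0.0, 1.0]))
assert np.allclose(q, [0.0, 0.0, 3.0])
```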
在建立了原始相机深度图和虚拟相机深度图的点对点的映射关系后,由于在深度图的投影过程中,原始相机的深度图中可能有多个位置映射到虚拟相机深度图中的同一位置,导致存在深度图前向映射过程中的前背景遮挡关系,针对这一问题,在本发明实施例中,可以采用原子操作,取其中最小的深度值作为映射位置的最终结果。如公式(3)所示:
Depth(u,v)=min[Depth_i(u,v)],i=1,2,…,N        (3)
需要说明的是,深度值最小的值同时也是深度图像素值最大的值,因此,在映射得到的深度图上取像素值最大的值,可以得到对应的虚拟视点位置的第一深度图。
在具体实施中,可以在CUDA并行的环境下提供多个点映射的取最大或最小值的操作,具体可以通过调用CUDA具备的原子操作函数atomicMin或者atomicMax来进行。
在上述得到第一深度图的过程中,可能会产生缝隙效应,也即有一部分像素点可能会由于映射精度的问题没有覆盖到。针对这种问题,本发明实施例可以对得到的第一深度图进行缝隙掩盖处理。在本发明一实施例中,对所述第一深度图进行一个3*3的缝隙掩盖处理。具体掩盖处理过程如下:
先创建一个虚拟视点位置的第二深度图,然后,对于所述第二深度图中的每一个像素D(x,y),取所述虚拟视点位置的第一深度图中(x,y)周围3*3范围内已有像素点D_old(i,j)的最大值,可以通过如下的内核函数操作实现:
D(x,y)=Max[D_old(i,j)],(i,j)为(x,y)周围3*3范围内的像素点        (4)
可以理解的是,缝隙掩盖处理过程中周围区域的大小范围可以也可以取其他值,例如5*5。为获得更好的处理效果,具体可以根据经验进行设置。
对于步骤S2932,在具体实施中,可以对所述虚拟视点位置的第二深度图进行3*3或者5*5的中值滤波。例如,对于3*3的中值滤波,所述GPU的第二核心映射函数可以按照如下公式进行操作:
D(x,y)=Median[D(i,j)],(i,j)为(x,y)周围3*3范围内的像素点        (5)
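公式(4)的缝隙掩盖与公式(5)的中值滤波,均可归结为对每个像素取3*3邻域并施加某一算子。以下NumPy草图在CPU上示意这一操作(GPU实现中每个像素由一个线程并行处理,此处以循环模拟;边界按最近值填充,为本示例的假设):

```python
import numpy as np

def neighborhood_op(depth, op, k=3):
    """对每个像素取k*k邻域并施加op:op取最大值对应公式(4)的缝隙掩盖,
    op取中值对应公式(5)的中值滤波。"""
    r = k // 2
    padded = np.pad(depth, r, mode="edge")
    out = np.empty_like(depth, dtype=float)
    h, w = depth.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = op(padded[y:y + k, x:x + k])
    return out

d = np.array([[1., 0., 1.],
              [1., 1., 1.],
              [1., 1., 1.]])
filled = neighborhood_op(d, np.max)      # 公式(4):缝隙像素被邻域最大值掩盖
assert filled[0, 1] == 1.0
smoothed = neighborhood_op(d, np.median) # 公式(5):3*3中值滤波
assert smoothed[1, 1] == 1.0
```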
在步骤S2933中,在GPU上运行的第三核心函数,将虚拟视点位置根据深度图的值计算出在原始相机纹理图中的坐标,第三核心函数可以通过执行步骤S2931的逆过程实现。
In step S2934, for the pixel f(x, y) at virtual viewpoint position (x, y), the pixel values at the corresponding positions of the texture maps mapped from all original cameras may be weighted according to a confidence conf(x, y). The fourth kernel function may compute according to the following formula, where i indexes the original cameras:
f(x, y) = Σ_i conf_i(x, y)·f_i(x, y)        (6)
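The confidence-weighted fusion of formula (6) can be sketched per-pixel over all cameras at once (illustrative names; normalizing the confidences so the weights sum to 1 per pixel is an assumption added here, not spelled out in the text):

```python
import numpy as np

def fuse(textures, confidences):
    """Per-pixel confidence-weighted blend of the virtual texture
    maps mapped from each original camera. `textures` and
    `confidences` are stacks of shape (num_cameras, H, W)."""
    conf = np.asarray(confidences, dtype=float)
    conf = conf / conf.sum(axis=0)  # normalize weights over cameras
    return (conf * np.asarray(textures, dtype=float)).sum(axis=0)
```

With two cameras at confidences 1 and 3, a pixel is blended 25%/75%, so the higher-confidence mapping dominates.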
Through steps S2931 to S2934 above, the virtual viewpoint image can be obtained. In a specific implementation, the virtual texture map obtained after the weighted fusion may be further processed and optimized. For example, hole filling may be performed in parallel on the pixels of the weighted-fused texture map to obtain the image corresponding to the virtual viewpoint.
For hole filling of the virtual texture map, in a specific implementation, a separate windowing method may be used for each pixel so that the operation is parallel. For example, for each hole pixel, a window of size N×M may be opened, after which the value of the hole pixel is computed by weighting the non-hole pixel values within the window. With this method, virtual viewpoint image generation can be computed entirely in parallel on the GPU, greatly accelerating the generation process.
As shown in the schematic diagram of the hole-filling method in Fig. 31, for a generated virtual viewpoint view G there is a hole region F, and rectangular windows a and b are opened for pixels f1 and f2 in hole region F, respectively. Then, for pixel f1, all pixels are taken from the existing non-hole pixels in the rectangular window (a subset may also be obtained by downsampling), and the value of pixel f1 in hole region F is obtained by distance weighting (or uniform averaging). Likewise, applying the same operation to pixel f2 yields the value of pixel f2. In a specific implementation, a fifth kernel function may be run on the GPU for parallelized processing to shorten the hole-filling time.
The fifth kernel function may compute according to the following formula:
P(x, y) = Average[Window(x, y)]        (7)
where P(x, y) is the value of a point in the hole, Window(x, y) is the set of values (or downsampled values) of the existing pixels within the window, and Average is the average (or weighted average) of those pixels.
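The windowed hole filling of formula (7) can be sketched for a single hole pixel as follows (plain-averaging variant; the names and the validity mask are illustrative, and the distance-weighted variant would replace the mean):

```python
import numpy as np

def fill_hole_pixel(img, mask, x, y, n=3, m=3):
    """Formula (7): fill one hole pixel at (x, y) with the average
    of the non-hole pixels inside an N x M window centred on it.
    `mask` is True where a pixel is valid (non-hole)."""
    H, W = img.shape
    y0, y1 = max(0, y - n // 2), min(H, y + n // 2 + 1)
    x0, x1 = max(0, x - m // 2), min(W, x + m // 2 + 1)
    window = img[y0:y1, x0:x1]
    valid = mask[y0:y1, x0:x1]
    return window[valid].mean() if valid.any() else 0.0
```

Each hole pixel is independent of the others, which is what makes the per-pixel windowing suitable for one-thread-per-pixel execution on the GPU.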
In embodiments of the present invention, besides generating the virtual viewpoint image at each virtual viewpoint position in parallel at pixel granularity, to further speed up the generation of virtual-viewpoint-path images, the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path may be input into multiple GPUs respectively, generating multiple virtual viewpoint images in parallel.
In a specific implementation, to further improve processing efficiency, the steps above may each be executed by different thread-block grids.
Referring to the schematic structural diagram of the virtual viewpoint image generation system shown in Fig. 32, in an embodiment of the present invention a virtual viewpoint image generation system 320 may include a CPU 321 and a GPU 322, wherein:
the CPU 321 is adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination includes multiple angle-synchronized groups of texture maps and depth maps having a correspondence relationship; and to select, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path;
the GPU 322 is adapted to, for each virtual viewpoint in the virtual viewpoint path, call the corresponding kernel functions to combine and render in parallel the pixels of the corresponding groups of texture maps and depth maps selected from the image combination, obtaining the image corresponding to the virtual viewpoint.
Specifically, the GPU 322 is adapted to forward-map the corresponding groups of depth maps in parallel onto the virtual viewpoint; post-process the forward-mapped depth maps in parallel; backward-map the corresponding groups of texture maps in parallel; and fuse, in parallel, the pixels of the virtual texture maps generated by the backward mapping.
The GPU 322 may use steps S2931 to S2934 of the aforementioned virtual viewpoint image generation method, together with the hole-filling step, to generate the virtual viewpoint image of each virtual viewpoint; for details, see the descriptions of the foregoing embodiments, which are not repeated here.
In a specific implementation, there may be one GPU or multiple GPUs, as shown in Fig. 32.
In specific applications, the GPU may be a standalone GPU chip, a GPU core within a GPU chip, or a GPU server; it may also be a GPU chip packaged from multiple GPU chips or multiple GPU cores, or a GPU cluster composed of multiple GPU servers.
Correspondingly, the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path may be input into multiple GPU chips, multiple GPU cores, or multiple GPU servers respectively, generating multiple virtual viewpoint images in parallel. For example, if the virtual viewpoint path data corresponding to a virtual viewpoint path contains a total of 20 virtual viewpoint position coordinates, the data corresponding to the 20 virtual viewpoint position coordinates may be input in parallel into multiple GPU chips; with 10 GPU chips in total, for example, the data corresponding to the 20 virtual viewpoint position coordinates can be processed in parallel in two batches, and each GPU chip can in turn generate the virtual viewpoint image of the corresponding virtual viewpoint position in parallel at pixel granularity, greatly accelerating virtual viewpoint image generation and improving its timeliness.
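The two-batch dispatch in the example above (20 viewpoint positions across 10 GPUs) can be sketched as plain batching logic (dispatch to actual GPU devices is elided; all names and the placeholder "rendering" are illustrative):

```python
# Split 20 viewpoint positions across 10 workers in two batches,
# mirroring the example above; within a batch, each viewpoint
# would be assigned to its own GPU and rendered in parallel.
viewpoints = list(range(20))
num_gpus = 10

batches = [viewpoints[i:i + num_gpus]
           for i in range(0, len(viewpoints), num_gpus)]

rendered = []
for batch in batches:  # batches run one after another
    # placeholder for per-GPU pixel-parallel rendering of each viewpoint
    rendered.extend(f"image_{v}" for v in batch)
```

The batching is independent of the pixel-level parallelism inside each GPU: the two levels compose, which is where the overall speedup comes from.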
An embodiment of the present invention further provides an electronic device. Referring to the schematic structural diagram of the electronic device shown in Fig. 33, an electronic device 330 may include a memory 331, a CPU 332, and a GPU 333, where the memory 331 stores computer instructions executable on the CPU 332 and the GPU 333, and the CPU 332 and the GPU 333, when cooperatively running the computer instructions, are adapted to execute the steps of the virtual viewpoint image generation method according to any of the foregoing embodiments of the present invention; for details, see the detailed descriptions of the foregoing embodiments, which are not repeated here.
In a specific implementation, the electronic device may be a single server or a server cluster composed of multiple servers.
All of the above embodiments are applicable to live-streaming scenarios, and two or more embodiments may be combined as needed in application. Those skilled in the art can understand that the solutions of the above embodiments are not limited to live-streaming scenarios; the solutions in embodiments of the present invention for video or image capture, data processing of video data streams, and server-side image generation are also applicable to playback needs in non-live scenarios, such as recorded broadcast, rebroadcast, and other scenarios with low-latency requirements.
For the specific implementations, working principles, and specific functions and effects of the devices or systems in embodiments of the present invention, reference may be made to the specific descriptions in the corresponding method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium having computer instructions stored thereon, where the computer instructions, when run, can execute the steps of the method of any of the above embodiments of the present invention.
The computer-readable storage medium may be any of various suitable readable storage media, such as an optical disc, a mechanical hard disk, or a solid-state drive. For the method executed by the instructions stored on the computer-readable storage medium, reference may be made to the above method embodiments; details are not repeated.
An embodiment of the present invention further provides a server, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor, when running the computer instructions, can execute the steps of the method according to any of the above embodiments of the present invention. For the specific implementation of the method executed when the computer instructions run, reference may be made to the steps of the methods in the above embodiments; details are not repeated.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the protection scope of the present invention shall be defined by the claims.

Claims (12)

  1. A virtual viewpoint image generation method, characterized by comprising:
    acquiring a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination comprises multiple angle-synchronized groups of texture maps and depth maps having a correspondence relationship;
    selecting, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, corresponding groups of texture maps and depth maps for the virtual viewpoints in the virtual viewpoint path;
    inputting the corresponding groups of texture maps and depth maps of the virtual viewpoints into a graphics processor, and, for each virtual viewpoint in the virtual viewpoint path, taking pixels as the processing unit, combining and rendering, by multiple threads respectively, the pixels of the corresponding groups of texture maps and depth maps selected from the image combination, to obtain the image corresponding to the virtual viewpoint.
  2. The virtual viewpoint image generation method according to claim 1, characterized in that the combining and rendering, for each virtual viewpoint in the virtual viewpoint path, taking pixels as the processing unit, by multiple threads respectively, of the pixels of the corresponding groups of texture maps and depth maps selected from the image combination comprises:
    forward-mapping the corresponding groups of depth maps in parallel onto the virtual viewpoint;
    post-processing the forward-mapped depth maps in parallel;
    backward-mapping the corresponding groups of texture maps in parallel;
    fusing, in parallel, the pixels of the virtual texture maps generated by the backward mapping.
  3. The virtual viewpoint image generation method according to claim 2, characterized in that the forward-mapping the corresponding groups of depth maps in parallel onto the virtual viewpoint comprises:
    running a first kernel function on the graphics processor to forward-map the pixels of the corresponding groups of depth maps in parallel onto the corresponding virtual viewpoint position, wherein: for multiple depth values mapped to the same pixel of the virtual viewpoint, an atomic operation is used to take the largest pixel value, obtaining a first depth map at the corresponding virtual viewpoint position; and based on the first depth map at the virtual viewpoint position, a second depth map at the virtual viewpoint position is created, each pixel of the second depth map being processed in parallel by taking the maximum of the pixels in a preset region around the corresponding pixel position in the first depth map.
  4. The virtual viewpoint image generation method according to claim 2, characterized in that the post-processing the forward-mapped depth maps in parallel comprises:
    running a second kernel function on the graphics processor to apply, for each pixel of the second depth map obtained by the forward mapping, median filtering over a preset region around the position of the pixel.
  5. The virtual viewpoint image generation method according to claim 2, characterized in that the backward-mapping the corresponding groups of texture maps in parallel comprises:
    running a third kernel function on the graphics processor to interpolate the pixels of the selected corresponding groups of texture maps in parallel, generating corresponding virtual texture maps.
  6. The virtual viewpoint image generation method according to claim 2, characterized in that the fusing, in parallel, of the pixels of the virtual texture maps generated by the backward mapping comprises:
    running a fourth kernel function on the graphics processor to perform weighted fusion, in parallel, of the pixels at the same position in the virtual texture maps generated by the backward mapping.
  7. The virtual viewpoint image generation method according to any one of claims 2 to 6, characterized by further comprising, after fusing in parallel the pixels of the virtual texture maps generated by the backward mapping:
    performing hole filling in parallel on the pixels of the texture map obtained by the weighted fusion, to obtain the image corresponding to the virtual viewpoint.
  8. The virtual viewpoint image generation method according to claim 1, characterized in that the inputting of the corresponding groups of texture maps and depth maps of the virtual viewpoints into a graphics processor comprises:
    inputting the corresponding groups of texture maps and depth maps of the virtual viewpoints in the virtual viewpoint path into multiple graphics processors respectively, the graphics processors processing in parallel to generate multiple virtual viewpoint images.
  9. A virtual viewpoint image generation system, characterized by comprising:
    a central processor, adapted to acquire a multi-angle free-view image combination, parameter data of the image combination, and preset virtual viewpoint path data, wherein the image combination comprises multiple angle-synchronized groups of texture maps and depth maps having a correspondence relationship; and to select, from the image combination according to the preset virtual viewpoint path data and the parameter data of the image combination, corresponding groups of texture maps and depth maps for the virtual viewpoints in the virtual viewpoint path;
    a graphics processor, adapted to, for each virtual viewpoint in the virtual viewpoint path, taking pixels as the processing unit, combine and render, by multiple threads respectively, the pixels of the corresponding groups of texture maps and depth maps selected from the image combination, to obtain the image corresponding to the virtual viewpoint.
  10. The virtual viewpoint image generation system according to claim 9, characterized in that the graphics processor is adapted to forward-map the corresponding groups of depth maps in parallel onto the virtual viewpoint; post-process the forward-mapped depth maps in parallel; backward-map the corresponding groups of texture maps in parallel; and fuse, in parallel, the pixels of the virtual texture maps generated by the backward mapping.
  11. An electronic device, comprising a memory, a central processor, and a graphics processor, the memory storing computer instructions executable on the central processor and the graphics processor, characterized in that the central processor and the graphics processor, when cooperatively running the computer instructions, execute the steps of the method according to any one of claims 1 to 8.
  12. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when run, execute the steps of the method according to any one of claims 1 to 8.
PCT/CN2020/124272 2019-10-28 2020-10-28 Virtual viewpoint image generation method and system, electronic device, and storage medium WO2021083174A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911032857.0 2019-10-28
CN201911032857.0A CN112738495B (zh) 2019-10-28 Virtual viewpoint image generation method and system, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021083174A1 true WO2021083174A1 (zh) 2021-05-06

Family

ID=75588819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124272 WO2021083174A1 (zh) 2019-10-28 2020-10-28 Virtual viewpoint image generation method and system, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112738495B (zh)
WO (1) WO2021083174A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992679A (zh) * 2021-10-26 2022-01-28 广域铭岛数字科技有限公司 Automobile image display method, system, and device
CN114422819A (zh) * 2022-01-25 2022-04-29 纵深视觉科技(南京)有限责任公司 Video display method, apparatus, device, system, and medium
CN118138741A (zh) * 2024-05-08 2024-06-04 四川物通科技有限公司 Metaverse-based naked-eye 3D data communication method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761096A (zh) * 2021-09-03 2023-03-07 华为云计算技术有限公司 Rendering method, remote apparatus, computing device cluster, terminal apparatus, and device
CN114416365B (zh) * 2022-01-18 2022-09-27 北京拙河科技有限公司 Ultra-high-definition image data processing method and apparatus based on GPU fusion processing
CN114283195B (zh) * 2022-03-03 2022-07-26 荣耀终端有限公司 Method for generating dynamic images, electronic device, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930593A (zh) * 2012-09-28 2013-02-13 Shanghai University GPU-based real-time rendering method in a binocular system
US20140293010A1 (en) * 2009-11-18 2014-10-02 Quang H. Nguyen System for executing 3d propagation for depth image-based rendering
CN104822059A (zh) * 2015-04-23 2015-08-05 Southeast University GPU-accelerated virtual viewpoint synthesis method
CN106157356A (zh) * 2016-07-05 2016-11-23 Beijing University of Posts and Telecommunications Image processing method and apparatus
CN107509067A (zh) * 2016-12-28 2017-12-22 Zhejiang University of Technology High-speed, high-quality free-viewpoint image synthesis method
CN111385554A (zh) * 2020-03-28 2020-07-07 Zhejiang University of Technology High-image-quality virtual viewpoint rendering method for free-viewpoint video

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7142209B2 (en) * 2004-08-03 2006-11-28 Microsoft Corporation Real-time rendering system and process for interactive viewpoint video that was generated using overlapping images of a scene captured from viewpoints forming a grid
CN103581651B (zh) * 2013-10-28 2015-04-29 Xi'an Jiaotong University Virtual viewpoint synthesis method for a vehicle-mounted multi-camera surround-view system
KR20160135660A (ko) * 2015-05-18 2016-11-28 Electronics and Telecommunications Research Institute Method and apparatus for providing stereoscopic images for a head-mounted display
CN111669564B (zh) * 2019-03-07 2022-07-26 Alibaba Group Holding Limited Image reconstruction method, system, device, and computer-readable storage medium
CN111667438B (zh) * 2019-03-07 2023-05-26 Alibaba Group Holding Limited Video reconstruction method, system, device, and computer-readable storage medium



Also Published As

Publication number Publication date
CN112738495A (zh) 2021-04-30
CN112738495B (zh) 2023-03-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881180

Country of ref document: EP

Kind code of ref document: A1