WO2018095317A1 - Video data processing method, apparatus, and device - Google Patents

Video data processing method, apparatus, and device

Info

Publication number
WO2018095317A1
WO2018095317A1 PCT/CN2017/112217 CN2017112217W
Authority
WO
WIPO (PCT)
Prior art keywords
video data
image
terminal
left view
view
Prior art date
Application number
PCT/CN2017/112217
Other languages
English (en)
French (fr)
Inventor
叶在伟
曾伟
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to JP2019528723A priority Critical patent/JP2020513704A/ja
Priority to EP17874884.4A priority patent/EP3547672A4/en
Publication of WO2018095317A1 publication Critical patent/WO2018095317A1/zh

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/152Multipoint control units therefor

Definitions

  • The present disclosure relates to augmented reality (AR) technology and, for example, to a video data processing method, apparatus, and device.
  • This embodiment provides a video data processing method, apparatus, and device that implement an augmented-reality three-dimensional (3D) video call and improve the user experience.
  • This embodiment provides a video data processing method, including: obtaining first video data and second video data during a video call between a first terminal and a second terminal, where the first video data includes at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located; fusing a first image of the target object in the first left view with the second left view, and fusing a second image of the target object in the first right view with the second right view; and generating three-dimensional video data according to the fused second left view and the fused second right view.
  • This embodiment provides a video data processing apparatus, including an obtaining module, a fusion module, and a generating module, where the obtaining module is configured to obtain first video data and second video data during a video call between a first terminal and a second terminal, the first video data including at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located; the fusion module is configured to fuse a first image of the target object in the first left view with the second left view, and to fuse a second image of the target object in the first right view with the second right view; and the generating module is configured to generate three-dimensional video data according to the fused second left view and the fused second right view.
  • This embodiment provides a server, including a transceiver and a processor, where the transceiver is configured to receive first video data from a first terminal and receive second video data from a second terminal, the first video data including at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located, and the transceiver is further configured to send three-dimensional video data to the second terminal; and the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate the three-dimensional video data according to the fused second left view and the fused second right view.
  • This embodiment provides a terminal, including a receiver, a stereo camera, a processor, and a display, where the receiver is configured to receive video data from a peer end, the video data of the peer end including at least a first left view and a first right view of a target object corresponding to the peer end; the stereo camera is configured to synchronously capture a second left view and a second right view of the real scene in which the terminal is currently located; the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate three-dimensional video data according to the fused second left view and the fused second right view; and the display is configured to display the three-dimensional video data.
  • This embodiment further provides a computer-readable storage medium storing computer-executable instructions for performing any of the methods described above.
  • The video data processing method, apparatus, and device provided by this embodiment can blend the image data of the target object of the first terminal into the image data of the real scene in which the second terminal is currently located, enhancing the reality information during the second terminal's video call and providing the user of the second terminal with an augmented-reality 3D video call, so that the user perceives the target object as being in his or her current real environment, thereby improving the user experience.
  • FIG. 1 is a schematic flowchart of the video data processing method in Embodiment 1;
  • FIG. 2 is a schematic flowchart of the video data processing method in Embodiment 2;
  • FIG. 3 is a schematic structural diagram of the video data processing apparatus in Embodiment 3;
  • FIG. 4 is a schematic structural diagram of the server in Embodiment 4;
  • FIG. 5 is a schematic structural diagram of the terminal in Embodiment 5.
  • This embodiment provides a video data processing method.
  • In practical applications, the method can be applied wherever video data processing is needed in a variety of video communication services; it may be a video call application, a social application, or a smart office product on a terminal, or video data processing in a video service server.
  • Exemplarily, a user can use a smart office product on a terminal to hold an augmented-reality video call with another user and perceive that the other user has come into the conference room where he or she is, which enhances the call experience.
  • Referring to FIG. 1, the video data processing method includes S110 to S130. In S110, first video data and second video data are obtained during a video call between the first terminal and the second terminal.
  • The first video data includes at least a first left view and a first right view of the target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located.
  • Here, when the user wants the target object to appear, from a sensory perspective, in the real environment where the user is currently located during a video call, and to obtain a more realistic video call experience, the user can select the augmented-reality video call service.
  • In that case, during the video call between the first terminal and the second terminal, first video data including at least the first left view and the first right view of the target object corresponding to the first terminal is obtained, and second video data including at least the second left view and the second right view of the real scene in which the second terminal is currently located is obtained. The image data of the target object corresponding to the first terminal can then be blended into the image data of the real scene in which the second terminal is currently located, to enhance the video call experience of the user on the second terminal side.
  • Optionally, the first video data may be one frame of data containing the target object, such as the i-th frame, in which case the first video data contains the first left view and the first right view of the target object captured at time i; the first video data may also be multiple frames containing the target object, such as the j-th to the (j+2)-th frames, in which case it contains all the first left views and first right views of the target object captured from time j to time j+2.
  • Likewise, the second video data may be one frame or multiple frames containing the real scene.
  • Moreover, the second video data corresponds synchronously to the first video data, so when the first video data contains the i-th frame of the target object, the second video data also contains the i-th frame of the real scene.
  • The i-th frame of the first or second video data may be a three-dimensional image from which the left view and the right view of the target object or the real scene can be obtained, or it may directly be two two-dimensional images, that is, the left view and the right view of the target object or the real scene.
  • Here, suppose the first video data is a video segment with a duration of 4 seconds and a frame rate of 25 frames per second; then 25 × 4 = 100 first left views and 100 first right views are obtained.
  • Correspondingly, the second video data is also a video segment with a duration of 4 seconds and a frame rate of 25 frames per second.
  • According to the timestamps, each first left view corresponds to a second left view, and each first right view corresponds to a second right view.
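To make the timestamp correspondence above concrete, the following minimal sketch pairs each first view with the second view whose timestamp is nearest. The frame-list representation and the pair_by_timestamp helper are illustrative assumptions, not something specified by the disclosure.

```python
# Minimal sketch (Python): pair frames from the two streams by nearest timestamp.
# The (timestamp, image) tuples and the helper name are hypothetical examples.

def pair_by_timestamp(first_views, second_views):
    """first_views / second_views: lists of (timestamp_seconds, image) tuples,
    both sorted by timestamp. Returns a list of (first_image, second_image) pairs."""
    pairs = []
    j = 0
    for ts, first_img in first_views:
        # Advance j while the next second view is at least as close in time to ts.
        while j + 1 < len(second_views) and \
                abs(second_views[j + 1][0] - ts) <= abs(second_views[j][0] - ts):
            j += 1
        pairs.append((first_img, second_views[j][1]))
    return pairs
```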
  • In practical applications, the first video data and the second video data may each be captured with a binocular camera: two cameras lying in the same plane, with the same focal length and capture direction, obtain at the same moment two images of the target object or the real scene with parallax, namely the left view and the right view, from which three-dimensional data of the target object or the real scene can be derived.
  • Of course, other types of stereo cameras, such as a quad camera, can also be used to capture the video data of the target object or the real scene.
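As a hedged illustration of how two parallax views yield depth information, the sketch below computes a disparity map from a rectified left/right pair with OpenCV's block matcher. The file names and matcher parameters are assumptions for illustration only; the disclosure does not prescribe any particular stereo algorithm.

```python
# Minimal sketch (Python + OpenCV): estimate disparity from a rectified stereo pair.
# File names and matcher parameters are illustrative assumptions.
import cv2

left = cv2.imread("left_view.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_view.png", cv2.IMREAD_GRAYSCALE)

# Block matcher: numDisparities must be a multiple of 16, blockSize must be odd.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)  # fixed-point disparity, scaled by 16

# Normalize for visualization; larger disparity means closer to the cameras.
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", disp_vis)
```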
  • During the specific implementation, when the first terminal captures the left view and the right view of the target object, the scene in which the target object is located may be a simple background, such as pure white, pure blue, or pure green, or a complex background, such as a cluttered road.
  • However, to reduce the complexity of the extraction algorithm and make it easier to extract the image of the real target object from the left and right views containing it, the target object should, as far as possible, be placed in a relatively simple background, for example one with a single color.
  • Optionally, a background whose color differs strongly from the target object is used; for example, because blue and green are far from human skin tones, a blue or green background can be chosen when the target object is a person.
  • Optionally, S110 further includes: receiving the first video data from the first terminal and receiving the second video data from the second terminal; or receiving the first video data from the first terminal and synchronously capturing the second left view and the second right view of the real scene in which the second terminal is currently located.
  • Optionally, when the method is applied to a server, the first video data and the second video data are obtained by receiving the first video data from the first terminal and receiving the second video data from the second terminal; when the method is applied to the second terminal, they are obtained by receiving the first video data from the first terminal and synchronously capturing the second left view and the second right view of the current real scene.
  • In S120, the first image of the target object in the first left view is fused with the second left view, and the second image of the target object in the first right view is fused with the second right view.
  • Here, to enhance the sense of reality of the second terminal user's video call, after the first left view and the first right view of the target object corresponding to the first terminal and the second left view and the second right view of the real scene in which the second terminal is currently located have been obtained, the first image of the target object in the first left view is fused with the second left view to obtain a fused second left view that contains both the target object corresponding to the first terminal and the real scene in which the second terminal is currently located, and the second image of the target object in the first right view is fused with the second right view to obtain a fused second right view that likewise contains both.
  • Exemplarily, if the first image of the target object in the first left view is a standing person and the second left view shows a tree, the fused left view may contain a person standing next to a tree.
  • In practical applications, the fusion may be implemented with at least one of the commonly used machine vision algorithms, such as a pixel-based image fusion algorithm, a wavelet-transform-based multi-resolution image fusion algorithm, a pyramid image fusion algorithm, or a Poisson-based image synthesis algorithm, as determined by a person skilled in the art according to the actual situation.
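As one hedged example of the Poisson-based synthesis named above, the sketch below blends an extracted target-object patch into a background view with OpenCV's seamless cloning; the same call would be made once for the left pair and once for the right pair. The file names, the all-white mask, and the insertion point are illustrative assumptions.

```python
# Minimal sketch (Python + OpenCV): Poisson-style fusion of the target-object image
# into a background view. File names, mask, and center point are assumptions.
import cv2
import numpy as np

def fuse(foreground, background, mask, center):
    """Blend 'foreground' (e.g. the first image extracted from the first left view)
    into 'background' (e.g. the second left view) at 'center' using Poisson blending."""
    return cv2.seamlessClone(foreground, background, mask, center, cv2.NORMAL_CLONE)

fg = cv2.imread("first_image_left.png")        # extracted target object (left)
bg = cv2.imread("second_left_view.png")        # real scene, left view
mask = np.full(fg.shape[:2], 255, np.uint8)    # blend the whole foreground patch
center = (bg.shape[1] // 2, bg.shape[0] // 2)  # drop it at the background's center

# Assumes the foreground patch fits inside the background around 'center'.
fused_left = fuse(fg, bg, mask, center)
cv2.imwrite("fused_second_left_view.png", fused_left)
```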
  • Optionally, before S120, the method further includes: extracting the first image from the first left view and extracting the second image from the first right view according to a preset rule.
  • Here, before the first image of the target object is fused with the second left view and the second image is fused with the second right view, the first image also needs to be extracted from the first left view and the second image from the first right view according to the preset rule.
  • During the specific implementation, a pre-stored target object model may be used to perform target recognition on the first left view to extract the first image and on the first right view to extract the second image; alternatively, a pre-stored background model may be used to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image; of course, other methods, such as a local Poisson matting algorithm or a Bayesian matting algorithm, may also be used to obtain the first image and the second image, as determined by a person skilled in the art during the specific implementation.
  • In practical applications, the pre-stored target object model may be generated in advance by modeling samples with a machine learning algorithm, or generated in real time by a machine vision algorithm after the user manually selects the target area.
  • Similarly, the pre-stored background model may be generated from preset background color information, or generated in real time by a machine vision algorithm after the user manually calibrates the background area.
  • Of course, the pre-stored target object model or background model may also be obtained in other ways.
  • Exemplarily, a machine learning algorithm may learn sample targets, such as people or cars, to obtain a feature library of the target object and establish a visual model of the target object in advance; the target object in the first video data is then recognized and matched against this model.
  • Alternatively, when the background differs in color from the foreground target object, the background information can be filtered out to obtain the image data of the target object; or, when the background differs significantly from the foreground target object, a background-layer filtering method can be used to make the background transparent and obtain the image data of the target object; or, a Gaussian background model can be established for the background, and the background data is then matched and recognized to obtain the image data of the target object.
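As a minimal sketch of the simplest of these options, filtering out a known single-color background, the code below builds a green-screen mask in HSV space and keeps only the foreground target object. The HSV thresholds and file names are assumptions; a pre-trained target object model or a Gaussian background model could replace this step, as described above.

```python
# Minimal sketch (Python + OpenCV): extract the target object from a view with a
# (near-)uniform green background. HSV thresholds and file names are assumptions.
import cv2
import numpy as np

def extract_foreground(view_bgr, lower_hsv=(35, 60, 60), upper_hsv=(85, 255, 255)):
    """Return (foreground, mask): pixels belonging to the green background are removed."""
    hsv = cv2.cvtColor(view_bgr, cv2.COLOR_BGR2HSV)
    background_mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    foreground_mask = cv2.bitwise_not(background_mask)
    # Clean small holes and speckles in the mask.
    kernel = np.ones((5, 5), np.uint8)
    foreground_mask = cv2.morphologyEx(foreground_mask, cv2.MORPH_OPEN, kernel)
    foreground = cv2.bitwise_and(view_bgr, view_bgr, mask=foreground_mask)
    return foreground, foreground_mask

first_left_view = cv2.imread("first_left_view.png")
first_image, mask_left = extract_foreground(first_left_view)  # the "first image"
cv2.imwrite("first_image_left.png", first_image)
```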
  • In addition, the captured images often contain various kinds of noise. The noise may be external noise caused by light or dust particles in the environment, or internal noise caused by the internal circuitry of the video capture module or the material of the image sensing module; such noise can make the object in the image blurred or even indistinguishable, which leads to inaccurate target data.
  • Therefore, during the specific implementation, to ensure that the first image can be accurately extracted from the first left view and the second image from the first right view, the first left view and the first right view also need to be denoised, and the denoised first left view and the denoised first right view are then used to extract the first image and the second image.
  • The denoising method may be a spatial-domain method such as linear filtering, median filtering, or Wiener filtering, a frequency-domain method such as the Fourier transform or the wavelet transform, or another type of method such as color histogram equalization.
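For illustration, a minimal denoising sketch using one of the spatial-domain filters named above (median filtering) is shown below; the kernel size and file name are assumptions.

```python
# Minimal sketch (Python + OpenCV): median-filter denoising of a captured view
# before foreground extraction. Kernel size and file name are assumptions.
import cv2

noisy_view = cv2.imread("first_left_view.png")
denoised_view = cv2.medianBlur(noisy_view, 5)  # 5x5 median filter, effective against salt-and-pepper noise
cv2.imwrite("first_left_view_denoised.png", denoised_view)
```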
  • In S130, corresponding three-dimensional video data is generated according to the fused second left view and the fused second right view.
  • Here, after the fused second left view and the fused second right view are obtained, three-dimensional imaging technology can be used to generate three-dimensional video data in which the target object is blended into the real scene.
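The disclosure leaves the three-dimensional imaging technique open (the description mentions color-separation, polarization, and time-division methods). Purely as a hedged preview of the color-separation idea, the sketch below packs the fused left and right views into a red-cyan anaglyph frame; the file names are assumptions, and this is not the method required by the patent.

```python
# Minimal sketch (Python + OpenCV + NumPy): build a red-cyan anaglyph frame from the
# fused left/right views as a simple color-separation 3D preview. File names are assumptions.
import cv2
import numpy as np

fused_left = cv2.imread("fused_second_left_view.png")
fused_right = cv2.imread("fused_second_right_view.png")  # assumed same resolution as the left view

anaglyph = np.zeros_like(fused_left)
anaglyph[:, :, 2] = cv2.cvtColor(fused_left, cv2.COLOR_BGR2GRAY)  # red channel <- left view
anaglyph[:, :, :2] = fused_right[:, :, :2]                        # blue + green <- right view
cv2.imwrite("anaglyph_frame.png", anaglyph)
```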
  • During the specific implementation, to let the user of the second terminal experience the augmented-reality three-dimensional video data, after S130 the method further includes: displaying the three-dimensional video data, or sending the three-dimensional video data to the second terminal.
  • When the method is applied to the second terminal, the second terminal can directly display the three-dimensional video data; when the method is applied to a server, the server needs to send the three-dimensional video data to the second terminal, and the second terminal displays it after receiving it.
  • The way the three-dimensional video data is viewed depends on how it was generated; for example, if the three-dimensional video data is generated with a time-division-based 3D technique, the user can watch it with active shutter 3D glasses.
  • From the above, the technical solution provided in this embodiment first obtains first video data and second video data during a video call between a first terminal and a second terminal, where the first video data includes at least the first left view and the first right view of the target object corresponding to the first terminal, and the second video data includes at least the second left view and the second right view of the real scene in which the second terminal is currently located.
  • The first image of the target object in the first left view is then fused with the second left view, and the second image of the target object in the first right view is fused with the second right view; finally, three-dimensional video data is generated from the fused second left view and the fused second right view.
  • In this way, the video data processing method provided by this embodiment can merge the target object corresponding to the first terminal into the real scene in which the second terminal is located during the video call, enhancing the reality information of the second terminal's video call, so that the user of the second terminal perceives the target object as being in his or her current real environment and is thus given a good user experience.
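Tying the steps together, the following sketch outlines a per-frame pipeline in the spirit of S110 to S130. It reuses the hypothetical extract_foreground and fuse helpers from the earlier sketches, assumes all four views share the same resolution, and is a schematic composition under those assumptions rather than the patented implementation.

```python
# Minimal sketch (Python): per-frame pipeline in the spirit of S110-S130, reusing the
# hypothetical helpers (extract_foreground, fuse) defined in the earlier sketches.
import cv2
import numpy as np

def process_frame(first_left, first_right, second_left, second_right):
    # S110: the four views for this timestamp have already been obtained and paired.
    # Denoise the views of the target object before extraction.
    first_left = cv2.medianBlur(first_left, 5)
    first_right = cv2.medianBlur(first_right, 5)

    # Preset rule (green-screen mask, see the extraction sketch above).
    first_image, _ = extract_foreground(first_left)    # target object in first left view
    second_image, _ = extract_foreground(first_right)  # target object in first right view

    # S120: fuse the extracted images into the real-scene views.
    mask = np.full(first_image.shape[:2], 255, np.uint8)
    center = (second_left.shape[1] // 2, second_left.shape[0] // 2)
    fused_left = fuse(first_image, second_left, mask, center)
    fused_right = fuse(second_image, second_right, mask, center)

    # S130: hand the fused pair to whatever 3D imaging method the display uses
    # (e.g. the anaglyph preview sketched above, or side-by-side packing).
    return fused_left, fused_right
```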
  • Based on the above embodiment, this embodiment provides a video communication system that includes terminal 1 and terminal 2, and a video data processing method that may be applied to this video communication system.
  • FIG. 2 is a schematic flowchart of a video data processing method in Embodiment 2.
  • Referring to FIG. 2, the video data processing method includes S201 to S206.
  • In S201, terminal 1 acquires the first video data and sends the first video data to terminal 2.
  • The first video data includes at least the first left view and the first right view of user A.
  • During the specific implementation, terminal 1 can photograph user A with a binocular 3D camera to obtain the first left view and the first right view of user A.
  • In practical applications, to make it easier to extract the image data of user A later, user A may be placed in front of a monochrome background, such as white, green, or blue.
  • In general, when the target to be captured is a person, a blue or green background can be chosen because green and blue differ greatly from human skin tones, similar to the blue screen or green screen used in film shooting.
  • In S202, terminal 2 receives the first video data and acquires the second video data.
  • The second video data consists of the second left view and the second right view of the conference room captured from user B's viewpoint.
  • In practical applications, terminal 2 can be a wearable helmet with a binocular 3D camera mounted on its outer side; user B wears terminal 2 on the head, so the second left view and the second right view of the conference room as seen from user B's viewpoint can be obtained.
  • In S203, terminal 2 extracts the first image from the first left view and extracts the second image from the first right view.
  • In S204, terminal 2 fuses the first image of the target object in the first left view with the second left view, and fuses the second image of the target object in the first right view with the second right view.
  • Here, after the first image and the second image of user A have been obtained, the three-dimensional information of user A can be blended into the seat opposite user B in the conference room.
  • In S205, terminal 2 generates the corresponding three-dimensional video data according to the fused second left view and the fused second right view.
  • Here, after obtaining the fused second left view and the fused second right view, terminal 2 can use three-dimensional imaging technology to generate the corresponding three-dimensional video data for display to user B.
  • In S206, terminal 2 displays the three-dimensional video data.
  • Here, after the three-dimensional video data has been generated, terminal 2 can display it to user B.
  • Exemplarily, through terminal 2, user B can see that user A is in the same conference room and is sitting in the seat opposite user B.
  • With the method provided by this embodiment, user B can use terminal 2 to blend the image of user A, the party to be called, into the image of the real scene where user B is located and display it with three-dimensional imaging technology.
  • In this way, user B perceives user A as being in user B's own real environment, which improves user B's video call experience.
  • This embodiment provides a video data processing apparatus.
  • FIG. 3 is a schematic structural diagram of the video data processing apparatus in Embodiment 3.
  • Referring to FIG. 3, the video data processing apparatus 30 includes an obtaining module 301, a fusion module 302, and a generating module 303.
  • The obtaining module 301 is configured to obtain first video data and second video data during a video call between the first terminal and the second terminal, where the first video data includes at least a first left view and a first right view of the target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located.
  • The fusion module 302 is configured to fuse the first image of the target object in the first left view with the second left view, and to fuse the second image of the target object in the first right view with the second right view; the generating module 303 is configured to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view.
  • Optionally, the video data processing apparatus further includes an extraction module configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
  • Optionally, the extraction module is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
  • Optionally, the extraction module is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
  • Optionally, the obtaining module is further configured to receive the first video data from the first terminal and receive the second video data from the second terminal; correspondingly, the video data processing apparatus further includes a sending module configured to send the three-dimensional video data to the second terminal.
  • Optionally, the obtaining module is further configured to receive the first video data from the first terminal and synchronously capture the second left view and the second right view of the real scene in which it is currently located; correspondingly, the video data processing apparatus further includes a display module configured to display the three-dimensional video data.
  • In practical applications, the obtaining module, the fusion module, the generating module, the extraction module, and the sending module may all be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a micro processor unit (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • This embodiment provides a server.
  • FIG. 4 is a schematic structural diagram of the server in Embodiment 4.
  • Referring to FIG. 4, the server 40 includes a transceiver 401 and a processor 402.
  • The transceiver 401 is configured to receive first video data from the first terminal and second video data from the second terminal, where the first video data includes at least a first left view and a first right view of the target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located; the transceiver 401 is further configured to send the three-dimensional video data to the second terminal.
  • The processor 402 is configured to fuse the first image of the target object in the first left view with the second left view and the second image of the target object in the first right view with the second right view, and to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view.
  • Optionally, the processor 402 is further configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
  • Optionally, the processor 402 is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
  • Optionally, the processor 402 is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
  • This embodiment further provides a computer-readable storage medium, which may be deployed in the server of the above embodiment and stores computer-executable instructions for executing the video data processing method of any of the above embodiments.
  • Optionally, the server in this embodiment further includes a memory 403, which is configured to store data, such as the background model and the target object model, as well as logic instructions.
  • The processor 402 can call the logic instructions in the memory 403 to execute the video data processing method of the above embodiment.
  • When implemented in the form of software functional units and sold or used as stand-alone products, the logic instructions in the above memory may be stored in a computer-readable storage medium.
  • The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code, or it may be a transitory storage medium.
  • Referring to FIG. 5, the terminal 50 includes a receiver 501, a stereo camera 502, a processor 503, and a display 504.
  • The receiver 501 is configured to receive video data from a peer end, where the video data of the peer end includes at least a first left view and a first right view of the target object corresponding to the peer end; the stereo camera 502 is configured to synchronously capture the second left view and the second right view of the real scene in which the terminal is currently located.
  • The processor 503 is configured to fuse the first image of the target object in the first left view with the second left view and the second image of the target object in the first right view with the second right view, and to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view; the display 504 is configured to display the three-dimensional video data.
  • Optionally, the processor 503 is further configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
  • Optionally, the processor 503 is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
  • Optionally, the processor 503 is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
  • This embodiment further provides a computer-readable storage medium, which may be deployed in the terminal of the above embodiment and stores computer-executable instructions for executing the video data processing method of any of the above embodiments.
  • Optionally, the terminal further includes a memory 505, which is configured to store data such as the background model and the target object model, as well as logic instructions.
  • The processor 503 can call the logic instructions in the memory 505 to execute the video data processing method of the above embodiment.
  • When implemented in the form of software functional units and sold or used as stand-alone products, the logic instructions in the above memory may be stored in a computer-readable storage medium.
  • The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code, or it may be a transitory storage medium.
  • A person skilled in the art should understand that the present embodiments can be provided as a method, a system, or a computer program product.
  • Accordingly, the present embodiments can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware.
  • Moreover, the present embodiments can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • The video data processing method, apparatus, and device provided by the present disclosure can blend the image data of the target object of the first terminal into the image data of the real scene in which the second terminal is currently located, enhancing the reality information during the second terminal's video call and providing the user of the second terminal with an augmented-reality 3D video call, so that the user perceives the target object as being in his or her current real environment, thereby improving the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video data processing method includes: obtaining first video data and second video data during a video call between a first terminal and a second terminal, where the first video data includes at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located; fusing a first image of the target object in the first left view with the second left view, and fusing a second image of the target object in the first right view with the second right view; and generating three-dimensional video data according to the fused second left view and the fused second right view.

Description

Video data processing method, apparatus and device
Technical Field
The present disclosure relates to augmented reality (AR) technology, for example, to a video data processing method, apparatus, and device.
Background
In recent years, with the rapid development of communication network technology and the fast pace of change in mobile Internet technology, traditional audio communication can no longer meet people's needs for exchange. More and more people want to communicate with each other through video, so many terminals provide a video communication function. Today, video communication plays a very important role in people's life and work.
However, during a video call the local end can only display the two-dimensional image captured by the peer end's camera and cannot blend the target object corresponding to the peer end into the real scene where the local end is located; the user can only see a two-dimensional image of the other party. From a sensory perspective, the other party with whom the user is communicating is still located far away, and people cannot feel that the other party has come into their own real environment. The visual information of the local video call therefore lacks a sense of reality, the user cannot truly experience face-to-face exchange with the other party, and the user experience is poor.
Summary
In view of this, this embodiment provides a video data processing method, apparatus, and device, which implement an augmented-reality three-dimensional video call and improve the user experience.
In a first aspect, this embodiment provides a video data processing method, including: obtaining first video data and second video data during a video call between a first terminal and a second terminal, where the first video data includes at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located; fusing a first image of the target object in the first left view with the second left view, and fusing a second image of the target object in the first right view with the second right view; and generating three-dimensional video data according to the fused second left view and the fused second right view.
In a second aspect, this embodiment provides a video data processing apparatus, including an obtaining module, a fusion module, and a generating module, where the obtaining module is configured to obtain first video data and second video data during a video call between a first terminal and a second terminal, the first video data including at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located; the fusion module is configured to fuse a first image of the target object in the first left view with the second left view, and to fuse a second image of the target object in the first right view with the second right view; and the generating module is configured to generate three-dimensional video data according to the fused second left view and the fused second right view.
In a third aspect, this embodiment provides a server, including a transceiver and a processor, where the transceiver is configured to receive first video data from a first terminal and receive second video data from a second terminal, the first video data including at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located, and is further configured to send three-dimensional video data to the second terminal; and the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate the three-dimensional video data according to the fused second left view and the fused second right view.
In a fourth aspect, this embodiment provides a terminal, including a receiver, a stereo camera, a processor, and a display, where the receiver is configured to receive video data from a peer end, the video data of the peer end including at least a first left view and a first right view of a target object corresponding to the peer end; the stereo camera is configured to synchronously capture a second left view and a second right view of the real scene in which the terminal is currently located; the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate three-dimensional video data according to the fused second left view and the fused second right view; and the display is configured to display the three-dimensional video data.
In a fifth aspect, this embodiment further provides a computer-readable storage medium storing computer-executable instructions for executing any of the methods described above. The video data processing method, apparatus, and device provided by this embodiment can blend the image data of the target object of the first terminal into the image data of the real scene in which the second terminal is currently located, enhancing the reality information during the second terminal's video call and providing the user of the second terminal with an augmented-reality three-dimensional video call, so that the user perceives the target object as being in his or her current real environment, thereby improving the user experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the video data processing method in Embodiment 1;
FIG. 2 is a schematic flowchart of the video data processing method in Embodiment 2;
FIG. 3 is a schematic structural diagram of the video data processing apparatus in Embodiment 3;
FIG. 4 is a schematic structural diagram of the server in Embodiment 4;
FIG. 5 is a schematic structural diagram of the terminal in Embodiment 5.
Detailed Description
The technical solutions of this embodiment are described clearly and completely below with reference to the drawings of this embodiment.
Embodiment 1
This embodiment provides a video data processing method. In practical applications, the method can be applied wherever video data processing is needed in a variety of video communication services; it may be a video call application, a social application, or a smart office product on a terminal, or video data processing in a video service server. Exemplarily, a user can use a smart office product on a terminal to hold an augmented-reality video call with another user and perceive that the other user has come into the conference room where he or she is, which enhances the call experience.
FIG. 1 is a schematic flowchart of the video data processing method in Embodiment 1. Referring to FIG. 1, the video data processing method includes S110 to S130.
In S110, first video data and second video data are obtained during a video call between a first terminal and a second terminal;
where the first video data includes at least a first left view and a first right view of the target object corresponding to the first terminal, and the second video data includes at least a second left view and a second right view of the real scene in which the second terminal is currently located.
Here, when the user wants the target object to appear, from a sensory perspective, in the real environment where the user is currently located during a video call, and to obtain a more realistic video call experience, the user can select the augmented-reality video call service. In that case, during the video call between the first terminal and the second terminal, first video data including at least the first left view and the first right view of the target object corresponding to the first terminal is obtained, and second video data including at least the second left view and the second right view of the real scene in which the second terminal is currently located is obtained. The image data of the target object corresponding to the first terminal can then be blended into the image data of the real scene in which the second terminal is currently located, to enhance the video call experience of the user on the second terminal side.
Optionally, the first video data may be one frame of data containing the target object, such as the i-th frame, in which case the first video data contains the first left view and the first right view of the target object captured at time i; the first video data may also be multiple frames containing the target object, such as the j-th to the (j+2)-th frames, in which case it contains all the first left views and first right views of the target object captured from time j to time j+2. Likewise, the second video data may be one frame or multiple frames containing the real scene. Moreover, the second video data corresponds synchronously to the first video data, so when the first video data contains the i-th frame of the target object, the second video data also contains the i-th frame of the real scene. The i-th frame of the first or second video data may be a three-dimensional image from which the left view and the right view of the target object or the real scene can be obtained, or it may directly be two two-dimensional images, that is, the left view and the right view of the target object or the real scene. Here, suppose the first video data is a video segment with a duration of 4 seconds and a frame rate of 25 frames per second; then 25 × 4 = 100 first left views and 100 first right views are obtained. Correspondingly, the second video data is also a video segment with a duration of 4 seconds and a frame rate of 25 frames per second, and, according to the timestamps, each first left view corresponds to a second left view and each first right view corresponds to a second right view.
In practical applications, the first video data and the second video data may each be captured with a binocular camera: two cameras lying in the same plane, with the same focal length and capture direction, obtain at the same moment two images of the target object or the real scene with parallax, namely the left view and the right view, from which three-dimensional data of the target object or the real scene can be derived. Of course, other types of stereo cameras, such as a quad camera, can also be used to capture the video data of the target object or the real scene.
During the specific implementation, when the first terminal captures the left view and the right view of the target object, the scene in which the target object is located may be a simple background, such as pure white, pure blue, or pure green, or a complex background, such as a cluttered road. However, to reduce the complexity of the extraction algorithm and make it easy to extract the image of the real target object from the left and right views containing the target object, the target object should, as far as possible, be placed in a relatively simple background, for example one with a single color. Optionally, a background whose color differs strongly from the target object is used; for example, because blue and green are far from human skin tones, a blue or green background can be chosen when the target object is a person.
Optionally, S110 further includes: receiving the first video data from the first terminal and receiving the second video data from the second terminal; or receiving the first video data from the first terminal and synchronously capturing the second left view and the second right view of the real scene in which the second terminal is currently located.
Optionally, when the method is applied to a server, the first video data and the second video data are obtained by receiving the first video data from the first terminal and receiving the second video data from the second terminal; when the method is applied to the second terminal, they are obtained by receiving the first video data from the first terminal and synchronously capturing the second left view and the second right view of the current real scene.
In S120, the first image of the target object in the first left view is fused with the second left view, and the second image of the target object in the first right view is fused with the second right view;
Here, to enhance the sense of reality of the second terminal user's video call, after the first left view and the first right view of the target object corresponding to the first terminal and the second left view and the second right view of the real scene in which the second terminal is currently located have been obtained, the first image of the target object in the first left view can be fused with the second left view to obtain a fused second left view that contains both the target object corresponding to the first terminal and the real scene in which the second terminal is currently located, and the second image of the target object in the first right view can be fused with the second right view to obtain a fused second right view that likewise contains both.
Exemplarily, if the first image of the target object in the first left view is a standing person and the second left view shows a tree, the fused left view may contain a person standing next to a tree.
In practical applications, when fusing the first image of the target object in the first left view with the second left view and the second image of the target object in the first right view with the second right view, at least one of the commonly used machine vision algorithms may be adopted, such as a pixel-based image fusion algorithm, a wavelet-transform-based multi-resolution image fusion algorithm, a pyramid image fusion algorithm, or a Poisson-based image synthesis algorithm, as determined by a person skilled in the art according to the actual situation during the specific implementation.
Optionally, before S120, the method further includes: extracting the first image from the first left view and extracting the second image from the first right view according to a preset rule.
Here, before the first image of the target object is fused with the second left view and the second image of the target object is fused with the second right view, the first image also needs to be extracted from the first left view and the second image from the first right view according to the preset rule.
During the specific implementation, a pre-stored target object model may be used to perform target recognition on the first left view to extract the first image and on the first right view to extract the second image; alternatively, a pre-stored background model may be used to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image; of course, other methods, such as a local Poisson matting algorithm or a Bayesian matting algorithm, may also be used to obtain the first image and the second image, as determined by a person skilled in the art during the specific implementation.
In practical applications, the pre-stored target object model may be generated in advance by modeling samples with a machine learning algorithm, or generated in real time by a machine vision algorithm after the user manually selects the target area. Similarly, the pre-stored background model may be generated from preset background color information, or generated in real time by a machine vision algorithm after the user manually calibrates the background area. Of course, the pre-stored target object model or background model may also be obtained in other ways.
Exemplarily, a machine learning algorithm may learn sample targets, such as people or cars, to obtain a feature library of the target object and establish a visual model of the target object in advance; the target object in the first video data is then recognized and matched to obtain the first image of the target object in the first left view and the second image of the target object in the first right view. Alternatively, when the background differs in color from the foreground target object, the background information can be filtered out to obtain the image data of the target object; or, when the background differs significantly from the foreground target object, a background-layer filtering method can be used to make the background transparent and obtain the image data of the target object; or, a Gaussian background model can be established for the background, and the background data is then matched and recognized to obtain the image data of the target object.
In addition, the obtained images often contain various kinds of noise. The noise may be external noise caused by light or dust particles in the environment, or internal noise caused by the internal circuitry of the video capture module or the material of the image sensing module. The presence of such noise can make the object in the image blurred or even indistinguishable, which leads to inaccurate target data.
Therefore, during the specific implementation, to ensure that the first image can be accurately extracted from the first left view and the second image from the first right view, the first left view and the first right view also need to be denoised, and the denoised first left view and the denoised first right view are then used to extract the first image and the second image.
In practical applications, the denoising method may be a spatial-domain method such as linear filtering, median filtering, or Wiener filtering, a frequency-domain method such as the Fourier transform or the wavelet transform, or, of course, another type of method such as color histogram equalization.
In S130, corresponding three-dimensional video data is generated according to the fused second left view and the fused second right view.
Here, after the fused second left view and the fused second right view are obtained, three-dimensional imaging technology can be used to generate three-dimensional video data in which the target object is blended into the real scene.
In practical applications, when generating the corresponding three-dimensional video data from the fused second left view and the fused second right view, commonly used three-dimensional imaging techniques such as the color-separation method, the polarization method, or the time-division method may be adopted, as determined by a person skilled in the art according to the actual situation during the specific implementation.
During the specific implementation, to let the user of the second terminal experience the augmented-reality three-dimensional video data, after S130 the method further includes: displaying the three-dimensional video data, or sending the three-dimensional video data to the second terminal.
Optionally, when the method is applied to the second terminal, the second terminal can directly display the three-dimensional video data; when the method is applied to a server, the server needs to send the three-dimensional video data to the second terminal, and the second terminal displays it after receiving it.
In practical applications, when viewing the three-dimensional video data, the user may use passive polarized glasses or active shutter 3D (three-dimensional) glasses, or, of course, other means such as VR (virtual reality) glasses. In general, the way the three-dimensional video data is viewed depends on how it was generated; for example, if the three-dimensional video data is generated with a time-division-based 3D technique, the user can watch it with active shutter 3D glasses.
At this point, the processing of the second video data is completed.
From the above, the technical solution provided by this embodiment first obtains first video data and second video data during a video call between a first terminal and a second terminal, where the first video data includes at least the first left view and the first right view of the target object corresponding to the first terminal and the second video data includes at least the second left view and the second right view of the real scene in which the second terminal is currently located; it then fuses the first image of the target object in the first left view with the second left view and the second image of the target object in the first right view with the second right view; and finally it generates, from the fused second left view and the fused second right view, three-dimensional video data in which the target object corresponding to the first terminal is blended into the real scene in which the user of the second terminal is currently located. In this way, with the video data processing method provided by this embodiment, the target object corresponding to the first terminal can be merged, during the video call, into the real scene in which the second terminal is located, so as to enhance the reality information of the second terminal's video call and provide the user of the second terminal with an augmented-reality 3D video call; the user of the second terminal thus perceives the target object as being in his or her current real environment, which provides a good user experience.
Embodiment 2
Based on the above embodiment, this embodiment provides a video communication system that includes terminal 1 and terminal 2, and a video data processing method that may be applied to this video communication system.
Exemplarily, take a video conference as the practical application scenario. Suppose user B and user A need to discuss a project plan, but because they are not in the same city it is inconvenient to talk face to face. User B can hold an augmented-reality video call with user A in a conference room through this video communication system; by blending the seated user A into the chair opposite user B in user B's conference room, user B perceives, from a sensory perspective, that user A is in the real environment where user B currently is.
The process of enhancing the video call experience of user B at terminal 2 is described in detail below.
FIG. 2 is a schematic flowchart of the video data processing method in Embodiment 2. Referring to FIG. 2, the video data processing method includes S201 to S206.
In S201, terminal 1 acquires the first video data and sends the first video data to terminal 2;
where the first video data includes at least the first left view and the first right view of user A.
During the specific implementation, terminal 1 can photograph user A with a binocular 3D camera to obtain the first left view and the first right view of user A.
In practical applications, to make it easier to extract the image data of user A later, user A may be placed in front of a monochrome background, such as white, green, or blue. In general, when the target to be captured is a person, a blue or green background can be chosen because green and blue differ greatly from human skin tones, similar to the blue screen or green screen used in film shooting.
In S202, terminal 2 receives the first video data and acquires the second video data;
where the second video data consists of the second left view and the second right view of the conference room captured from user B's viewpoint.
In practical applications, terminal 2 can be a wearable helmet with a binocular 3D camera mounted on its outer side; user B wears terminal 2 on the head, so the second left view and the second right view of the conference room as seen from user B's viewpoint can be obtained.
In S203, terminal 2 extracts the first image from the first left view and extracts the second image from the first right view;
In S204, terminal 2 fuses the first image of the target object in the first left view with the second left view, and fuses the second image of the target object in the first right view with the second right view;
Here, after the first image and the second image of user A have been obtained, the three-dimensional information of user A can be blended into the seat opposite user B in the conference room.
In S205, terminal 2 generates the corresponding three-dimensional video data according to the fused second left view and the fused second right view;
Here, after obtaining the fused second left view and the fused second right view, terminal 2 can use three-dimensional imaging technology to generate the corresponding three-dimensional video data for display to user B.
In S206, terminal 2 displays the three-dimensional video data.
Here, after the three-dimensional video data has been generated, terminal 2 can display it to user B. Exemplarily, through terminal 2, user B can see that user A is in the same conference room and is sitting in the seat opposite user B.
At this point, the processing of the second video data acquired by terminal 2 is completed.
From the above, with the method provided by this embodiment, user B can use terminal 2 to blend the image of user A, the party to be called, into the image of the real scene where user B is located and display it with three-dimensional imaging technology, achieving an augmented-reality effect; user B can thus perceive user A as being in user B's own real environment, which improves user B's video call experience.
Embodiment 3
This embodiment provides a video data processing apparatus. FIG. 3 is a schematic structural diagram of the video data processing apparatus in Embodiment 3. Referring to FIG. 3, the video data processing apparatus 30 includes an obtaining module 301, a fusion module 302, and a generating module 303, where the obtaining module 301 is configured to obtain first video data and second video data during a video call between the first terminal and the second terminal, the first video data including at least a first left view and a first right view of the target object corresponding to the first terminal and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located; the fusion module 302 is configured to fuse the first image of the target object in the first left view with the second left view and to fuse the second image of the target object in the first right view with the second right view; and the generating module 303 is configured to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view.
Optionally, the video data processing apparatus further includes an extraction module configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
Optionally, the extraction module is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
Optionally, the extraction module is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
Optionally, the obtaining module is further configured to receive the first video data from the first terminal and receive the second video data from the second terminal; correspondingly, the video data processing apparatus further includes a sending module configured to send the three-dimensional video data to the second terminal.
Optionally, the obtaining module is further configured to receive the first video data from the first terminal and synchronously capture the second left view and the second right view of the real scene in which it is currently located; correspondingly, the video data processing apparatus further includes a display module configured to display the three-dimensional video data.
In practical applications, the obtaining module, the fusion module, the generating module, the extraction module, and the sending module may all be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a micro processor unit (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
The description of the above apparatus embodiment is similar to that of the method embodiment and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the apparatus embodiment, refer to the description of the method embodiment; to save space, they are not repeated here.
Embodiment 4
This embodiment provides a server. FIG. 4 is a schematic structural diagram of the server in Embodiment 4. Referring to FIG. 4, the server 40 includes a transceiver 401 and a processor 402, where the transceiver 401 is configured to receive first video data from the first terminal and receive second video data from the second terminal, the first video data including at least a first left view and a first right view of the target object corresponding to the first terminal and the second video data including at least a second left view and a second right view of the real scene in which the second terminal is currently located, and is further configured to send the three-dimensional video data to the second terminal; and the processor 402 is configured to fuse the first image of the target object in the first left view with the second left view, to fuse the second image of the target object in the first right view with the second right view, and to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view.
Optionally, the processor 402 is further configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
Optionally, the processor 402 is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
Optionally, the processor 402 is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
This embodiment further provides a computer-readable storage medium, which may be deployed in the server of the above embodiment and stores computer-executable instructions for executing the video data processing method of any of the above embodiments.
Optionally, the server of this embodiment further includes a memory 403, which is configured to store data such as the background model and the target object model as well as logic instructions. The processor 402 can call the logic instructions in the memory 403 to execute the video data processing method of the above embodiment.
When implemented in the form of software functional units and sold or used as stand-alone products, the logic instructions in the above memory may be stored in a computer-readable storage medium. The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code, or it may be a transitory storage medium.
The description of the above server embodiment is similar to that of the method embodiment and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the server embodiment, refer to the description of the method embodiment; to save space, they are not repeated here.
Embodiment 5
This embodiment provides a terminal. FIG. 5 is a schematic structural diagram of the terminal in Embodiment 5. Referring to FIG. 5, the terminal 50 includes a receiver 501, a stereo camera 502, a processor 503, and a display 504, where the receiver 501 is configured to receive video data from a peer end, the video data of the peer end including at least a first left view and a first right view of the target object corresponding to the peer end; the stereo camera 502 is configured to synchronously capture the second left view and the second right view of the real scene in which the terminal is currently located; the processor 503 is configured to fuse the first image of the target object in the first left view with the second left view, to fuse the second image of the target object in the first right view with the second right view, and to generate the corresponding three-dimensional video data according to the fused second left view and the fused second right view; and the display 504 is configured to display the three-dimensional video data.
Optionally, the processor 503 is further configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
Optionally, the processor 503 is further configured to use a pre-stored target object model to perform target recognition on the first left view to extract the first image and to perform target recognition on the first right view to extract the second image.
Optionally, the processor 503 is further configured to use a pre-stored background model to filter the background data in the first left view to obtain the first image and to filter the background data in the first right view to obtain the second image.
This embodiment further provides a computer-readable storage medium, which may be deployed in the terminal of the above embodiment and stores computer-executable instructions for executing the video data processing method of any of the above embodiments.
Optionally, the terminal further includes a memory 505, which is configured to store data such as the background model and the target object model as well as logic instructions. The processor 503 can call the logic instructions in the memory 505 to execute the video data processing method of the above embodiment.
When implemented in the form of software functional units and sold or used as stand-alone products, the logic instructions in the above memory may be stored in a computer-readable storage medium. The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code, or it may be a transitory storage medium.
The description of the above terminal embodiment is similar to that of the method embodiment and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the terminal embodiment, refer to the description of the method embodiment; to save space, they are not repeated here.
A person skilled in the art should understand that the present embodiments can be provided as a method, a system, or a computer program product. Accordingly, the present embodiments can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present embodiments can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present embodiments are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the present embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Industrial Applicability
The video data processing method, apparatus, and device provided by the present disclosure can blend the image data of the target object of the first terminal into the image data of the real scene in which the second terminal is currently located, enhancing the reality information during the second terminal's video call and providing the user of the second terminal with an augmented-reality three-dimensional video call, so that the user perceives the target object as being in his or her current real environment, thereby improving the user experience.

Claims (13)

  1. A video data processing method, comprising:
    obtaining first video data and second video data during a video call between a first terminal and a second terminal, wherein the first video data comprises at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data comprises at least a second left view and a second right view of a real scene in which the second terminal is currently located;
    fusing a first image of the target object in the first left view with the second left view, and fusing a second image of the target object in the first right view with the second right view; and
    generating three-dimensional video data according to the fused second left view and the fused second right view.
  2. The method according to claim 1, wherein before fusing the first image of the target object in the first left view with the second left view and fusing the second image of the target object in the first right view with the second right view, the method further comprises:
    extracting the first image from the first left view and extracting the second image from the first right view according to a preset rule.
  3. The method according to claim 2, wherein extracting the first image from the first left view and extracting the second image from the first right view according to the preset rule comprises:
    performing target recognition on the first left view by using a pre-stored target object model to extract the first image, and performing target recognition on the first right view to extract the second image.
  4. The method according to claim 2, wherein extracting the first image from the first left view and extracting the second image from the first right view according to the preset rule comprises:
    filtering background data in the first left view by using a pre-stored background model to obtain the first image, and filtering background data in the first right view to obtain the second image.
  5. The method according to any one of claims 1 to 4, wherein obtaining the first video data and the second video data during the video call between the first terminal and the second terminal comprises:
    receiving the first video data from the first terminal, and receiving the second video data from the second terminal;
    and after the generating of the three-dimensional video data, the method further comprises:
    sending the three-dimensional video data to the second terminal.
  6. The method according to any one of claims 1 to 4, wherein obtaining the first video data and the second video data during the video call between the first terminal and the second terminal comprises:
    receiving the first video data from the first terminal, and synchronously capturing the second video data of the real scene in which the second terminal is currently located;
    and after the generating of the three-dimensional video data, the method further comprises:
    displaying the three-dimensional video data.
  7. A video data processing apparatus, comprising an obtaining module, a fusion module, and a generating module, wherein
    the obtaining module is configured to obtain first video data and second video data during a video call between a first terminal and a second terminal, wherein the first video data comprises at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data comprises at least a second left view and a second right view of a real scene in which the second terminal is currently located;
    the fusion module is configured to fuse a first image of the target object in the first left view with the second left view, and to fuse a second image of the target object in the first right view with the second right view; and
    the generating module is configured to generate three-dimensional video data according to the fused second left view and the fused second right view.
  8. The apparatus according to claim 7, further comprising an extraction module configured to extract, according to a preset rule, the first image from the first left view and the second image from the first right view.
  9. The apparatus according to claim 8, wherein the extraction module is configured to perform target recognition on the first left view by using a pre-stored target object model to extract the first image, and to perform target recognition on the first right view to extract the second image.
  10. The apparatus according to claim 8, wherein the extraction module is configured to filter background data in the first left view by using a pre-stored background model to obtain the first image, and to filter background data in the first right view to obtain the second image.
  11. A server, comprising a transceiver and a processor, wherein
    the transceiver is configured to receive first video data from a first terminal and receive second video data from a second terminal, wherein the first video data comprises at least a first left view and a first right view of a target object corresponding to the first terminal, and the second video data comprises at least a second left view and a second right view of a real scene in which the second terminal is currently located; the transceiver is further configured to send three-dimensional video data to the second terminal; and
    the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate the three-dimensional video data according to the fused second left view and the fused second right view.
  12. A terminal, comprising a receiver, a stereo camera, a processor, and a display, wherein
    the receiver is configured to receive video data from a peer end, wherein the video data of the peer end comprises at least a first left view and a first right view of a target object corresponding to the peer end;
    the stereo camera is configured to synchronously capture a second left view and a second right view of a real scene in which the terminal is currently located;
    the processor is configured to fuse a first image of the target object in the first left view with the second left view, to fuse a second image of the target object in the first right view with the second right view, and to generate three-dimensional video data according to the fused second left view and the fused second right view; and
    the display is configured to display the three-dimensional video data.
  13. A computer-readable storage medium storing computer-executable instructions for executing the method according to any one of claims 1 to 6.
PCT/CN2017/112217 2016-11-28 2017-11-22 视频数据处理方法、装置及设备 WO2018095317A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2019528723A JP2020513704A (ja) 2016-11-28 2017-11-22 ビデオデータ処理方法、装置および機器
EP17874884.4A EP3547672A4 (en) 2016-11-28 2017-11-22 DATA PROCESSING METHOD, DEVICE AND APPARATUS

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611063055.2 2016-11-28
CN201611063055.2A CN108377355A (zh) 2016-11-28 2016-11-28 一种视频数据处理方法、装置及设备

Publications (1)

Publication Number Publication Date
WO2018095317A1 true WO2018095317A1 (zh) 2018-05-31

Family

ID=62194807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/112217 WO2018095317A1 (zh) 2016-11-28 2017-11-22 视频数据处理方法、装置及设备

Country Status (4)

Country Link
EP (1) EP3547672A4 (zh)
JP (1) JP2020513704A (zh)
CN (1) CN108377355A (zh)
WO (1) WO2018095317A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3599763A3 (en) * 2018-07-25 2020-05-20 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for controlling image display
CN113489920A (zh) * 2021-06-29 2021-10-08 维沃移动通信(杭州)有限公司 Video synthesis method and apparatus, and electronic device
CN115082639A (zh) * 2022-06-15 2022-09-20 北京百度网讯科技有限公司 Image generation method and apparatus, electronic device and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112788274A (zh) * 2019-11-08 2021-05-11 华为技术有限公司 Augmented reality-based communication method and apparatus
CN112788273B (zh) * 2019-11-08 2022-12-02 华为技术有限公司 Augmented reality (AR) communication system and AR-based communication method
CN112887258B (zh) * 2019-11-29 2022-12-27 华为技术有限公司 Augmented reality-based communication method and apparatus
CN111556271B (zh) * 2020-05-13 2021-08-20 维沃移动通信有限公司 Video call method, video call apparatus and electronic device
CN111929323A (zh) * 2020-06-02 2020-11-13 珠海诚锋电子科技有限公司 Visual inspection method and apparatus for printing defects on corrugated packaging boxes

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030067536A1 (en) * 2001-10-04 2003-04-10 National Research Council Of Canada Method and system for stereo videoconferencing
CN101610421A (zh) * 2008-06-17 2009-12-23 深圳华为通信技术有限公司 Video communication method, apparatus and system
CN104349111A (zh) * 2013-07-24 2015-02-11 华为技术有限公司 Method and system for creating a video conference site
CN104685858A (zh) * 2012-09-28 2015-06-03 阿尔卡特朗讯 Immersive video conferencing method and system
CN105955456A (zh) * 2016-04-15 2016-09-21 深圳超多维光电子有限公司 Method and apparatus for fusing virtual reality and augmented reality, and smart wearable device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6873723B1 (en) * 1999-06-30 2005-03-29 Intel Corporation Segmenting three-dimensional video images using stereo
JP4738870B2 (ja) * 2005-04-08 2011-08-03 キヤノン株式会社 Information processing method, information processing apparatus, and remote mixed reality sharing apparatus
CN101668219B (zh) * 2008-09-02 2012-05-23 华为终端有限公司 3D video communication method, sending device and system
JP5754044B2 (ja) * 2010-09-21 2015-07-22 オリンパス株式会社 Imaging apparatus and image communication system
CN102340648A (zh) * 2011-10-20 2012-02-01 鸿富锦精密工业(深圳)有限公司 Video communication apparatus, image processor and method for a video communication system
US9524588B2 (en) * 2014-01-24 2016-12-20 Avaya Inc. Enhanced communication between remote participants using augmented and virtual reality
CN106131530B (zh) * 2016-08-26 2017-10-31 万象三维视觉科技(北京)有限公司 Naked-eye 3D virtual reality display system and display method thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030067536A1 (en) * 2001-10-04 2003-04-10 National Research Council Of Canada Method and system for stereo videoconferencing
CN101610421A (zh) * 2008-06-17 2009-12-23 深圳华为通信技术有限公司 Video communication method, apparatus and system
CN104685858A (zh) * 2012-09-28 2015-06-03 阿尔卡特朗讯 Immersive video conferencing method and system
CN104349111A (zh) * 2013-07-24 2015-02-11 华为技术有限公司 Method and system for creating a video conference site
CN105955456A (zh) * 2016-04-15 2016-09-21 深圳超多维光电子有限公司 Method and apparatus for fusing virtual reality and augmented reality, and smart wearable device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3547672A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3599763A3 (en) * 2018-07-25 2020-05-20 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for controlling image display
US11265529B2 (en) 2018-07-25 2022-03-01 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for controlling image display
CN113489920A (zh) * 2021-06-29 2021-10-08 维沃移动通信(杭州)有限公司 Video synthesis method and apparatus, and electronic device
CN115082639A (zh) * 2022-06-15 2022-09-20 北京百度网讯科技有限公司 Image generation method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
EP3547672A1 (en) 2019-10-02
CN108377355A (zh) 2018-08-07
JP2020513704A (ja) 2020-05-14
EP3547672A4 (en) 2020-07-08

Similar Documents

Publication Publication Date Title
WO2018095317A1 (zh) Video data processing method, apparatus and device
US10235560B2 (en) Image processing apparatus, image processing method, and image communication system
CN109615703B (zh) Augmented reality image display method, apparatus and device
US10853625B2 (en) Facial signature methods, systems and software
CN105704479B (zh) Method and system for measuring interpupillary distance for a 3D display system, and display device
CN109997175B (zh) Determining the size of a virtual object
KR20190112712A (ko) Improved method and system for video conferencing using a head mounted display (HMD)
JP2016537903A (ja) Stitching and recognition of virtual reality content
US9380263B2 (en) Systems and methods for real-time view-synthesis in a multi-camera setup
CN102761768A (zh) Method and apparatus for realizing stereoscopic imaging
WO2018121699A1 (zh) Video communication method, device and terminal
KR20230071588A (ko) Apparatus and method for providing multi-participant augmented reality content for diorama applications
CN105894571B (zh) Method and apparatus for processing multimedia information
CN103747236A (zh) Stereoscopic video processing system and method combining human eye tracking
US20220114784A1 (en) Device and method for generating a model of an object with superposition image data in a virtual environment
CN105893452B (zh) Method and apparatus for presenting multimedia information
JP2023551864A (ja) Three-dimensional (3D) facial feature tracking for autostereoscopic telepresence systems
US20230152883A1 (en) Scene processing for holographic displays
TWI542194B (zh) Stereoscopic image processing system, apparatus and method
US20230122149A1 (en) Asymmetric communication system with viewer position indications
CN105894581B (zh) Method and apparatus for presenting multimedia information
WO2017124871A1 (zh) Method and apparatus for presenting multimedia information
US20240236288A9 (en) Method And Apparatus For Generating Stereoscopic Display Contents
US20240137481A1 (en) Method And Apparatus For Generating Stereoscopic Display Contents
Rastogi et al. StereoCam3D (An Android App. That Lets You Capture Realtime 3D Pics And Videos)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17874884

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019528723

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017874884

Country of ref document: EP

Effective date: 20190628