WO2021213017A1 - Video superimposition method, apparatus, and system - Google Patents

Video superimposition method, apparatus, and system

Info

Publication number
WO2021213017A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
display device
capture device
video capture
Prior art date
Application number
PCT/CN2021/079028
Other languages
English (en)
French (fr)
Inventor
刘宗奇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021213017A1


Classifications

    • H04N7/14: Systems for two-way working (Section H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television)
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N21/431: Generation of visual interfaces for content selection or interaction; content or additional data rendering
    • H04N21/4316: Rendering supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N21/472: End-user interface for requesting content, additional data or services; for interacting with content, e.g. for content reservation, setting reminders, requesting event notification, or manipulating displayed content
    • H04N21/4788: Supplemental services for communicating with other users, e.g. chatting
    • H04N5/265: Studio circuits for mixing

Definitions

  • This application relates to the technical field of video processing, and in particular to a video superimposition method, device and system.
  • Large-screen display devices are becoming increasingly intelligent, with more and more functions.
  • Some large-screen display devices are equipped with video capture devices, such as cameras, so that users can use large-screen display devices to take pictures, videos, and even use large-screen display devices to establish video call connections with other remote devices.
  • Some large-screen display devices also allow users to superimpose image materials such as photo frames, watermarks, animations, and/or expressions on the video captured by the camera during a video call, so that these image materials are displayed on the remote device along with the captured video.
  • In one approach, the superimposition of image materials is implemented in the large-screen display device.
  • The large-screen display device decodes the video frame received from the video capture device, superimposes the image material on the decoded video frame, re-encodes the video frame with the superimposed material, and finally sends the encoded video frame to the remote device. It can be seen that, from the large-screen display device receiving the video frame to it sending the video frame to the remote device, the frame undergoes one decoding process and one encoding process.
  • The decoding and encoding of video frames incur a certain processing cost. This approach therefore increases the video call delay between the large-screen display device and the remote device and degrades the user experience.
  • The present application provides a video superimposition method, device, and system, which can reduce the video call delay between a display device and a remote device in a video superimposition scenario and improve the user experience.
  • This application provides a video superimposition method, including: a display device, in response to a first user instruction, sends an image material to a video capture device; the video capture device superimposes the received image material on an original video frame to obtain a superimposed video frame, where the original video frame is an unencoded video frame collected by the video capture device; the video capture device encodes the superimposed video frame to obtain an encoded video frame; and the video capture device sends the encoded video frame to the display device.
  • Because the image material superimposition is completed in the video capture device, the display device can send the encoded video frame directly to the remote device without performing any operations related to the superimposed image material on it. This eliminates the decoding and encoding time, thus reducing the video call delay between the display device and the remote device and improving the user experience.
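The capture-device side of this method can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: pixel data is simplified to small tuples, `encode` merely serializes the frame as a stand-in for a real encoder such as H.264, and all function names are assumptions.

```python
# Illustrative sketch of the capture-device side: the device receives an
# ARGB image material from the display device, alpha-blends it over the
# raw (unencoded) frame, and only then encodes the result.
# All names are hypothetical; the patent does not specify an API.

def alpha_blend_pixel(material_px, frame_px):
    """Blend one ARGB8888 material pixel over one RGB frame pixel."""
    a, r, g, b = material_px
    alpha = a / 255.0
    return tuple(
        round(alpha * m + (1.0 - alpha) * f)
        for m, f in zip((r, g, b), frame_px)
    )

def superimpose(material, frame):
    """Superimpose the image material on the original frame, pixel by pixel."""
    return [alpha_blend_pixel(m, f) for m, f in zip(material, frame)]

def encode(frame):
    """Stand-in for the on-device encoder (e.g. H.264); here it just
    serializes the pixels so the flow stays runnable."""
    return bytes(c for px in frame for c in px)

# An opaque red material pixel replaces the frame pixel; a fully
# transparent one leaves the original frame pixel untouched.
original_frame = [(10, 20, 30), (10, 20, 30)]
image_material = [(255, 255, 0, 0), (0, 255, 0, 0)]
superimposed_frame = superimpose(image_material, original_frame)
encoded_frame = encode(superimposed_frame)  # sent on to the display device
```

Because the blend happens before encoding, the display device never has to decode and re-encode the stream.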
  • Before the display device, in response to the first user instruction, sends the image material to the video capture device, the method further includes: establishing a video connection between the display device and the remote device.
  • the method further includes: the display device sends the encoded video frame to the remote device.
  • The display device and the video capture device establish a data connection through a physical link, and the physical link carries data transmission over a first transmission protocol channel and/or a second transmission protocol channel between the display device and the video capture device.
  • the display device responds to the first user instruction and sends the image material to the video capture device through the first transmission protocol channel.
  • the video capture device sends the encoded video frame to the display device through the second transmission protocol channel.
  • In this way, the image material and the encoded video frame are transmitted in two different protocol channels without affecting each other, which helps improve the stability of data transmission.
  • The method further includes: in response to a second user instruction, the display device sends a first instruction message to the video capture device through the first transmission protocol channel, where the first instruction message is used to instruct the video capture device to stop superimposing the image material on the original video frame.
  • The method further includes: the display device detects the data flow of the first transmission protocol channel in real time; the display device determines the available bandwidth resource of the second transmission protocol channel according to the data flow and the total bandwidth of the physical link; the display device determines the encoding parameters of the superimposed video frame according to the available bandwidth resource; and the display device sends the encoding parameters to the video capture device through the first transmission protocol channel.
  • In this way, the display device can dynamically adjust the encoding parameters according to the available bandwidth of the second transmission protocol channel, so that the available bandwidth of the second transmission protocol channel always meets the requirements for transmitting encoded video frames, improving bandwidth resource utilization and ensuring that the video call does not freeze.
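The adjustment loop above can be sketched numerically. The 480 Mbit/s link figure (a common USB 2.0 signaling rate) and the 0.8 headroom factor are illustrative assumptions, not values from the application:

```python
# Sketch of the adaptive bandwidth mechanism: the display device measures
# the first-channel (RNDIS) data flow, derives what the second (UVC)
# channel may use, and picks encoder parameters accordingly.

def available_uvc_bandwidth(total_link_bps, rndis_flow_bps):
    """Bandwidth left for the second transmission protocol channel once
    the measured first-channel flow is subtracted from the link total."""
    return max(total_link_bps - rndis_flow_bps, 0)

def pick_encoding_bitrate(available_bps, headroom=0.8):
    """Choose a target encoder bitrate below the available bandwidth so
    the video call does not freeze when first-channel traffic spikes."""
    return int(available_bps * headroom)

total_link = 480_000_000   # assumed USB 2.0 link, bits per second
rndis_flow = 80_000_000    # measured image-material traffic
available = available_uvc_bandwidth(total_link, rndis_flow)
target_bitrate = pick_encoding_bitrate(available)
```

The chosen bitrate would then be sent to the video capture device over the first channel, as described above.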
  • the video capture device encodes the superimposed video frame according to the encoding parameters to obtain the encoded video frame.
  • The physical link is a universal serial bus (USB) link.
  • The first transmission protocol channel is a remote network driver interface specification (RNDIS) channel.
  • The second transmission protocol channel is a USB video class (UVC) channel.
  • In this way, between the display device and the video capture device, the advantages of the UVC channel for transmitting video data can be used to carry the video frames, while the reliability, stability, and efficiency of the RNDIS channel can be used to carry the image materials and other non-video data, such as measurement data (for user positioning and for handling abnormal or erroneous data), ensuring efficient transmission of the image materials without affecting the transmission of the video frames.
  • The present application provides an electronic device that includes a display device and a video capture device; the display device is used to send image materials to the video capture device in response to a first user instruction; the video capture device is used to superimpose the received image material on the original video frame to obtain the superimposed video frame, where the original video frame is an unencoded video frame collected by the video capture device; the video capture device is also used to encode the superimposed video frame to obtain the encoded video frame; and the video capture device is also used to send the encoded video frame to the display device.
  • Because the image material superimposition is completed in the video capture device, the display device can send the encoded video frame directly to the remote device, without performing any operations related to the superimposed image material on the encoded video frame.
  • The time for decoding and encoding is thus eliminated, so the video call delay between the display device and the remote device is reduced, and the user experience is improved.
  • the display device is also used to establish a video connection with the remote device.
  • the display device is also used to send the encoded video frame to the remote device.
  • the display device and the video capture device establish a data connection through a physical link, and the physical link is used to carry the first transmission protocol channel and/or the second transmission protocol channel between the display device and the video capture device Data transfer.
  • the display device is specifically configured to send image materials to the video capture device through the first transmission protocol channel in response to the first user instruction.
  • the video capture device is specifically configured to send the encoded video frame to the display device through the second transmission protocol channel.
  • The display device is further configured to send a first instruction message to the video capture device through the first transmission protocol channel in response to a second user instruction, where the first instruction message is used to instruct the video capture device to stop superimposing the image material on the original video frame.
  • The display device is also used to detect the data flow of the first transmission protocol channel in real time; determine the available bandwidth resource of the second transmission protocol channel according to the data flow and the total bandwidth of the physical link; determine the encoding parameters of the superimposed video frame according to the available bandwidth resource; and send the encoding parameters to the video capture device through the first transmission protocol channel.
  • the video capture device is also used to encode the superimposed video frame according to the encoding parameters to obtain the encoded video frame.
  • The physical link is a universal serial bus (USB) link;
  • the first transmission protocol channel is a remote network driver interface specification (RNDIS) channel;
  • the second transmission protocol channel is a USB video class (UVC) channel.
  • The present application provides an electronic device that includes a memory and a processor; the memory stores image materials and computer instructions, and when the computer instructions are executed by the processor, the electronic device performs the following steps: in response to a first user instruction, send the image material to the video capture device; receive the encoded video frame sent by the video capture device, where the encoded video frame is obtained by the video capture device encoding the superimposed video frame, the superimposed video frame is obtained by the video capture device superimposing the image material on the original video frame, and the original video frame is an unencoded video frame collected by the video capture device.
  • Because the image material superimposition is completed in the video capture device, the electronic device can send the encoded video frame directly to the remote device, without performing any operations related to the superimposed image material on the encoded video frame.
  • The time for decoding and encoding is thus eliminated, reducing the video call delay between the electronic device and the remote device and improving the user experience.
  • the electronic device further executes: before sending the image material to the video capture device in response to the first user instruction, establish a video connection with the remote device.
  • the electronic device further executes: sending the encoded video frame to the remote device.
  • the electronic device and the video capture device establish a data connection through a physical link, and the physical link is used to carry the first transmission protocol channel and/or the second transmission protocol channel between the electronic device and the video capture device Data transfer.
  • the electronic device responds to the first user instruction and sends the image material to the video capture device through the first transmission protocol channel.
  • the electronic device receives the encoded video frame sent by the video capture device through the second transmission protocol channel.
  • The electronic device further performs: in response to a second user instruction, sending a first instruction message to the video capture device through the first transmission protocol channel, where the first instruction message is used to instruct the video capture device to stop superimposing the image material on the original video frame.
  • The electronic device further performs: detecting the data flow of the first transmission protocol channel in real time; determining the available bandwidth resource of the second transmission protocol channel according to the data flow and the total bandwidth of the physical link; determining the encoding parameters of the superimposed video frame according to the available bandwidth resource; and sending the encoding parameters to the video capture device through the first transmission protocol channel, so that the video capture device encodes the superimposed video frame according to the encoding parameters to obtain the encoded video frame.
  • The physical link is a universal serial bus (USB) link;
  • the first transmission protocol channel is a remote network driver interface specification (RNDIS) channel;
  • the second transmission protocol channel is a USB video class (UVC) channel.
  • This application provides a video superimposition system, which includes a display device and a video capture device; the display device is used to send image materials to the video capture device in response to a first user instruction; the video capture device is used to superimpose the received image material on the original video frame to obtain the superimposed video frame; the video capture device is also used to encode the superimposed video frame to obtain the encoded video frame; the video capture device is also used to send the encoded video frame to the display device; and the display device is also used to send the encoded video frame to the remote device after receiving it.
  • this application also provides a computer storage medium.
  • The computer storage medium stores computer instructions; when the computer instructions run on the display device, they cause the display device to execute the method in the first aspect and its implementations.
  • this application also provides a computer storage medium.
  • The computer storage medium stores computer instructions; when the computer instructions run on the video capture device, they cause the video capture device to execute the method in the first aspect and its implementations.
  • the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method executed by the display device in the first aspect and its implementation.
  • this application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method executed by the video capture device in the first aspect and its implementation.
  • FIG. 1 is a schematic structural diagram of a large-screen display device provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of another large-screen display device provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a video call between a large-screen display device and a remote device
  • FIG. 4 is a schematic diagram of a data transmission architecture of a video capture device and a display device provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of a video superimposing method provided by an embodiment of the present application.
  • FIG. 6 is a method of triggering a first user instruction provided by an embodiment of the present application.
  • FIG. 7 is another way of triggering a first user instruction provided by an embodiment of the present application.
  • FIG. 8 is another way of triggering a first user instruction provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the video capture device superimposing the photo frame material onto the original video frame
  • Figure 10 is a schematic diagram of a video capture device superimposing expression materials on an original video frame
  • FIG. 11 is a schematic diagram of obtaining an encoded video frame from an original video frame according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a post-processing method for superimposed video frames provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of another post-processing method for superimposed video frames provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a display device with a picture-in-picture provided by an embodiment of the present application displaying coded video frames;
  • FIG. 15 is a method of triggering a second user instruction provided by an embodiment of the present application.
  • FIG. 16 is another way of triggering a second user instruction provided by an embodiment of the present application.
  • FIG. 17 is another way of triggering a second user instruction provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of superimposing image materials on both the display device and the remote device provided by an embodiment of the present application.
  • FIG. 19 is a schematic diagram of a display device provided by an embodiment of the present application sharing image material with a remote device;
  • FIG. 20 is a flowchart of an adaptive bandwidth adjustment mechanism provided by an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments, unless otherwise specified, "plurality" means two or more.
  • Large-screen display devices are becoming increasingly intelligent, with more and more functions.
  • Some large-screen display devices are equipped with video capture devices, such as cameras, so that users can use large-screen display devices to take pictures, videos, and even use large-screen display devices to establish video call connections with other remote devices.
  • the large-screen display device may be, for example, a smart TV, a smart screen, a laser TV, a large-screen projection device, an interactive whiteboard device based on a display screen, and the like.
  • The remote device can be, for example, a mobile phone, a tablet computer, a large-screen display device, a notebook computer, a desktop personal computer, a workstation, a smart speaker with a display, a smart alarm clock with a display, a household appliance with a display (for example, a smart refrigerator), an augmented reality (AR) device, a virtual reality (VR) device, a mixed reality (MR) device, and the like.
  • Large-screen display devices and remote devices can be connected through various wired or wireless networks.
  • Fig. 1 is a schematic structural diagram of a large-screen display device provided by an embodiment of the present application.
  • the large-screen display device 110 is equipped with a video capture device 200, and the video capture device 200 may be disposed on the frame 101 of the large-screen display device 110 (for example, the upper frame).
  • the video capture device 200 can be fixedly arranged on the frame 101 or can be movably arranged relative to the frame 101.
  • the video capture device 200 when the video capture device 200 is movably set relative to the frame 101, the video capture device 200 may be a sliding telescopic structure.
  • the video capture device 200 can be hidden within the frame 101.
  • The large-screen display device 110 can eject the video capture device 200 from the frame 101.
  • Scenes that trigger the large-screen display device 110 to eject the video capture device 200 from the frame 101 may include, for example: the user operates the large-screen display device 110 to start a camera application (APP); the user operates the large-screen display device 110 to initiate a video call request to the remote device; an application in the large-screen display device 110 calls the video capture device 200; and so on.
  • the user can operate the large-screen display device 110 through various methods such as remote control operation, gesture operation, voice operation, and mobile phone APP operation.
  • the video capture device can also be set inside the large-screen display device, for example, within the frame, or located behind the screen of the large-screen display device.
  • Fig. 2 is a schematic structural diagram of another large-screen display device provided by an embodiment of the present application.
  • the large-screen display device 110 and the video capture device 200 may be two independent devices, that is, the video capture device 200 is externally connected to the large-screen display device 110.
  • the large-screen display device 110 and the video capture device 200 may establish a communication connection in a wired or wireless manner.
  • The large-screen display device 110 can control the video capture device 200 to turn on and off through the above-mentioned wired or wireless communication connection, and the video collected by the video capture device 200 after it is turned on can also be transmitted to the large-screen display device 110 through the same connection.
  • FIG. 3 is a schematic diagram of a video call between the large-screen display device 110 and the remote device 300.
  • When the user at either end of the large-screen display device 110 or the remote device 300 wants to make a video call with the user at the other end, he can operate the device he is using to initiate a video call request to the other user; if the other user accepts the video call request on the device he is using, the large-screen display device 110 and the remote device 300 will establish a video call connection.
  • The large-screen display device 110 will send the video captured by its video capture device to the remote device 300 for display, and the remote device 300 will also send the video captured by its video capture device to the large-screen display device 110 for display.
  • Some large-screen display devices 110 allow users to superimpose image materials 102 such as photo frames, watermarks, animations, and/or expressions on the video collected by the video capture device during a video call, so that these image materials 102 are displayed on the display screen of the remote device 300 together with the captured video, to increase the fun and interactivity of the video call and improve the video call experience.
  • the superimposition of image materials is implemented in the video capture device.
  • This method requires that the image material used for superimposition be built into the internal storage of the video capture device (for example: flash memory) in advance.
  • The video capture device first superimposes the built-in image material on the video frames it collects; the video frames superimposed with the image material are then encoded and sent to the large-screen display device.
  • Table 1 provides the file sizes of image materials of different resolutions in the ARGB8888 color format.
  • For example, the size of an image material with 720P resolution is 3.5 MB (megabytes). If the internal storage capacity of the video capture device is 8 MB, at most 2 such image materials can be stored; if it is 16 MB, at most 4 can be stored. Similarly, the size of a 1080P image material is 7.9 MB: an 8 MB internal storage can hold at most 1, and a 16 MB internal storage at most 2. Moreover, as the resolution of image materials further increases, their size will grow further, making them still harder to fit in the internal storage of the video capture device. In addition to the limited number of image materials, the materials stored in the video capture device are not easy to replace flexibly, resulting in a poor user experience with the first method.
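These sizes follow directly from the ARGB8888 layout, which stores 4 bytes per pixel (one byte each for the alpha, red, green, and blue channels):

```python
# Uncompressed ARGB8888 image material size: width * height * 4 bytes.

def argb8888_size_mb(width, height):
    """Size of a width x height ARGB8888 image in megabytes (MiB)."""
    return width * height * 4 / (1024 * 1024)

size_720p = argb8888_size_mb(1280, 720)    # ~3.5 MB, so at most 2 fit in 8 MB
size_1080p = argb8888_size_mb(1920, 1080)  # ~7.9 MB, so at most 1 fits in 8 MB
```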
  • the superimposition of image materials is implemented in a large-screen display device.
  • After the video capture device collects a video frame, it encodes the frame and sends it to the large-screen display device; the large-screen display device first decodes the received video frame, then superimposes the image material on the decoded frame, then encodes the frame with the superimposed material, and finally sends the encoded video frame to the remote device.
  • From the large-screen display device receiving the video frame to the large-screen display device sending the video frame to the remote device, the frame undergoes one decoding process and one encoding process.
  • The decoding and encoding of video frames incur a certain processing cost.
  • Table 2 provides the delays of a current large-screen display device in decoding and encoding video frames of different resolutions in the H264 format.
  • For a resolution of 720P, the decoding delay is 5 ms (milliseconds), the encoding delay is 10 ms, and the total delay is 15 ms.
  • For a resolution of 1080P, the decoding delay is 5 ms, the encoding delay is 15 ms, and the total delay is 20 ms. For a resolution of 1440P or 2560P, the decoding delay and encoding delay are longer still.
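To put these delays in context: at an assumed frame rate of 30 fps (an illustrative figure, not from the application), each frame has a budget of roughly 33 ms, so the extra 15 to 20 ms of decoding plus re-encoding consumes about half of that budget per frame:

```python
# Share of the per-frame time budget spent on the extra decode + encode
# pass performed by the large-screen display device in the second method.

def frame_budget_ms(fps):
    """Time available per frame at a given frame rate, in milliseconds."""
    return 1000 / fps

def overhead_fraction(decode_ms, encode_ms, fps=30):
    """Fraction of the per-frame budget consumed by decode + re-encode."""
    return (decode_ms + encode_ms) / frame_budget_ms(fps)

share_720p = overhead_fraction(5, 10)   # 15 ms of a ~33 ms budget
share_1080p = overhead_fraction(5, 15)  # 20 ms of a ~33 ms budget
```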
  • the large-screen display device not only sends the encoded video frame to the remote device, but also sends the image material used for superimposition to the remote device.
  • After receiving the video frame and the image material, the remote device first decodes the video frame, then superimposes the image material on the decoded video frame, and finally displays the result to the user. It is understandable that, because the large-screen display device needs to send the image material separately to the remote device, additional network resources are consumed, and the action of adding the image material also increases the power consumption of the remote device, causing the remote device to heat up and its battery life to decline.
  • an embodiment of the present application provides a video superimposing method.
  • This method can be applied to display devices equipped with video capture devices, such as large-screen display devices.
  • The video capture device can be included in the display device as a part of it, as shown in Figure 1, or the two can be independent devices, as shown in Figure 2, that establish a communication connection through wired or wireless means.
  • Fig. 4 is a schematic diagram of a data transmission architecture of a video capture device and a display device provided by an embodiment of the present application.
  • the video capture device may include, for example, an optical lens 201, an image sensor 202, at least one processor 203, an interface module 204, and a memory 205.
  • The optical lens 201 is composed of multiple lens elements, which collect the light within the field of view and project it onto the image sensor 202.
  • During a video call, the local user is generally located in front of the video capture device 200, so that the local user is within the field of view of the optical lens 201.
  • the image sensor 202 includes millions, tens of millions or even hundreds of millions of photosensitive pixels.
  • the image sensor 202 can convert the received light source signal into an electrical signal and send it to the processor 203.
  • The processor 203 may include, for example, an image signal processor (ISP) and a digital signal processor (DSP). The ISP can sample and process the electrical signal from the image sensor 202 to obtain video frames, and can also preprocess the video frames, for example with noise removal, white balance correction, color correction, gamma correction, and color space conversion.
  • The DSP may be used to perform post-processing such as cropping and scaling on the video frames and to encode them, after which they are sent to the display device 100 through the interface module 204.
  • the display device 100 may include a memory 103, a processor 104, an interface module 105, and a network module 106, for example.
  • the memory 103 may be used to store image materials
  • the processor 104 may receive the video frame sent by the video capture device 200 through the interface module 105, and send the received video frame to the remote device through the network module 106.
  • the interface module 105 may be, for example, a universal serial bus (USB) interface.
  • the interface module 105 may be, for example, a USB-Type-A interface, a USB-Type-B interface, a USB-Type-C interface, a USB-Micro-B interface, and the like.
  • the interface module 105 may be, for example, a USB1.0 interface, a USB2.0 interface, a USB3.0 interface, and the like.
  • the embodiment of the present application does not specifically limit the physical form and interface standard of the interface module 105.
  • The video capture device 200 and the display device 100 can establish a data connection over a physical link through a USB transmission line 400 connecting the interface modules at both ends. Moreover, on top of this USB connection, the embodiment of the present application establishes two virtual transmission protocol channels between the video capture device 200 and the display device 100, for example a first transmission protocol channel 401 and a second transmission protocol channel 402, for transmitting different types of data.
  • the first transmission protocol channel 401 may be, for example, a remote network driver interface specification (RNDIS) channel for transmitting non-video data.
  • RNDIS is a communication protocol that implements Ethernet-style connections over USB, for example connections based on the TCP/IP protocol suite. Based on TCP/IP, RNDIS can use the socket programming interface and take advantage of its lightweight, portable nature to provide reliable, stable, and efficient transmission between the video capture device 200 and the display device 100, making it suitable for transmitting non-video data.
  • the second transmission protocol channel 402 may be, for example, a USB video class (UVC) channel for transmitting video data.
  • UVC is a video transmission protocol standard that enables the video capture device 200 to connect to the display device 100 and transmit video without installing any driver.
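Because RNDIS makes the USB link look like an Ethernet interface, non-video data on the first channel can travel over ordinary sockets while video frames travel separately over UVC. A minimal, hypothetical framing for image-material messages on that channel might look like this; the length-prefixed layout is purely illustrative and is not specified in the patent:

```python
import struct

def frame_material(data: bytes) -> bytes:
    """Prefix the payload with its 4-byte big-endian length (assumed layout)."""
    return struct.pack("!I", len(data)) + data

def parse_material(buf: bytes) -> bytes:
    """Inverse of frame_material: strip and check the length prefix."""
    (length,) = struct.unpack("!I", buf[:4])
    payload = buf[4:4 + length]
    assert len(payload) == length, "truncated message"
    return payload
```

The length prefix lets the receiver delimit one material file from the next on a byte-stream socket.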
  • FIG. 5 is a flowchart of a video superimposing method provided by an embodiment of the present application.
  • the method can be implemented based on the data transmission architecture shown in FIG. 4. As shown in Fig. 5, the method may include the following steps S101 to S105:
  • Step S101 The display device sends image materials to the video capture device in response to the first user instruction.
  • Step S101 may occur after the video call connection between the display device and the remote device is established.
  • the display device can display the video image collected by the remote device on its display screen.
  • The display device can also use picture-in-picture to display the video images collected by the video capture device in a partial area of the display screen (for example, the upper left corner or the upper right corner).
  • The size of the picture-in-picture displayed by the display device can be determined according to the aspect ratio of the display screen of the remote device. For example, if the ratio of the width to the height of the remote device's display screen is 9:18.5, the ratio of the width to the height of the picture-in-picture can be 9:18.5; if the ratio of the width to the height of the remote device's display screen is 16:9, the ratio of the width to the height of the picture-in-picture can be 16:9. Understandably, the size of the picture-in-picture may also be determined in other ways, such as a size preset by the display device or a size set by the local user; this is not specifically limited in the embodiment of the present application.
  • The display screen can also display, in a predetermined area, at least one available image material or thumbnails of these available image materials for the user to browse and select; alternatively, after the video call connection between the display device and the remote device is established, the user can call up at least one available image material or the thumbnails of these materials through certain operations.
  • If the user wants to superimpose image material onto the video frames collected by the video capture device, the user can select an image material to be superimposed from the above available image materials through a remote control operation, an air gesture operation, or voice control.
  • For example, when the user operates the display device with the remote control 501, the user can press the keys of the remote control 501 to switch the display device among different thumbnails 107 and select the thumbnail 107 of the image material to be superimposed; then, the user can press the confirm (OK) key of the remote control 501 to trigger the first user instruction, causing the display device to send the image material to the video capture device.
  • For another example, when the user operates the display device with an air gesture 502, the user can switch the display device among different thumbnails through air-swipe operations (for example, air swipe left or right) and select the thumbnail 107 of the image material to be superimposed through an air-click operation; then, the user can trigger the first user instruction through a specific air-click (such as an air double-click) or air-swipe operation (for example, selecting the thumbnail 107 and swiping upward), causing the display device to send the image material selected by the user to the video capture device.
  • the display device when the user uses voice to control the display device, the display device can directly send the image material in response to the user's voice instruction.
  • the available image materials include multiple photo frames, and when the user says "add the first photo frame", the display device may send the image material corresponding to the first photo frame to the video capture device.
  • the display device can have voice recognition capabilities.
  • The voice recognition capability can be implemented by an artificial intelligence (AI) module built into the display device, with the help of an AI speaker docked with the display device, or with the help of a cloud AI server.
  • The above-mentioned AI module may include a hardware module, such as a neural-network processing unit (NPU), or a software module, such as a convolutional neural network (CNN), a deep neural network (DNN), and so on.
  • the AI module can be awakened by a specific voice wake-up word or a specific action to perform voice control.
  • The image material can adopt a variety of color formats, for example the ARGB color format (a color format composed of a transparency channel Alpha, a red channel Red, a green channel Green, and a blue channel Blue, commonly used as the storage structure for 32-bit images). The ARGB color format can specifically include ARGB8888 (using 8 bits each to record the A, R, G, and B data of every pixel), ARGB4444 (using 4 bits each to record the A, R, G, and B data of every pixel), and so on.
  • the image material can also adopt a variety of file formats, such as: bitmap BMP, image interchange format (graphics interchange format, GIF), portable network graphics (portable network graphics, PNG), and so on.
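As an illustration of how an ARGB8888 material pixel could be composited over an opaque video pixel, here is a generic "source over" alpha blend. This is a standard technique shown for clarity, not code from this application:

```python
# Unpack a 32-bit ARGB8888 pixel into its four 8-bit channels.
def unpack_argb8888(p: int):
    return (p >> 24) & 0xFF, (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF

# "Source over" blend: material pixel (with alpha) over an opaque video pixel.
def blend_over(material_px: int, video_px: int) -> int:
    a, r, g, b = unpack_argb8888(material_px)
    _, vr, vg, vb = unpack_argb8888(video_px)
    mix = lambda m, v: (m * a + v * (255 - a)) // 255  # per-channel weighted mix
    return (0xFF << 24) | (mix(r, vr) << 16) | (mix(g, vg) << 8) | mix(b, vb)
```

A fully opaque material pixel (A = 255) replaces the video pixel; a fully transparent one (A = 0) leaves it unchanged, which is exactly what a photo frame with a transparent interior needs.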
  • the image material may be stored locally in the display device in advance.
  • the display device When the display device establishes a video call connection with the remote device, the display device directly displays the thumbnail of the locally stored image material to the user.
  • the image material can also be stored in the cloud server.
  • the display device can obtain the image material from the cloud server and generate a thumbnail of the image material to display to the user.
  • Part of the image material can also be pre-stored in the local storage of the display device, and the other part is stored in the cloud server.
  • When the display device establishes a video call connection with the remote device, the display device first displays thumbnails of the locally stored image materials to the user; if the user wants to browse more image materials, the display device then obtains them from the cloud server and generates thumbnails of those materials to display to the user.
  • In this way, between the display device and the video capture device, the UVC channel, with its advantages in video data transmission, can be used to transmit video frames, while the RNDIS channel, with its reliability, stability, and high efficiency, can be used to transmit image materials and other non-video data, such as dimension measurement data (used for locating and handling abnormal or incorrect data). This ensures efficient transmission of image materials without affecting the transmission of video frames.
  • Step S102 After receiving the image material, the video capture device superimposes the image material on the original video frame to obtain the superimposed video frame.
  • the image material may be a single frame material, or it may be a multi-frame material.
  • single-frame materials refer to materials that only contain one frame of images, such as photo frame materials
  • multi-frame materials refer to materials that contain multiple frames of images, such as expressions, animation materials, and so on.
  • During a video call, the video capture device continuously generates video frames at the preset video resolution and frame rate. If the video frame rate is 30 Hz, the video capture device generates 30 video frames per second; if the video frame rate is 60 Hz, it generates 60 video frames per second. Generally, the video frames collected by the video capture device need to be encoded before being sent to the display device. To distinguish video frames before and after encoding, the embodiment of this application calls the unencoded frames original video frames and the encoded frames encoded video frames.
  • For different types of image materials, the way the video capture device superimposes them onto the original video frames can differ.
  • the following takes photo frame material (generally a single-frame material) and expression material (generally a multi-frame material) as examples to illustrate the way in which the video capture device superimposes the image material onto the original video frame.
  • Figure 9 is a schematic diagram of the video capture device superimposing photo frame material onto original video frames. As shown in Figure 9, assuming the video capture device receives the photo frame material at time t0, the video capture device will superimpose the photo frame material onto every original video frame collected after time t0, until the video call ends or the video capture device receives an instruction message, sent by the display device, instructing it to stop superimposing the image material.
  • the emoticon material is generally composed of multiple frames of images, which are arranged in order, and can present a dynamic effect during playback.
  • the file format of common emoticons is gif, etc.
  • Unlike a photo frame, the emoticon material usually only needs to be played once and then disappear, without being displayed continuously; this is the difference between how emoticon material and photo frame material are displayed.
  • Figure 10 is a schematic diagram of the video capture device superimposing the expression material on the original video frame.
  • the expression material includes 30 frames of images E1-E30
  • the video capture device can superimpose image E1 onto the first original video frame P1 collected after time t0, image E2 onto the second original video frame P2 collected after t0, image E3 onto the third original video frame P3 collected after t0, and so on, until image E30 is superimposed onto the thirtieth original video frame P30 collected after t0.
  • a total of 30 original video frames (P1-P30) will be superimposed with the expression material to form 30 superimposed video frames.
  • These 30 superimposed video frames can play a complete emoticon animation. If the frame rate of the video call is 30Hz, the playing time of the emoticon animation is 1 second.
  • the number of video frames here is 30 frames for illustration only, and this application does not limit the number of image frames superimposed on the expression material.
  • The 30 frames of the expression material can also be superimposed onto image frames in time order with one or more image frames sharing the same expression image; for example, image E1 can be superimposed onto original video frames P1 and P2, or onto even more original video frames.
  • In addition, if the user selects different image materials at different times during the video call, the video capture device also needs to superimpose the different image materials onto the original video frames according to the user's selections.
  • For example, after the video capture device has received a previous image material, it detects in real time whether a new image material has arrived from the display device. If no new image material is received, the video capture device continues to superimpose the previous image material onto the original video frames; if a new image material is received, the video capture device stops superimposing the previous image material and instead superimposes the new image material onto the original video frames.
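The superimposition rules above (a single-frame material repeats on every captured frame, a multi-frame material plays once with one image per frame, and a newly received material replaces the previous one) can be sketched as follows. This is assumed scheduling logic for illustration, not code from the patent:

```python
class Superimposer:
    def __init__(self):
        self.images = []      # frames of the currently active material
        self.index = 0
        self.single = False   # single-frame materials repeat indefinitely

    def set_material(self, images):
        """A newly received material replaces the previous one."""
        self.images = list(images)
        self.index = 0
        self.single = len(self.images) == 1

    def next_overlay(self):
        """Return the material image for the next captured frame, or None."""
        if self.single:
            return self.images[0]              # photo frame: every frame
        if self.index < len(self.images):
            img = self.images[self.index]      # emoticon: play once, in order
            self.index += 1
            return img
        return None                            # animation finished
```

With a 30-frame emoticon and a 30 Hz call, `next_overlay` returns an image for exactly one second of frames and then `None`, matching the one-second playback described above.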
  • Step S103 The video capture device encodes the superimposed video frame to obtain an encoded video frame.
  • Fig. 11 is a schematic diagram of obtaining an encoded video frame from an original video frame according to an embodiment of the present application.
  • Assuming the video capture device receives the image material at time t0: for original video frames generated before t0, the video capture device preprocesses them and encodes them directly into encoded video frames; for original video frames generated after t0, the video capture device preprocesses them, superimposes the image material onto them to obtain superimposed video frames, and finally encodes the superimposed video frames to obtain encoded video frames.
  • The embodiments of this application can use multiple encoding methods to obtain the encoded video frames, such as advanced video coding (H.264) or high efficiency video coding (HEVC, also known as H.265), which will not be repeated here.
  • the superimposed video frame may be post-processed first.
  • the post-processing may include, for example, cropping or scaling the superimposed video frame, and the parameters of the post-processing may be determined according to the resolution of the image material or the resolution of the display screen of the remote device.
  • For example, the video capture device can crop away the part of the superimposed video frame that lies outside the photo frame, obtaining a superimposed video frame of 1920×1080 pixels in which the photo frame is located at the edge of the content.
  • For another example, the video capture device can scale the superimposed video frame, reducing it from 2160P to 1080P.
  • The reduced superimposed video frame, and the encoded video frame generated from it, are smaller in size, which lowers the occupancy of USB bandwidth resources between the display device and the video capture device and lowers the occupancy of network resources when the encoded video frame is subsequently transmitted from the display device to the remote device.
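As a toy illustration of the scaling step, a 2x reduction (for example 2160P to 1080P) can be done by nearest-neighbor decimation, keeping every second pixel of every second row. A real device would use its ISP/DSP hardware scaler rather than code like this:

```python
def downscale_2x(frame):
    """frame: list of rows, each a list of pixels; return the half-size frame
    by keeping every second row and every second pixel (nearest neighbor)."""
    return [row[::2] for row in frame[::2]]
```

Halving both dimensions quarters the raw pixel count, which is why the encoded frame shrinks and USB/network occupancy drops as described above.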
  • Step S104 The video capture device sends the encoded video frame to the display device.
  • Specifically, the video capture device can send the encoded video frame to the display device through the UVC channel, taking advantage of the UVC channel's strengths in transmitting video data; this achieves efficient transmission of the encoded video frames without affecting the transmission of image materials through the RNDIS channel.
  • Step S105 After receiving the encoded video frame, the display device sends the encoded video frame to the remote device.
  • After the remote device receives the encoded video frame, it can decode the frame, render and generate a video image containing the image material, and display the video image on its display screen. The remote user can thus see the image material added by the local user during the video call.
  • In some embodiments, if the display screen displays the video images collected by the video capture device in the form of picture-in-picture 108, the display device will not only send the encoded video frame to the remote device, but can also decode and render the encoded video frame and display it in the picture-in-picture 108, so that the local user can see the display effect of the selected image material 102 after superimposition. This makes it convenient for the local user to decide, based on the display effect, whether to keep superimposing the image material 102 or switch to another image material.
  • In the embodiment of the present application, the process of superimposing the image material is completed in the video capture device, so the display device can send the encoded video frame directly to the remote device without performing any superimposition-related operations on it, such as decoding and encoding. This saves the time spent on decoding and encoding, thereby reducing the video call delay between the display device and the remote device and improving the user experience.
  • For example, for H.264 video at 720P resolution, the method of the embodiment of this application can reduce the delay by about 15 ms; at 1080P, by about 20 ms; at 1440P, by about 35 ms; and at 2560P, by about 60 ms. It can be seen that the higher the video quality, the more significant the reduction in video call delay achieved by the method of the embodiment of the present application.
  • In addition, the image materials in the embodiments of the present application can be stored in the display device, which makes them easy to replace flexibly and avoids occupying the memory space of the video capture device.
  • the display device in the embodiment of the present application does not need to separately send the image material to the remote device, so it will not cause additional network resource consumption.
  • For example, after a photo frame has been superimposed in the video for a period of time, the user may sometimes want to cancel the superimposition of the photo frame.
  • the user can trigger a second user instruction to the display device through remote control operation, air gesture operation, or voice control.
  • Specifically, the display device may, in response to the second user instruction, send an instruction message to the video capture device through the RNDIS channel, where the instruction message instructs the video capture device to stop superimposing the image material onto the original video frames.
  • For example, when the user operates the display device with the remote control 501, the user can operate the arrow keys of the remote control to select the "close photo frame" icon 109 on the display screen; then, the user can press the confirm (OK) key of the remote control 501 to trigger the second user instruction, causing the display device to send the instruction message to the video capture device.
  • For another example, when the user operates the display device with an air gesture 502, the user can select the "close photo frame" icon 109 on the display screen through an air-swipe operation (for example, air swipe left or right); then, the user can trigger the second user instruction through a specific air-click (for example, an air double-click) or air-swipe operation (for example, selecting the icon and moving it upward), causing the display device to send the instruction message to the video capture device.
  • the display device may directly send an instruction message to the video capture device in response to the user's voice instruction. For example: when the user says "turn off the photo frame", the display device sends an instruction message to the video capture device.
  • the user can cancel the superimposition of the image material at any time, which is flexible and convenient, and improves the user experience.
  • In some embodiments, the display device can also send the image material to the remote device, so that the remote device can likewise superimpose the image material onto the video images it collects and then send them to the display device for display. In this way, as shown in Figure 18, both users can see the other party's video images superimposed with image materials.
  • For example, when the local user selects an image material on the display device and triggers the first user instruction, the display device can pop up a dialog box on the display screen asking the user whether to recommend the image material to the remote user, as shown in Figure 19(a).
  • In the dialog box, the user can select the "Yes" or "No" option through a remote control operation, an air gesture operation, or voice control. If the user selects "No", the dialog box is closed and the display device does not recommend the image material to the remote user; if the user selects "Yes", the dialog box is closed and the display device sends a message recommending the image material to the remote device.
  • the message may contain a thumbnail of the image material.
  • After the remote device receives the message recommending the image material from the display device, it can display a dialog box containing the image thumbnail and ask the remote user through this dialog box whether to experience the image material, as shown in Figure 19(b). If the user selects "don't experience", the remote device closes the dialog box and does not superimpose the image material on the video images it captures; if the user selects "experience", the remote device closes the dialog box and requests the image material file from the display device. After receiving the image material file sent by the display device, the remote device superimposes the image material onto the video images it collects.
  • In other embodiments, the display device can also directly send the message recommending the image material to the remote device, without requiring the user to perform other operations.
  • the method in the embodiment of the present application also provides an adaptive bandwidth adjustment mechanism, which can reasonably allocate USB bandwidth resources to the first transmission protocol channel and the second transmission protocol channel, and improve the USB bandwidth utilization rate.
  • the bandwidth resources of USB are limited.
  • For example, the theoretical maximum transmission bandwidth of the USB 2.0 protocol is 60 MB/s, but in actual applications, affected by the data transmission protocol and the encoding method, the actual maximum achievable transmission bandwidth is only about 30 MB/s. The USB 3.0 protocol can achieve a higher transmission bandwidth, but due to factors such as the memory read/write performance of the video capture device and the display device, the maximum bandwidth it can actually achieve is also limited.
  • the display device can allocate part of the bandwidth resources to the first transmission protocol channel and another part of the bandwidth resources to the second transmission protocol channel.
  • For example, depending on the video call quality, the display device can allocate 4 MB/s of bandwidth to the second transmission protocol channel; when the video call quality is 1080P@60Hz, the display device can allocate 10 MB/s of bandwidth to the second transmission protocol channel; when the video call quality is 1440P@30Hz, 15 MB/s; when the video call quality is 2560P@30Hz, 25 MB/s. The remaining bandwidth resources can be allocated to the first transmission protocol channel.
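The static allocation above can be sketched as a simple lookup. The tiers are copied from the text; the 30 MB/s total is the example figure used later in this section, and the tier that receives 4 MB/s is omitted because its quality level is cut off in this excerpt:

```python
# Second (UVC) channel reservation per call quality, per the text's examples.
VIDEO_CHANNEL_MB = {
    "1080P@60Hz": 10,
    "1440P@30Hz": 15,
    "2560P@30Hz": 25,
}

def allocate(quality: str, total_mb: int = 30):
    """Return (first_channel_mb, second_channel_mb): the video channel gets its
    reservation and the RNDIS channel gets whatever bandwidth remains."""
    video = VIDEO_CHANNEL_MB[quality]
    return total_mb - video, video
```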
  • However, in the scenario of the embodiment of the present application, for most of the time the display device does not send image materials to the video capture device, so the first transmission protocol channel carries only a small amount of non-video data (such as dimension measurement data) or no data at all, which wastes USB bandwidth resources.
  • Conversely, when the display device sends image material to the video capture device, if the bandwidth resource allocated to the first transmission protocol channel is too small, the transmission of the image material takes too long and the display of the image material is significantly delayed.
  • the adaptive bandwidth adjustment mechanism provided by the embodiment of the present application may include step S201 to step S205 as shown in FIG. 20.
  • In step S201, the display device can detect the data traffic of the first transmission protocol channel in real time.
  • In step S202, the display device determines the available bandwidth resources of the first transmission protocol channel and the second transmission protocol channel according to the detected data traffic and the total USB bandwidth (that is, the maximum transmission bandwidth that can actually be achieved).
  • Specifically, the display device can perform data header analysis on the traffic on the USB link, for example parsing the protocol version number in the data header to identify the TCP/IP protocol traffic; the TCP/IP protocol traffic is the data traffic of the first transmission protocol channel, and the remaining unidentified traffic is the data traffic of the second transmission protocol channel.
  • Data header analysis can be performed at the driver layer of the display device. For example, when a data packet arrives, the driver layer can try to parse the protocol version number in the packet's data header; if a protocol version number can be resolved, such as 0100 for IPv4, the data packet belongs to the TCP/IP protocol traffic transmitted on the first transmission protocol channel.
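The driver-layer check can be sketched as reading the high nibble of the first header byte, which carries the IP version (0100 = IPv4, 0110 = IPv6). Treating both versions as first-channel traffic is an assumption for illustration:

```python
def is_first_channel_packet(packet: bytes) -> bool:
    """Classify a packet as first-channel TCP/IP traffic by its IP version
    nibble; anything that does not parse as IPv4/IPv6 is treated as
    second-channel (video) traffic."""
    if not packet:
        return False
    version = packet[0] >> 4   # high nibble of the first header byte
    return version in (4, 6)
```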
  • a traffic model can be constructed to identify the data traffic of the first transmission protocol channel and the second transmission protocol channel.
  • the traffic model may be a neural network model, such as convolutional neural network (CNN), deep neural network (DNN), support vector machine (SVM), and so on.
  • For example, the display device can build a convolutional neural network (CNN) for traffic recognition and train it with a large number of labeled data pairs (A, Z) so that it can recognize unknown traffic types. In each data pair (A, Z), A represents a traffic segment of a known type and Z is the labeling result of traffic segment A; A is used as the input of the CNN, and Z is used as the expected output of the neural network.
  • For example, when the display device detects that the data traffic of the first transmission protocol channel is small, it can allocate most of the USB bandwidth resources to the second transmission protocol channel to ensure smooth transmission of the video frames. Assuming that the maximum transmission bandwidth the USB link can actually achieve is 30 MB/s: when the data traffic of the first transmission protocol channel is less than 256 KB/s, the display device can allocate 1 MB/s of available bandwidth to the first transmission protocol channel and the remaining 29 MB/s to the second transmission protocol channel; when the display device needs to send image material, it can allocate 15 MB/s of available bandwidth to the first transmission protocol channel and the remaining 15 MB/s to the second transmission protocol channel, so as to shorten the transmission time of the image material and reduce the response delay of the display device.
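The adaptive split can be sketched as follows. The 256 KB/s threshold and the 1 MB/s / 15 MB/s figures come from the text; using the traffic threshold alone as the signal that image material is being sent is an assumption:

```python
def split_bandwidth(first_channel_kbps: float, total_mb: int = 30):
    """Return (first_channel_mb, second_channel_mb) of available bandwidth.
    Light RNDIS traffic -> give nearly everything to the video channel;
    heavy traffic (image material in flight) -> split the link evenly."""
    first = 1 if first_channel_kbps < 256 else 15
    return first, total_mb - first
```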
  • In step S203, the display device may also dynamically determine the encoding parameters of the superimposed video frames according to the available bandwidth resources of the second transmission protocol channel, thereby changing the data traffic consumed when transmitting the encoded video frames. For example, when the available bandwidth of the second transmission protocol channel is 29 MB/s, the display device may set the encoding parameter of the superimposed video frames to 1080P@60Hz, or even higher, such as 1440P@30Hz or 2560P@30Hz; when the available bandwidth of the second transmission protocol channel is 15 MB/s, the display device may set the encoding parameter to 1080P@60Hz, or even lower, such as 1080P@30Hz or 720P@30Hz.
  • step S204 after determining the encoding parameters, the display device may send the encoding parameters to the video capture device through the first transmission protocol channel.
  • step S205 the video capture device encodes the superimposed video frame according to the encoding parameters to obtain an encoded video frame.
  • the embodiment of the present application dynamically adjusts the encoding parameters according to the available bandwidth of the second transmission protocol channel, so that this bandwidth can always meet the demand of transmitting encoded video frames, which improves bandwidth resource utilization and ensures that the video call does not freeze.
  • the solutions of the video superimposing method provided in this application are introduced from the perspective of the display device itself, the video capture device, and the interaction between the display device and the video capture device.
  • the above-mentioned display device and the video capture device include hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • the embodiment of the present application also provides an electronic device, which may include the display device 100 and the video capture device 200 shown in FIG. 4, where the display device 100 is configured to send image material to the video capture device 200 in response to a first user instruction; the video capture device 200 is configured to superimpose the received image material on an original video frame to obtain a superimposed video frame.
  • the original video frame includes an unencoded video frame captured by the video capture device 200; the video capture device 200 is further configured to encode the superimposed video frame to obtain an encoded video frame; and the video capture device 200 is further configured to send the encoded video frame to the display device 100.
  • the embodiment of the present application also provides another electronic device.
  • the electronic device may be, for example, the display device 100 shown in FIG. 4.
  • the interface module 105 is used to implement data transmission with the video capture device 200, for example, send image materials to the video capture device 200, and receive encoded video frames sent by the video capture device 200.
  • the memory 103 is used to store image materials and computer program codes.
  • the computer program code includes computer instructions; when the processor 104 executes the computer instructions, the display device performs the methods involved in the above embodiments, for example: sending the image material to the video capture device in response to the first user instruction; and receiving the encoded video frame sent by the video capture device, where the encoded video frame is obtained by the video capture device encoding a superimposed video frame, the superimposed video frame is obtained by the video capture device superimposing the image material on an original video frame, and the original video frame includes an unencoded video frame captured by the video capture device.
  • the embodiment of the present application also provides a video capture device, such as the video capture device 200 shown in FIG. 4.
  • the interface module 204 is used to implement data transmission with the display device 100, for example, to receive image materials sent by the display device 100, and to send encoded video frames to the display device 100.
  • the memory 205 is used to store computer program code, which includes computer instructions; when the processor 203 executes the computer instructions, the video capture device 200 performs the methods involved in the foregoing embodiments, for example: receiving image material sent by the display device; superimposing the image material on an original video frame to obtain a superimposed video frame, where the original video frame includes an unencoded video frame captured by the video capture device; encoding the superimposed video frame to obtain an encoded video frame; and sending the encoded video frame to the display device.
  • An embodiment of the present application also provides a video overlay system, which includes a display device and a video capture device.
  • the display device is configured to send image material to the video capture device in response to a first user instruction; the video capture device is configured to superimpose the received image material on an original video frame to obtain a superimposed video frame; the video capture device is further configured to encode the superimposed video frame to obtain an encoded video frame; the video capture device is further configured to send the encoded video frame to the display device; and the display device is further configured to send the encoded video frame to a remote device after receiving it.

Abstract

The present application provides a video superimposition method, apparatus, and system. In the method, a display device sends image material to a video capture device in response to a first user instruction; the video capture device superimposes the received image material on an original video frame to obtain a superimposed video frame, where the original video frame includes an unencoded video frame captured by the video capture device; the video capture device encodes the superimposed video frame to obtain an encoded video frame; and the video capture device sends the encoded video frame to the display device. With this technical solution, the superimposition of image material is completed inside the video capture device, so the display device can forward the encoded video frame directly to the remote device without performing any superimposition-related operation on it. This saves the decoding and encoding time, thereby reducing the video-call latency between the display device and the remote device and improving the user experience.

Description

一种视频叠加方法、装置及系统
本申请要求于2020年04月24日提交到国家知识产权局、申请号为202010332758.0、发明名称为“一种视频叠加方法、装置及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频处理技术领域,尤其涉及一种视频叠加方法、装置及系统。
背景技术
目前,大屏显示设备正朝着智能化的方向发展,功能越来越丰富。一些大屏显示设备配备了视频采集设备,例如摄像头,使得用户可以使用大屏显示设备进行拍照、录像,甚至还可以使用大屏显示设备与其他的远端设备建立视频通话连接。另外,一些大屏显示设备还允许用户在视频通话时向摄像头采集到的视频中叠加一些相框、水印、动画和/或表情等图像素材,使得这些图像素材与摄像头采集到的视频一同显示在远端设备的显示屏上,以增加视频通话的趣味性和互动性。
一般而言,图像素材的叠加行为可以在大屏显示设备中实现。具体来说,大屏显示设备对接收自视频采集设备的视频帧进行解码,然后在解码后的视频帧中叠加图像素材,然后再对叠加有素材的视频帧进行编码,最后将编码后的视频帧发送给远端设备。由此可见,从大屏显示设备接收到视频帧,到大屏显示设备将视频帧发送给远端设备,期间经历了一次解码过程和一次编码过程,由于视频帧的解码和编码都需要消耗一定的时间才能完成,即存在一定的解码时延和编码时延,因此这种方式会增加大屏显示设备和远端设备的视频通话时延,降低用户使用体验。
发明内容
本申请提供了一种视频叠加方法、装置及系统,能够降低视频叠加场景下的显示设备和远端设备的视频通话时延,提高用户使用体验。
第一方面,本申请提供了一种视频叠加方法,包括:显示设备响应于第一用户指令,向视频采集设备发送图像素材;视频采集设备将接收到的图像素材叠加到原始视频帧,得到叠加视频帧,原始视频帧包括视频采集设备采集到的未经编码的视频帧;视频采集设备对叠加视频帧进行编码,得到编码视频帧;视频采集设备将编码视频帧发送给显示设备。
根据上述方法,图像素材的叠加过程在视频采集设备中完成,因此显示设备可以直接将编码视频帧发送给远端设备,不需要再对编码视频帧进行任何与叠加图像素材相关的操作,省去了解码和编码的时间,因此降低了显示设备和远端设备的视频通话时延,提高了用户使用体验。
在一种实现方式中,显示设备响应于第一用户指令,向视频采集设备发送图像素材之前,还包括:显示设备与远端设备建立视频连接。
在一种实现方式中,该方法还包括:显示设备将编码视频帧发送给远端设备。
在一种实现方式中,显示设备和视频采集设备通过物理链路建立数据连接,物理链路用于承载显示设备和视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
在一种实现方式中,显示设备响应于第一用户指令,通过第一传输协议通道向视频采集设备发送图像素材。
在一种实现方式中,视频采集设备通过第二传输协议通道将编码视频帧发送给显示设备。
由此,图像素材和编码视频帧在两个不同的协议通道内传输,彼此互不影响,有利于提高数据传输的稳定性。
在一种实现方式中,该方法还包括:显示设备响应于第二用户指令,通过第一传输协议通道向视频采集设备发送第一指示消息,第一指示消息用于指示视频采集设备停止将图像素材叠加到原始视频帧。
在一种实现方式中,该方法还包括:显示设备实时检测第一传输协议通道的数据流量;显示设备根据数据流量和物理链路的总带宽确定第二传输协议通道的可用带宽资源;显示设备根据可用带宽资源确定叠加视频帧的编码参数;显示设备通过第一传输协议通道将编码参数发送给视频采集设备。
由此,显示设备能够根据第二传输协议通道的可用带宽资源的大小,动态调整编码参数,使得第二传输协议通道的可用带宽始终能够满足传输编码视频帧的需求,提高带宽资源利用率,并且保证视频通话不卡顿。
在一种实现方式中,视频采集设备根据编码参数对叠加视频帧进行编码,得到编码视频帧。
在一种实现方式中,物理链路为通用串行总线USB链路。
在一种实现方式中,第一传输协议通道为远程网络驱动程序接口规范RNDIS通道。
在一种实现方式中,第二传输协议通道为USB视频规范UVC通道。
由此，显示设备和视频采集设备之间可以利用UVC通道在传输视频数据方面的优势，传输视频帧，利用RNDIS通道的可靠、稳定、高效率的优势，传输图像素材以及其他的视频帧以外的数据，例如维测数据（用于定位和处理异常或错误的数据）等，确保了高效传输图像素材的同时、又不影响视频帧的传输。
第二方面,本申请提供了一种电子设备,该电子设备包括显示设备和视频采集设备;其中,显示设备用于响应于第一用户指令,向视频采集设备发送图像素材;视频采集设备用于将接收到的图像素材叠加到原始视频帧,得到叠加视频帧,原始视频帧包括视频采集设备采集到的未经编码的视频帧;视频采集设备还用于对叠加视频帧进行编码,得到编码视频帧;视频采集设备还用于将编码视频帧发送给显示设备。
根据上述电子设备,图像素材的叠加过程在视频采集设备中完成,因此显示设备可以直接将编码视频帧发送给远端设备,不需要再对编码视频帧进行任何与叠加图像素材相关的操作,省去了解码和编码的时间,因此降低了显示设备和远端设备的视频通话时延,提高了用户使用体验。
在一种实现方式中,显示设备还用于与远端设备建立视频连接。
在一种实现方式中,显示设备还用于将编码视频帧发送给远端设备。
在一种实现方式中，显示设备和视频采集设备通过物理链路建立数据连接，物理链路用于承载显示设备和视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
在一种实现方式中,显示设备具体用于响应于第一用户指令,通过第一传输协议通道向视频采集设备发送图像素材。
在一种实现方式中,视频采集装置具体用于通过第二传输协议通道将编码视频帧发送给显示设备。
在一种实现方式中,显示设备还用于响应于第二用户指令,通过第一传输协议通道向视频采集设备发送第一指示消息,第一指示消息用于指示视频采集设备停止将图像素材叠加到原始视频帧。
在一种实现方式中,显示设备还用于实时检测第一传输协议通道的数据流量;以及根据数据流量和物理链路的总带宽确定第二传输协议通道的可用带宽资源;以及根据可用带宽资源确定叠加视频帧的编码参数;以及通过第一传输协议通道将编码参数发送给视频采集设备。
在一种实现方式中,视频采集设备还用于根据编码参数对叠加视频帧进行编码,得到编码视频帧。
在一种实现方式中,物理链路为通用串行总线USB链路;第一传输协议通道为远程网络驱动程序接口规范RNDIS通道;第二传输协议通道为USB视频规范UVC通道。
第三方面,本申请提供了一种电子设备,该电子设备包括存储器和处理器;其中,存储器存储有图像素材和计算机指令,当计算机指令被处理器执行时,使得电子设备执行以下步骤:响应于第一用户指令,向视频采集设备发送图像素材;接收视频采集设备发送的编码视频帧,其中,编码视频帧是视频采集设备对叠加视频帧编码得到的,叠加视频帧是视频采集设备将图像素材叠加到原始视频帧中得到的,原始视频帧包括视频采集设备采集到的未经编码的视频帧。
根据上述电子设备,图像素材的叠加过程在视频采集设备中完成,因此电子设备可以直接将编码视频帧发送给远端设备,不需要再对编码视频帧进行任何与叠加图像素材相关的操作,省去了解码和编码的时间,因此降低了电子设备和远端设备的视频通话时延,提高了用户使用体验。
在一种实现方式中,电子设备还执行:在响应于所述第一用户指令,向所述视频采集设备发送图像素材之前,与远端设备建立视频连接。
在一种实现方式中,电子设备还执行:将编码视频帧发送给远端设备。
在一种实现方式中,电子设备和视频采集设备通过物理链路建立数据连接,物理链路用于承载电子设备和视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
在一种实现方式中,电子设备响应于第一用户指令,通过第一传输协议通道向视频采集设备发送图像素材。
在一种实现方式中,电子设备接收视频采集设备通过第二传输协议通道发送的编码视频帧。
在一种实现方式中,电子设备还执行:响应于第二用户指令,通过第一传输协议通道向视频采集设备发送第一指示消息,第一指示消息用于指示视频采集设备停止将图像素材叠加到原始视频帧。
在一种实现方式中,电子设备还执行:实时检测第一传输协议通道的数据流量;以及根据数据流量和物理链路的总带宽确定第二传输协议通道的可用带宽资源;以及根据可用带宽资源确定叠加视频帧的编码参数;以及通过第一传输协议通道将编码参数发送给视频采集设备,以使得视频采集设备根据编码参数对叠加视频帧进行编码,得到编码视频帧。
在一种实现方式中,物理链路为通用串行总线USB链路;第一传输协议通道为远程网络驱动程序接口规范RNDIS通道;第二传输协议通道为USB视频规范UVC通道。
第四方面,本申请提供了一种视频叠加系统,该视频叠加系统包括显示设备和视频采集设备;其中,显示设备用于响应于第一用户指令,向视频采集设备发送图像素材;视频采集设备用于将接收到的图像素材叠加到原始视频帧,得到叠加视频帧;视频采集设备还用于对叠加视频帧进行编码,得到编码视频帧;视频采集设备还用于将编码视频帧发送给显示设备;显示设备还用于接收到编码视频帧之后,将编码视频帧发送给远端设备。
第五方面，本申请还提供了一种计算机存储介质。该计算机存储介质存储有计算机指令，当计算机指令在显示设备上运行时，使得显示设备执行上述第一方面及其实现方式中的方法。
第六方面，本申请还提供了一种计算机存储介质。该计算机存储介质存储有计算机指令，当计算机指令在视频采集设备上运行时，使得视频采集设备执行上述第一方面及其实现方式中的方法。
第七方面,本申请还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面及其实现方式中显示设备执行的方法。
第八方面,本申请还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面及其实现方式中视频采集设备执行的方法。
附图说明
图1是本申请实施例提供的一种大屏显示设备的结构示意图;
图2是本申请实施例提供的另一种大屏显示设备的结构示意图;
图3是大屏显示设备与远端设备进行视频通话的场景示意图;
图4是本申请实施例提供的视频采集设备和显示设备的数据传输架构的示意图;
图5是本申请实施例提供的视频叠加方法的流程图;
图6是本申请实施例提供的一种触发第一用户指令的方式;
图7是本申请实施例提供的另一种触发第一用户指令的方式;
图8是本申请实施例提供的又一种触发第一用户指令的方式;
图9是视频采集设备将相框素材叠加到原始视频帧的示意图;
图10是视频采集设备将表情素材叠加到原始视频帧的示意图;
图11是本申请实施例示出的从原始视频帧得到编码视频帧的示意图;
图12是本申请实施例提供的一种对叠加视频帧的后处理方式的示意图;
图13是本申请实施例提供的另一种对叠加视频帧的后处理方式的示意图;
图14是本申请实施例提供的带有画中画的显示设备显示编码视频帧的示意图;
图15是本申请实施例提供的一种触发第二用户指令的方式;
图16是本申请实施例提供的另一种触发第二用户指令的方式;
图17是本申请实施例提供的又一种触发第二用户指令的方式;
图18是本申请实施例提供的显示设备和远端设备均叠加图像素材的示意图;
图19是本申请实施例提供的显示设备向远端设备分享图像素材的示意图;
图20是本申请实施例提供的自适应的带宽调整机制的流程图。
具体实施方式
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
目前,大屏显示设备正朝着智能化的方向发展,功能越来越丰富。一些大屏显示设备配备了视频采集设备,例如摄像头,使得用户可以使用大屏显示设备进行拍照、录像,甚至还可以使用大屏显示设备与其他的远端设备建立视频通话连接。
本申请实施例中,大屏显示设备例如可以是智能电视、智慧屏、激光电视、大屏投影设备和基于显示屏的互动式白板设备等。远端设备例如可以是手机、平板电脑、大屏显示设备、笔记本电脑、台式个人电脑、工作站、带有显示屏的智能音箱、带有显示屏的智能闹钟、带有显示屏的电器设备(例如:智能冰箱)、增强现实设备(augmented reality,AR)、虚拟现实设备(virtual reality,VR)和混合现实设备(mixed reality,MR)等。大屏显示设备和远端设备可以通过各类有线或者无线网络建立连接。
图1是本申请实施例提供的一种大屏显示设备的结构示意图。如图1所示,大屏显示设备110配备有视频采集设备200,该视频采集设备200可以设置在大屏显示设备110的边框101之上(例如上方边框)。在不同结构的大屏显示设备110中,视频采集设备200可以固定设置于边框101之上,也可以相对于边框101活动设置。示例地,如图1所示,当视频采集设备200相对于边框101活动设置时,视频采集设备200可以为滑动伸缩结构。在一种实现方式中,当用户不使用视频采集设备200时,视频采集设备200可以隐藏在边框101之内,当用户使用视频采集设备200时,大屏显示设备110可以将视频采集设备200从边框101内弹出。其中,触发大屏显示设备110将视频采集设备200从边框101内弹出动作的场景例如可以包括:用户操作大屏显示设备110启动相机应用程序APP、用户操作大屏显示设备110向远端设备发起视频通话请求、大屏显示设备110中的应用程序调用视频采集设备200等。用户操作大屏显示设备110可以通过遥控器操作、手势操作、语音操作、手机APP操作等多种方式实现。或者,视频采集设备也可以设置在大屏显示设备内部,例如位于边框之内,或者位于大屏显示设备的屏幕后方。
图2是本申请实施例提供的另一种大屏显示设备的结构示意图。如图2所示,大屏显示设备110和视频采集设备200可以是两个独立的设备,即视频采集设备200外接于大屏显示设备110。大屏显示设备110和视频采集设备200可以通过有线或者无线的方式建立通信连接。当用户使用大屏显示设备110进行拍照、摄像、或者与远端设备进行视频通话时,大屏显示设备110可以通过上述有线或者无线的通信连接控制视频采集设备200开启和关闭,视频采集设备200开启后采集到的视频也可以通过上述有线或者无线的通信连接传输给大屏显示设备110。
下面对大屏显示设备与远端设备进行视频通话的场景进行示例性的说明。图3是大屏显示设备110与远端设备300进行视频通话的场景示意图。当大屏显示设备110或者远端设备300的任意一端用户希望与另一端用户进行视频通话时,可以操作其使用的设备向另一端用户发起视频通话请求,如果另一端用户在其使用的设备上接收视频通话请求,那么大屏显示设备110与远端设备300会建立视频通话连接。视频通话连接建立之后,大屏显示设备110会将视频采集设备采集到的视频发送给远端设备300进行显示,同时远端设备300也会将其视频采集设备采集到的视频发送给大屏显示设备110进行显示。目前,一些大屏显示设备110允许用户在视频通话时向视频采集设备采集到的视频中叠加一些相框、水印、动画和/或表情等图像素材102,使得这些图像素材102与视频采集设备采集到的视频一同显示在远端设备300的显示屏上,以增加视频通话的趣味性和互动性,提高视频通话体验。
目前,向视频采集设备采集到的视频中叠加图像素材可以通过以下三种方式实现:
第一种方式,图像素材的叠加行为在视频采集设备中实现。这种方式要求预先把用于叠加的图像素材内置到视频采集设备的内部存储中(例如:闪存flash中),当需要进行图像叠加时,视频采集设备首先将内置的图像素材叠加到其采集到的未经编码的视频帧中,然后对叠加有图像素材的视频帧进行编码,再发送给大屏显示设备。可以理解的是,由于视频采集设备的内部存储的容量十分有限,通常无法存放数量较多的图像素材,导致图像叠加的样式比较单一。示例地,表1提供了在ARGB8888色彩格式下的不同分辨率的图像素材的文件大小。如表1所示,一张720P分辨率的图像素材的大小为3.5MB(兆字节),如果视频采集设备的内部存储的容量为8MB,那么最多只能存储2张图像素材,如果视频采集设备的内部存储的容量为16MB,也最多只能存储4张图像素材。同理,一张1080P分辨率的图像素材的大小为7.9MB(兆字节),如果视频采集设备的内部存储的容量为8MB,那么最多只能存储1张图像素材,如果视频采集设备的内部存储的容量为16MB,也最多只能存储2张图像素材。并且,随着图像素材的分辨率进一步地提高,图像素材的大小会进一步增大,更难以存储在视频采集设备的内部存储中。另外,除了图像素材的数量受限以外,在视频采集设备中存储的图像素材也不便于灵活更换,综合导致了第一种方式的用户体验较差。
分辨率 素材大小(MB) 编码格式
720P 3.5 ARGB8888
1080P 7.9 ARGB8888
1440P 14 ARGB8888
2560P 35 ARGB8888
表1
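The sizes in Table 1 follow directly from the ARGB8888 format, which stores 4 bytes (A, R, G, B) per pixel. A minimal Python sketch; the pixel dimensions assumed for each resolution label are common values, not stated in the table:

```python
def argb8888_size_mb(width: int, height: int) -> float:
    """Uncompressed ARGB8888 image size in MB: 4 bytes per pixel, 1 MB = 2**20 bytes."""
    return width * height * 4 / 2 ** 20

# 720P (1280x720) -> ~3.5 MB and 1080P (1920x1080) -> ~7.9 MB, matching Table 1
```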
第二种方式，图像素材的叠加行为在大屏显示设备中实现。具体来说，视频采集设备采集视频帧后，对视频帧进行编码，然后发送给大屏显示设备；大屏显示设备首先对接收到的视频帧进行解码，然后在解码后的视频帧中叠加图像素材，然后再对叠加有素材的视频帧进行编码，最后将编码后的视频帧发送给远端设备。由此可见，从大屏显示设备接收到视频帧，到大屏显示设备将视频帧发送给远端设备，期间经历了一次解码过程和一次编码过程，由于视频帧的解码和编码都需要消耗一定的时间才能完成，即存在一定的解码时延和编码时延，因此这种方式会增加大屏显示设备和远端设备的视频通话时延。示例地，表2提供了目前一种大屏显示设备对H264格式的不同分辨率的视频帧进行解码和编码的时延。如表2所示，对于H264格式、分辨率为720P的视频帧，解码时延为5ms（毫秒）、编码时延为10ms，总时延为15ms；对于H264格式、分辨率为1080P的视频帧，解码时延为5ms、编码时延为15ms，总时延为20ms；对于分辨率为1440P或者2560P的视频帧，其解码时延和编码时延会更长。
分辨率 解码时延(ms) 编码时延(ms) 总时延(ms)
720P 5 10 15
1080P 5 15 20
1440P 15 20 35
2560P 30 30 60
表2
第三种方式,图像素材的叠加动作在远端设备中实现。具体来说,大屏显示设备除了将编码后的视频帧发送给远端设备之外,还将用于叠加的图像素材发送给远端设备。远端设备接收到视频帧和图像素材之后,首先对视频帧进行解码,然后将图像素材叠加到解码后的视频帧中,最后显示给用户。可以理解的是,由于大屏显示设备需要将图像素材单独发送给远端设备,因此会消耗额外的网络资源,并且添加图像素材的动作也会增加远端设备的功耗,导致远端设备出现发热和续航下降。
为了解决上述向视频中叠加图像素材的各种方式所存在的问题,本申请实施例提供了一种视频叠加方法。该方法可以应用于配备了视频采集设备的显示设备,例如大屏显示设备。其中,视频采集设备既可以如图1所示包含在显示设备中,为显示设备的一部分,也可以如图2所示与显示设备成为两个独立的设备,通过有线或者无线的方式建立通信连接。
下面对本申请实施例提供的视频叠加方法的各个实施例进行具体地解释说明。
图4是本申请实施例提供的视频采集设备和显示设备的数据传输架构的示意图。
如图4所示,视频采集设备例如可以包括光学镜头201、图像传感器(image sensor)202、至少一个处理器203、接口模块204和存储器205。其中,光学镜头201有多个镜片组成,用于采集视野中的光源,将光源投影到图像传感器202。当显示设备100的用户(即本地用户)与远端设备的用户(即远端用户)进行视频通话时,本地用户一般会位于视频采集设备200前方,使得本地用户位于光学镜头201的取景视野内。图像传感器202包括几百万、几千万甚至上亿个感光像素,通过这些感光像素,图像传感器202能够将接收到的光源信号转换成电信号,并发送给处理器203。处理器203例如可以包括图像信号处理器(image signal processor,ISP)和数字信号处理器(digital signal processor,DSP);其中,图像信号处理器ISP能够对来自的图像传感器202的电信号的进行采样和处理,得到视频帧,图像信号处理器还可以对视频帧进行预处理,例如噪声去除、白平衡矫正、色彩矫正、伽马矫正、色彩空间转换等。数字信号传感器DSP可以用于对视频帧进行编码,以及对编码后的视频帧进行裁剪、缩放等后处理之后,通过接口模块204 发送给显示设备100。
如图4所示,显示设备100例如可以包括存储器103、处理器104、接口模块105和网络模块106。其中,存储器103可以用于存储图像素材,处理器104可以通过接口模块105接收视频采集设备200发送的视频帧,并通过网络模块106将接收到的视频帧发送给远端设备。
本申请实施例中,接口模块105例如可以是通用串行总线(universal serial bus,USB)接口。在物理形态上,接口模块105例如可以是:USB-Type-A接口、USB-Type-B接口、USB-Type-C接口、USB-Micro-B接口等。在接口标准上,接口模块105例如可以是USB1.0接口、USB2.0接口、USB3.0接口等。本申请实施例对接口模块105的物理形态和接口标准不做具体限定。
如图4所示,视频采集设备200和显示设备100可以通过一根连接两端接口模块的USB传输线400建立基于物理链路的数据连接。并且,基于该USB连接,本申请实施例在视频采集设备200和显示设备100之间建立了两条虚拟的传输协议通道,例如第一传输协议通道401和第二传输协议通道402,用于传输不同类型的数据。
其中,第一传输协议通道401例如可以是远程网络驱动程序接口规范(remote network driver interface specification,RNDIS)通道,用于传输非视频数据。RNDIS是一个通信协议,能够基于USB实现以太网连接,例如TCP/IP协议(TCP/IP protocol suite)连接等。基于TCP/IP协议,RNDIS可以使用socket编程接口,利用socket编程接口的轻量化、移植性好等优点,可以在视频采集设备200和显示设备100之间提供可靠、稳定、高效率的传输能力,适合非视频数据的传输。
其中,第二传输协议通道402例如可以是USB视频规范(USB video class,UVC)通道,用于传输视频数据。UVC是一种视频传输协议标准,能够使视频采集设备200在不需要安装任何驱动程序的情况下连接到显示设备100,并进行视频传输。
图5是本申请实施例提供的视频叠加方法的流程图,该方法可以基于图4所示的数据传输架构实现。如图5所示,该方法可以包括以下步骤S101-步骤S105:
步骤S101,显示设备响应于第一用户指令,向视频采集设备发送图像素材。
步骤S101可以发生在显示设备与远端设备建立视频通话连接之后。具体实现中,如图6所示,在显示设备与远端设备建立视频通话连接之后,显示设备可以在其显示屏内显示远端设备采集到的视频图像,另外,显示设备还可以通过画中画108的方式,在显示屏的部分区域(例如,左上角、右上角等)显示视频采集设备采集到的视频图像。在本申请实施例中,显示设备显示画中画的尺寸可以根据远端设备显示屏的比例确定,例如:如果远端设备显示屏宽度和高度的比例为9:18.5,那么画中画的宽度和高度的比例就可以是9:18.5,如果远端设备显示屏宽度和高度的比例为16:9,那么画中画的宽度和高度的比例就可以是16:9。可以理解的是,画中画的尺寸也可以根据其他的方式确定,例如显示设备预设的尺寸或者本地用户设置的尺寸等,本申请实施例对此不作具体限定。
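The picture-in-picture sizing rule described above (match the remote display's width:height ratio) can be sketched as follows; the fixed on-screen pixel width is an illustrative assumption:

```python
def pip_size(remote_w: float, remote_h: float, pip_width_px: int) -> tuple:
    """Picture-in-picture window size: keep the remote display's
    width:height ratio and scale it to a chosen on-screen pixel width."""
    return pip_width_px, round(pip_width_px * remote_h / remote_w)

# a 9:18.5 remote phone screen yields a tall window, a 16:9 remote TV a wide one
```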
除此之外,显示屏还可以在预定的区域显示至少一个可用图像素材或者这些可用图像素材的缩略图,以供用户浏览和选取;或者,在显示设备与远端设备建立视频通话连接之后,用户可以通过某些操作调出至少一个可用图像素材或者这些可用图像素材的缩略图。此时,如果用户想要在视频采集设备采集的视频帧中叠加图像素材,则可以通过 遥控器操作、隔空手势操作或者语音控制等方式从上述可用图像素材中选取一个要叠加的图像素材,并触发用于指示显示设备向视频采集设备发送图像素材的第一用户指令,使得显示设备响应于第一用户指令,将用户选取的图像素材发送给视频采集设备。
示例地,如图6所示,当用户使用遥控器501操作显示设备时,用户可以通过操作遥控器501的按键,使显示设备切换显示不同的缩略图107,并选中一个要叠加的图像素材的缩略图107;然后,用户可以点击遥控器501的确认(OK)键触发第一用户指令,使显示设备向视频采集设备发送图像素材。
示例地,如图7所示,当用户使用隔空手势502操作显示设备时,用户可以通过隔空划动操作(例如:隔空左划、右划)使显示设备切换显示不同的缩略图107,并通过隔空点击操作(例如隔空单击操作)选中一个要叠加的图像素材的缩略图107;然后,用户可以通过特定的隔空点击(例如隔空双击操作)或者隔空划动操作(例如选中缩略图107向上划动等)触发第一用户指令,使显示设备将用户选中的图像素材发送给视频采集设备。
示例地,如图8所示,当用户使用语音控制显示设备时,显示设备可以直接响应于用户的语音指令发送图像素材。例如,可用图像素材包括多个相框,当用户说“添加第一个相框”时,显示设备可以将第一个相框对应的图像素材发送给视频采集设备。这里需要补充说明的是,显示设备为了能够识别用户的语音指令,可以具备语音识别能力,语音识别能力可以通过在显示设备中内置人工智能AI模块实现,可以借助与显示设备对接的AI音箱实现,也可以借助云端AI服务器实现。其中,上述AI模块可以包括硬件模块,例如神经网络处理单元(neural-network processing unit,NPU),也可以包括软件模块,例如卷积神经网络(convolutional neural network,CNN)、深度神经网络(deep neural networks,DNN)等。需要说明的是,当用户正在使用显示设备进行视频通信时,可以通过某个特定的语音唤醒词或者某个特定的动作唤醒AI模块来进行语音控制。
本申请实施例中,图像素材可以采用多种色彩格式,例如:ARGB色彩格式(一种由透明通道Alpha、红色通道Red、绿色通道Green、蓝色通道Blue组成的色彩格式,常用于32位位图的存储结构),ARGB色彩格式具体可以包括:ARGB8888(分别用8个bit来记录每个像素的A、R、G、B数据)、ARGB4444(分别用4个bit来记录每个像素的A、R、G、B数据)等。图像素材还可以采用多种文件格式,例如:位图BMP、图像互换格式(graphics interchange format,GIF)、便携式网络图形(portable network graphics,PNG)等。
本申请实施例中,图像素材可以预先在显示设备本地存储,当显示设备与远端设备建立视频通话连接时,显示设备直接将本地存储的图像素材的缩略图展示给用户。图像素材也可以存储在云端服务器,当显示设备与远端设备建立视频通话连接时,显示设备可以从云端服务器获取图像素材,并生成图像素材的缩略图展示给用户。图像素材还可以一部分预先保存在显示设备本地存储,另一部分在云端服务器存储,当显示设备与远端设备建立视频通话连接时,显示设备首先将本地存储的图像素材的缩略图展示给用户,如果用户要浏览更多的图像素材,显示设备再从云端服务器获取图像素材,并生成图像素材的缩略图展示给用户。
本申请实施例中，当显示设备与视频采集设备通过USB连接建立有RNDIS通道和UVC通道时，显示设备和视频采集设备之间可以利用UVC通道在传输视频数据方面的优势，传输视频帧，利用RNDIS通道的可靠、稳定、高效率的优势，传输图像素材以及其他的视频帧以外的数据，例如维测数据（用于定位和处理异常或错误的数据）等，确保了高效传输图像素材的同时、又不影响视频帧的传输。
步骤S102,视频采集设备接收到图像素材之后,将图像素材叠加到原始视频帧,得到叠加视频帧。
本申请实施例中,图像素材可能是单帧素材,也可能是多帧素材。其中,单帧素材是指仅包含一帧图像的素材,例如相框素材;多帧素材是指包含多帧图像的素材,例如表情、动画素材等。
一般来说,在显示设备与远端设备建立视频通话连接之后,视频采集设备会按照预设的视频分辨率和帧率不断地产生视频帧。如果视频帧率为30Hz,意味着视频采集设备每秒钟会产生30个视频帧,如果视频帧率为60Hz,意味着视频采集设备每秒钟会产生60个视频帧。并且通常情况下,视频采集设备采集的视频帧需要经过编码后,再发送给显示设备,为了区分编码前和编码后的视频帧,本申请实施例将编码前(即未经编码的)的视频帧称作原始视频帧,将编码后的视频帧称作编码视频帧。
具体实现中,对于不同的素材,视频采集设备将其叠加到原始视频帧的方式可以不同。下面分别以相框素材(一般为单帧素材)和表情素材(一般为多帧素材)为例,对视频采集设备将图像素材叠加到原始视频帧的方式进行具体说明。
图9是视频采集设备将相框素材叠加到原始视频帧的示意图。如图9所示,假设视频采集设备在t0时刻接收到相框素材,那么视频采集设备会将相框素材叠加到在t0时刻之后采集到的原始视频帧的每一帧中,直到视频通话结束或者视频采集设备接收到显示设备发送的用于指示其停止叠加图像素材的指示消息。
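Superimposing an ARGB photo-frame material on each captured frame amounts to per-pixel alpha compositing (the "over" operator). A minimal per-pixel sketch for illustration, not the device's actual implementation:

```python
def blend_pixel(fg_argb, bg_rgb):
    """Composite one ARGB material pixel over one RGB video pixel:
    out = alpha * foreground + (1 - alpha) * background, per channel."""
    a, r, g, b = fg_argb
    alpha = a / 255
    return tuple(round(alpha * f + (1 - alpha) * s) for f, s in zip((r, g, b), bg_rgb))
```

A fully opaque material pixel (alpha 255) replaces the video pixel; a fully transparent one (alpha 0) leaves it unchanged.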
表情素材一般是由多帧图像组成的,这些图像按照先后顺序排列,在播放时能够呈现出动态效果。常见的表情素材的文件格式例如gif等。一般来说,当用户想要在视频通话时向对方发送一个表情时,表情素材通常只需要播放一次,随即消失即可,不需要持续显示,这也是表情素材与相框素材显示方式的区别所在。
图10是视频采集设备将表情素材叠加到原始视频帧的示意图。如图10所示，假设表情素材包括30帧图像E1-E30，当视频采集设备在t0时刻接收到表情素材时，视频采集设备可以将图像E1叠加到t0时刻之后采集到的第一个原始视频帧P1中，将图像E2叠加到t0时刻之后采集到的第二个原始视频帧P2中，将图像E3叠加到t0时刻之后采集到的第三个原始视频帧P3中，以此类推，直到将图像E30叠加到t0时刻之后采集到的第三十个原始视频帧P30中。由此，在视频采集设备接收到表情素材之后，总共会有30个原始视频帧(P1-P30)被叠加了表情素材，形成30个叠加视频帧。这30个叠加视频帧能够播放出一个完整的表情动画，如果视频通话的帧率为30Hz，那么表情动画的播放时长为1秒。需要说明的是，此处视频帧的数量为30帧仅作为举例说明，本申请并不限定表情素材叠加的图像帧数量。例如：30帧的表情素材可以按照时序叠加到若干个图像帧中，一个或多个图像帧中可以叠加相同的表情素材图像；如图像E1可以叠加到原始视频帧P1、P2中，也可以叠加到更多的原始视频帧中。
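The one-material-frame-per-captured-frame mapping for an animated emoji described above can be sketched as follows (the function name is illustrative):

```python
def overlay_schedule(material_frames, captured_frames):
    """Pair each frame of a multi-frame material (e.g. an animated emoji) with
    one captured frame in order; once the material is exhausted, later frames
    carry no overlay."""
    out = []
    for i, frame in enumerate(captured_frames):
        overlay = material_frames[i] if i < len(material_frames) else None
        out.append((frame, overlay))
    return out
```

With 30 material frames at a 30 Hz call, the animation spans exactly 30 superimposed frames, i.e. one second of playback.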
可以理解的是,在一次视频通话中,用户可能会在显示设备中先后选择多个图像素材,在这种情况下,视频采集设备也需要根据用户的选择,在不同的时段内向原始视频帧中叠加不同的图像素材。为了实现这一目的,视频采集设备在已经接收到一个在先的 图像素材的情况下,实时检测是否从显示设备接收到了新的图像素材;如果未接收到新的图像素材,视频采集设备会继续向原始视频帧中叠加在先的图像素材;如果接收到新的图像素材,视频采集设备会停止向原始视频帧叠加在先的图像素材,转而向原始视频帧叠加新的图像素材。
步骤S103,视频采集设备对叠加视频帧进行编码,得到编码视频帧。
图11是本申请实施例示出的从原始视频帧得到编码视频帧的示意图。如图11所示,假设视频采集设备在t0时刻接收到图像素材,那么,对于t0时刻之前产生的原始视频帧,视频采集设备会将它们进行预处理之后,直接编码得到编码视频帧;对于t0时刻之后产生的原始视频帧,视频采集设备会将它们进行预处理,然后,将图像素材叠加到原始视频帧,得到叠加视频帧,最后,将叠加视频帧进行编码,得到编码视频帧。
本申请实施例可以使用多种编码方式得到编码视频帧,例如:高级视频编码H.264或者高效率视频编码(又称H.265)(high efficiency video coding,HEVC)等,此处不再赘述。
另外,需要补充说明的是,本申请实施例在对叠加视频帧进行编码之前,可以首先对叠加视频帧进行后处理。后处理例如可以包括对叠加视频帧进行裁剪或者缩放等,后处理的参数可以根据图像素材的分辨率或者远端设备显示屏的分辨率确定。
示例地,如图12所示,当图像素材是尺寸为1920×1080像素大小的相框时,如果叠加视频帧的尺寸大于1920×1080像素,那么叠加视频帧的一部分内容会位于相框之外,在这种情况下,视频采集设备可以将叠加视频帧的位于相框之外的部分裁剪掉,得到尺寸为1920×1080像素大小并且相框位于内容四周边缘的叠加视频帧。
示例地，如图13所示，当叠加视频帧的尺寸为3840×2160像素（2160P）时，如果预先设置的视频通话质量是1080P，那么，视频采集设备可以对叠加视频帧进行缩放，将叠加视频帧从2160P缩小到1080P。缩小后的叠加视频帧及其编码产生的编码视频帧的字节量更小，能够降低显示设备与视频采集设备之间的USB带宽资源的占用率，并且能够降低编码视频帧在后续从显示设备向远端设备传输时的网络资源的占用率。
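The two post-processing steps illustrated in FIG. 12 and FIG. 13, center-cropping to the frame material's size and downscaling 2160P to 1080P, can be sketched as:

```python
def center_crop(frame_w, frame_h, target_w, target_h):
    """(left, top, right, bottom) box that center-crops a frame to the
    photo-frame material's size, discarding content outside the frame."""
    left = (frame_w - target_w) // 2
    top = (frame_h - target_h) // 2
    return left, top, left + target_w, top + target_h

def scaled_size(frame_w, frame_h, target_h):
    """Scale a frame down to a target height, keeping the aspect ratio."""
    factor = target_h / frame_h
    return round(frame_w * factor), target_h
```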
步骤S104,视频采集设备将编码视频帧发送给显示设备。
具体实现中,视频采集设备可以通过UVC通道将编码视频帧发送给显示设备,从而利用UVC通道在传输视频数据方面的优势,既能够实现编码视频帧的高效传输、又不影响RNDIS通道传输图像素材。
步骤S105,显示设备接收到编码视频帧之后,将编码视频帧发送给远端设备。
远端设备在接收到编码视频帧后，可以对编码视频帧进行解码，并渲染产生包含图像素材的视频图像，将视频图像在远端设备的显示屏上显示。因此远端用户能够在视频通话过程中看到本地用户添加的图像素材。
进一步如图14所示,当显示屏以画中画108的方式显示视频采集设备采集到的视频图像时,如果显示设备接收到了包含图像素材102的编码视频帧,显示设备除了将编码视频帧发送给远端设备300之外,还可以将编码视频帧进行解码和渲染,并在画中画108中显示,从而使得本地用户能够看到其选择的图像素材102被叠加后的显示效果,便于本地用户根据显示效果决定是否继续叠加该图像素材102或者更换其他的图像素材。
本申请实施例提供的方法,图像素材的叠加过程在视频采集设备中完成,因此显示设备可以直接将编码视频帧发送给远端设备,不需要再对编码视频帧进行任何与叠加图 像素材相关的操作,例如:解码、编码等,省去了解码和编码的时间,因此降低了显示设备和远端设备的视频通话时延,提高了用户使用体验。以表2为例,对于H264格式、分辨率为720P的视频质量,本申请实施例的方法能够降低大约15ms的时延;对于H264格式、分辨率为1080P的视频质量,本申请实施例的方法能够降低大约20ms的时延;对于H264格式、分辨率为1440P的视频质量,本申请实施例的方法能够降低大约35ms的时延;对于H264格式、分辨率为2560P的视频质量,本申请实施例的方法能够降低大约60ms的时延。可以看出,视频质量越高,采用本申请实施例的方法降低视频通话时延的效果越显著。除此之外,本申请实施例中的图像素材可以存储在显示设备内,便于灵活更换,并且不占用视频采集设备存储器空间。另外,本申请实施例中的显示设备不需要将图像素材单独发送给远端设备,因此也不会产生额外的网络资源消耗。
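Using the Table 2 figures, the per-frame latency removed by skipping the display device's decode and re-encode pass is simply the sum of the two delays:

```python
# Decode and encode delays in ms for H.264 on the reference display device (Table 2).
DELAYS_MS = {"720P": (5, 10), "1080P": (5, 15), "1440P": (15, 20), "2560P": (30, 30)}

def latency_saved_ms(resolution: str) -> int:
    """Latency removed when the display device forwards the encoded video frame
    directly instead of decoding, overlaying, and re-encoding it."""
    decode, encode = DELAYS_MS[resolution]
    return decode + encode
```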
需要补充说明的是，在一些视频通话场景中，当用户在视频中叠加了一个相框并且持续一段时间之后，有时会希望取消叠加这个相框。这时，用户可以通过遥控器操作、隔空手势操作或者语音控制等方式，对显示设备触发第二用户指令。显示设备可以响应于第二用户指令，通过RNDIS通道向视频采集设备发送一个指示消息，该指示消息用于指示视频采集设备停止将图像素材叠加到原始视频帧。
示例地,如图15所示,当用户使用遥控器501操作显示设备时,用户可以操作遥控器的方向键选择显示器画面中的“关闭相框”图标109;然后,用户可以点击遥控器的确认(OK)键以触发第二用户指令,使显示设备向视频采集设备发送指示消息。
示例地,如图16所示,当用户使用隔空手势502操作显示设备时,用户可以通过隔空划动操作(例如:隔空左划、右划)选择显示器画面中的“关闭相框”图标109;然后,用户可以通过特定的隔空点击(例如隔空双击操作)或者隔空划动操作(例如选中缩略图向上划动等)触发第二用户指令,使显示设备向视频采集设备发送指示消息。
示例地,如图17所示,当用户使用语音控制显示设备时,显示设备可以直接响应于用户的语音指令向视频采集设备发送指示消息。例如:当用户说“关闭相框”时,显示设备向视频采集设备发送指示消息。
由此,当用户在视频中叠加了图像素材之后,用户可以随时取消叠加图像素材,灵活方便,提升用户体验。
在一些实施例中,当本地用户在显示设备中选择了一个图像素材并且触发第一用户指令之后,显示设备还可以将图像素材发送给远端设备,使得远端设备也可以在其采集到的视频图像中叠加图像素材,然后发送给显示设备进行显示。这样,如图18所示,双方用户就都可以看到对方的叠加有图像素材的视频图像。
作为一种可选择的实现方式,如图19所示,当本地用户在显示设备中选择了一个图像素材并且触发第一用户指令之后,显示设备可以在显示屏弹出用于询问用户是否将图像素材推荐给远端用户的对话框,如图19(a)所示。用户可以通过遥控器操作、隔空手势操作或者语音控制等方式选择“是”或者“否”的选项;如果用户选择了“否”,则对话框关闭,显示设备不会向远端用户推荐图像素材;如果用户选择了“是”,则对话框关闭,显示设备向远端设备发送推荐图像素材的消息,该消息例如可以包含图像素材的缩略图。远端设备在接收到显示设备的推荐图像素材的消息之后,可以显示带有该图像缩略图的对话框,并通过该对话框询问远端用户是否要体验一下这个图像素材,如图19(b)所示;如果用户选择“不体验”,则远端设备关闭对话框,并且不会在采集到的 视频图像中叠加图像素材;如果用户选择“体验一下”,则远端设备关闭对话框,并且向显示设备请求图像素材的文件,远端设备在接收到显示设备发送的图像素材文件后,将图像素材叠加到其采集到的视频图像中。
作为另一种可选择的实现方式,当本地用户在显示设备中选择了一个图像素材并且触发第一用户指令之后,显示设备也可以直接向远端设备发送推荐图像素材的消息,而不需要用户进行其他操作。
可以理解的是,上述用于使双方用户就都可以看到对方的叠加有图像素材的视频图像的实现方式仅仅是本申请实施例示出的一部分实现方式,而不是全部的实现方式,能够在此处应用的设计和构思均没有超出本申请实施例的保护范围。
本申请实施例的方法还提供了一种自适应的带宽调整机制,能够合理地为第一传输协议通道和第二传输协议通道分配USB带宽资源,提高USB带宽利用率。
通常,USB的带宽资源都是有限的,例如:USB2.0协议理论上的最大传输带宽为60MB/s,但是实际应用中,受到数据传输协议和编码方式的影响,实际能够达到的最大传输带宽仅为30MB/s;而USB3.0协议虽然能够达到更高的传输带宽,但是受到视频采集设备和显示设备的存储器读写性能等因素的影响,其实际能够达到的最大带宽也受到限制。
一般来说,基于USB协议有限的带宽资源,显示设备可以将一部分带宽资源分配给第一传输协议通道,将另一部分带宽资源分配给第二传输协议通道。以最大传输带宽为30MB/s为例,当视频通话质量为1080P@30Hz时,显示设备可以为第二传输协议通道分配4MB/s的带宽资源;当视频通话质量为1080P@60Hz时,显示设备可以为第二传输协议通道分配10MB/s的带宽资源;当视频通话质量为1440P@30Hz时,显示设备可以为第二传输协议通道分配15MB/s的带宽资源;当视频通话质量为2560P@30Hz时,显示设备可以为第二传输协议通道分配25MB/s的带宽资源;其余带宽资源则可以分配给第一传输协议通道。然而,当显示设备未向视频采集设备发送图像素材时,第一传输协议通道上仅有少量非视频数据传输(例如维测数据)或者无数据传输,这就导致了USB的带宽资源出现了浪费。另外,当视频采集设备向显示设备发送图像素材时,如果分配给第一传输协议通道带宽资源过少,还会导致图像素材传输时间过长,使图像素材的显示出现明显的延迟。
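The static allocation described above can be sketched as a lookup: the UVC (video) channel gets a quality-dependent share of the 30 MB/s achievable USB bandwidth and the RNDIS channel gets the remainder:

```python
TOTAL_BW_MBPS = 30  # actually achievable USB 2.0 bandwidth in this example

# UVC (video) allocation per call quality, per the description; rest goes to RNDIS.
UVC_ALLOC = {"1080P@30Hz": 4, "1080P@60Hz": 10, "1440P@30Hz": 15, "2560P@30Hz": 25}

def static_split(quality: str):
    """Return (UVC bandwidth, RNDIS bandwidth) in MB/s for a given call quality."""
    uvc = UVC_ALLOC[quality]
    return uvc, TOTAL_BW_MBPS - uvc
```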
为了解决上述问题,本申请实施例提供的自适应的带宽调整机制如图20所示可以包括步骤S201-步骤S205。
在步骤S201中,显示设备可以实时检测第一传输协议通道的数据流量。
在步骤S202中,显示设备根据检测到的数据流量和USB的总带宽(即实际能够达到的最大传输带宽)确定第一传输协议通道和第二传输协议通道的可用带宽资源。
作为一种可选择的实现方式，当第一传输协议通道为RNDIS通道时，显示设备可以对USB链路上的数据流量进行数据头head分析，例如分析数据头中的协议版本号等，以识别出TCP/IP协议的流量，TCP/IP协议的流量即为第一传输协议通道的数据流量，其他未识别的流量即为第二传输协议通道的数据流量。数据头分析可以在显示设备的驱动层进行，例如，当一个数据包来临时，驱动层可以尝试对数据包的数据头进行协议版本号解析，如果能够解析到一个协议版本号，例如IPv4的版本号0100，则说明这个数据包属于在第一传输协议通道传输的TCP/IP协议的流量。
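The driver-layer head analysis can be sketched as reading the IP version nibble of a packet's first byte: 4 (binary 0100) marks IPv4 and 6 marks IPv6, i.e. TCP/IP traffic on the RNDIS channel, while unrecognized packets are counted as UVC video traffic. Real RNDIS framing adds its own message header, which this sketch omits:

```python
def classify_packet(payload: bytes) -> str:
    """Classify traffic by the IP version nibble (high 4 bits of the first byte):
    4 (IPv4) or 6 (IPv6) -> TCP/IP traffic on the RNDIS channel; anything else
    is counted as video traffic on the UVC channel."""
    version = payload[0] >> 4
    return "rndis" if version in (4, 6) else "uvc"
```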
作为一种可选择的实现方式，可以构建流量模型来识别第一传输协议通道和第二传输协议通道的数据流量。其中，流量模型可以是神经网络模型，例如卷积神经网络（convolutional neural network，CNN）、深度神经网络（deep neural networks，DNN）、支持向量机（support vector machine，SVM）等。例如，显示设备内可以构建用于流量识别的卷积神经网络CNN，并且使用大量已标注的数据对(A,Z)对卷积神经网络CNN进行训练，使其具备识别未知流量类型的能力。其中，数据对(A,Z)中的A表示已知类型的流量段，Z是流量段A的标注结果，在训练时，以A作为卷积神经网络CNN的输入，以Z作为卷积神经网络CNN的期望输出（标签），以训练卷积神经网络CNN的内部参数。需要补充说明的是，为了使卷积神经网络CNN得到归一化的分类结果，卷积神经网络CNN在输出端还可以连接池化层pooling和归一化指数函数层softmax等，这些连接方式属于构建神经网络模型的一般方式，本申请实施例此处不再赘述。
进一步地，显示设备在检测到第一传输协议通道的数据流量较小时，可以将USB的大部分带宽资源分配给第二传输协议通道，以保证视频帧的流畅传输。例如，假设USB链路实际能够达到的最大传输带宽为30MB/s，当第一传输协议通道的数据流量小于256KB/s时，显示设备可以为第一传输协议通道分配1MB/s的可用带宽，为第二传输协议通道分配其余29MB/s的可用带宽；当第一传输协议通道的数据流量在短时间内出现激增，例如达到10MB/s时，说明第一传输协议通道正在传输图像素材，显示设备可以为第一传输协议通道分配15MB/s的可用带宽，为第二传输协议通道分配其余15MB/s的可用带宽，以缩短图像素材的传输时长，降低显示设备的响应延时。
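The adaptive reallocation in the example above can be sketched with the two figures it names: below 256 KB/s the RNDIS channel gets 1 MB/s and the video channel the rest; a surge (such as a 10 MB/s image-material transfer) bumps RNDIS to 15 MB/s:

```python
TOTAL = 30.0  # MB/s, actually achievable USB bandwidth in the example

def split_bandwidth(rndis_traffic_mbps: float):
    """Return (RNDIS bandwidth, UVC bandwidth) in MB/s: a quiet RNDIS channel
    (< 256 KB/s) keeps 1 MB/s; any larger traffic (e.g. an image-material
    transfer) is given 15 MB/s."""
    rndis = 1.0 if rndis_traffic_mbps < 0.25 else 15.0  # 0.25 MB/s = 256 KB/s
    return rndis, TOTAL - rndis
```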
在步骤S203中,显示设备还可以根据第二传输协议通道的可用带宽资源动态确定叠加视频帧的编码参数,从而改变编码视频帧在传输时消耗的数据流量。例如,当第二传输协议通道的可用带宽为29MB/s时,显示设备确定叠加视频帧的编码参数为1080P@60Hz,甚至更高,例如1440P@30Hz、2560P@30Hz等;当第二传输协议通道的可用带宽为15MB/s时,显示设备确定叠加视频帧的编码参数为1080P@60Hz,甚至更低,例如1080P@30Hz、720P@30Hz等。
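Mapping the UVC channel's available bandwidth to encoding parameters (step S203) can be sketched as a threshold table; the exact cut-offs below are illustrative assumptions, not values from the description:

```python
def pick_encoding(uvc_bw_mbps: float) -> str:
    """Pick encoding parameters so the encoded stream fits the UVC channel's
    available bandwidth (thresholds are illustrative assumptions)."""
    if uvc_bw_mbps >= 25:
        return "1440P@30Hz"
    if uvc_bw_mbps >= 15:
        return "1080P@60Hz"
    return "720P@30Hz"
```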
在步骤S204中,显示设备在确定编码参数之后,可以通过第一传输协议通道将编码参数发送给视频采集设备。
在步骤S205中,视频采集设备根据编码参数对叠加视频帧进行编码,得到编码视频帧。
可以理解的是,采用不同编码参数得到的编码视频帧的字节大小不同,在传输时产生的数据流量也不同,对第二传输协议通道的可用带宽的需求也不同。据此,本申请实施例根据第二传输协议通道的可用带宽的大小,动态调整编码参数,使得第二传输协议通道的可用带宽始终能够满足传输编码视频帧的需求,提高带宽资源利用率,并且保证视频通话不卡顿。
上述本申请提供的实施例中，从显示设备本身、视频采集设备以及从显示设备和视频采集设备之间交互的角度对本申请提供的视频叠加方法的各方案进行了介绍。可以理解的是，上述显示设备和视频采集设备为了实现上述功能，其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到，结合本文中所公开的实施例描述的各示例的方法步骤，本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。
本申请实施例还提供了一种电子设备,该电子设备可以包括图4所示的显示设备100和视频采集设备200,其中,显示设备100用于响应于第一用户指令,向视频采集设备200发送图像素材;视频采集设备200用于将接收到的图像素材叠加到原始视频帧,得到叠加视频帧,原始视频帧包括视频采集设备200采集到的未经编码的视频帧;视频采集设备200还用于对叠加视频帧进行编码,得到编码视频帧;视频采集设备200还用于将编码视频帧发送给显示设备100。
本申请实施例还提供了另一种电子设备,该电子设备例如可以是图4所示的显示设备100。其中,接口模块105用于与视频采集设备200实现数据传输,例如向视频采集设备200发送图像素材,以及,接收视频采集设备200发送的编码视频帧。存储器103用于存储图像素材,以及存储计算机程序代码,该计算机程序代码包括计算机指令;当处理器104执行计算机指令时,使显示设备执行上述各实施例中涉及的方法,例如:响应于第一用户指令,向视频采集设备发送图像素材;接收视频采集设备发送的编码视频帧,其中,编码视频帧是视频采集设备对叠加视频帧编码得到的,叠加视频帧是视频采集设备将图像素材叠加到原始视频帧中得到的,原始视频帧包括视频采集设备采集到的未经编码的视频帧。
本申请实施例还提供了一种视频采集设备,例如图4所示的视频采集设备200。其中,接口模块204用于与显示设备100实现数据传输,例如接收显示设备100发送的图像素材,以及,将编码视频帧发送给显示设备100。存储器205用于存储计算机程序代码,该计算机程序代码包括计算机指令;当处理器203执行计算机指令时,使视频采集设备200执行上述各实施例中涉及的方法,例如:接收显示设备发送的图像素材;将图像素材叠加到原始视频帧,得到叠加视频帧,原始视频帧包括视频采集设备采集到的未经编码的视频帧;对叠加视频帧进行编码,得到编码视频帧;将编码视频帧发送给显示设备。
本申请实施例还提供了一种视频叠加系统,该系统包括显示设备和视频采集设备。其中,显示设备用于响应于第一用户指令,向视频采集设备发送图像素材;视频采集设备用于接收到图像素材之后,将图像素材叠加到原始视频帧,得到叠加视频帧;视频采集设备还用于对叠加视频帧进行编码,得到编码视频帧;视频采集设备还用于将编码视频帧发送给显示设备;显示设备还用于接收到编码视频帧之后,将编码视频帧发送给远端设备。
以上的具体实施方式,对本发明的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上仅为本发明的具体实施方式而已,并不用于限定本发明的保护范围,凡在本发明的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本发明的保护范围之内。

Claims (29)

  1. 一种视频叠加方法,其特征在于,包括:
    显示设备响应于第一用户指令,向视频采集设备发送图像素材;
    所述视频采集设备将接收到的所述图像素材叠加到原始视频帧,得到叠加视频帧,所述原始视频帧包括所述视频采集设备采集到的未经编码的视频帧;
    所述视频采集设备对所述叠加视频帧进行编码,得到编码视频帧;
    所述视频采集设备将所述编码视频帧发送给所述显示设备。
  2. 根据权利要求1所述的方法,其特征在于,所述显示设备响应于第一用户指令,向所述视频采集设备发送图像素材之前,还包括:所述显示设备与远端设备建立视频连接。
  3. 根据权利要求1或2所述的方法,其特征在于,还包括:所述显示设备将所述编码视频帧发送给远端设备。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述显示设备和所述视频采集设备通过物理链路建立数据连接,所述物理链路用于承载所述显示设备和所述视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
  5. 根据权利要求4所述的方法,其特征在于,所述显示设备响应于第一用户指令,向视频采集设备发送图像素材,包括:所述显示设备响应于所述第一用户指令,通过所述第一传输协议通道向所述视频采集设备发送所述图像素材。
  6. 根据权利要求4或5所述的方法,其特征在于,所述视频采集设备将所述编码视频帧发送给所述显示设备,包括:所述视频采集设备通过所述第二传输协议通道将所述编码视频帧发送给所述显示设备。
  7. 根据权利要求4-6任一项所述的方法,其特征在于,还包括:
    所述显示设备响应于第二用户指令,通过所述第一传输协议通道向所述视频采集设备发送第一指示消息,所述第一指示消息用于指示所述视频采集设备停止将所述图像素材叠加到所述原始视频帧。
  8. 根据权利要求4-7任一项所述的方法,其特征在于,还包括:
    所述显示设备实时检测所述第一传输协议通道的数据流量;
    所述显示设备根据所述数据流量和所述物理链路的总带宽确定所述第二传输协议通道的可用带宽资源;
    所述显示设备根据所述可用带宽资源确定所述叠加视频帧的编码参数;
    所述显示设备通过所述第一传输协议通道将所述编码参数发送给所述视频采集设备。
  9. 根据权利要求8所述的方法,其特征在于,所述视频采集设备对所述叠加视频帧进行编码,得到编码视频帧,包括:所述视频采集设备根据所述编码参数对所述叠加视频帧进行编码,得到所述编码视频帧。
  10. 根据权利要求4-9任一项所述的方法,其特征在于,
    所述物理链路为通用串行总线USB链路;
    所述第一传输协议通道为远程网络驱动程序接口规范RNDIS通道;
    所述第二传输协议通道为USB视频规范UVC通道。
  11. 一种电子设备,其特征在于,包括:显示设备和视频采集设备;
    所述显示设备,用于响应于第一用户指令,向所述视频采集设备发送图像素材;
    所述视频采集设备,用于将接收到的所述图像素材叠加到原始视频帧,得到叠加视频帧,所述原始视频帧包括所述视频采集设备采集到的未经编码的视频帧;
    所述视频采集设备,还用于对所述叠加视频帧进行编码,得到编码视频帧;
    所述视频采集设备,还用于将所述编码视频帧发送给所述显示设备。
  12. 根据权利要求11所述的电子设备,其特征在于,
    所述显示设备,还用于与远端设备建立视频连接。
  13. 根据权利要求11或12所述的电子设备,其特征在于,
    所述显示设备,还用于将所述编码视频帧发送给远端设备。
  14. 根据权利要求11-13任一项所述的电子设备,其特征在于,所述显示设备和所述视频采集设备通过物理链路建立数据连接,所述物理链路用于承载所述显示设备和所述视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
  15. 根据权利要求14所述的电子设备,其特征在于,
    所述显示设备,具体用于响应于所述第一用户指令,通过所述第一传输协议通道向所述视频采集设备发送所述图像素材。
  16. 根据权利要求14或15所述的电子设备,其特征在于,
    所述视频采集装置,具体用于通过所述第二传输协议通道将所述编码视频帧发送给所述显示设备。
  17. 根据权利要求14-16任一项所述的电子设备,其特征在于,
    所述显示设备,还用于响应于第二用户指令,通过所述第一传输协议通道向所述视频采集设备发送第一指示消息,所述第一指示消息用于指示所述视频采集设备停止将所述图像素材叠加到所述原始视频帧。
  18. 根据权利要求14-17任一项所述的电子设备,其特征在于,
    所述显示设备,还用于实时检测所述第一传输协议通道的数据流量;
    所述显示设备,还用于根据所述数据流量和所述物理链路的总带宽确定所述第二传输协议通道的可用带宽资源;
    所述显示设备,还用于根据所述可用带宽资源确定所述叠加视频帧的编码参数;
    所述显示设备,还用于通过所述第一传输协议通道将所述编码参数发送给所述视频采集设备。
  19. 根据权利要求18所述的电子设备,其特征在于,
    所述视频采集设备,还用于根据所述编码参数对所述叠加视频帧进行编码,得到所述编码视频帧。
  20. 根据权利要求14-19任一项所述的电子设备,其特征在于,
    所述物理链路为通用串行总线USB链路;
    所述第一传输协议通道为远程网络驱动程序接口规范RNDIS通道;
    所述第二传输协议通道为USB视频规范UVC通道。
  21. 一种电子设备,其特征在于,包括:存储器和处理器;所述存储器存储有图像素材和计算机指令,当所述计算机指令被所述处理器执行时,使得所述电子设备执行以下步骤:
    响应于第一用户指令,向视频采集设备发送图像素材;
    接收所述视频采集设备发送的编码视频帧,其中,所述编码视频帧是所述视频采集设备对叠加视频帧编码得到的,所述叠加视频帧是所述视频采集设备将所述图像素材叠加到原始视频帧中得到的,所述原始视频帧包括所述视频采集设备采集到的未经编码的视频帧。
  22. 根据权利要求21所述的电子设备,其特征在于,所述电子设备还执行:
    在响应于所述第一用户指令,向所述视频采集设备发送图像素材之前,与远端设备建立视频连接。
  23. 根据权利要求21或22所述的电子设备,其特征在于,所述电子设备还执行:
    将所述编码视频帧发送给远端设备。
  24. 根据权利要求21-23任一项所述的电子设备,其特征在于,所述电子设备和所述视频采集设备通过物理链路建立数据连接,所述物理链路用于承载所述电子设备和所述视频采集设备之间的基于第一传输协议通道和/或第二传输协议通道的数据传输。
  25. 根据权利要求24所述的电子设备,其特征在于,所述响应于第一用户指令,向视频采集设备发送图像素材,包括:
    响应于所述第一用户指令,通过所述第一传输协议通道向所述视频采集设备发送所述图像素材。
  26. 根据权利要求24或25所述的电子设备,其特征在于,所述接收所述视频采集设备发送的编码视频帧包括:
    接收所述视频采集设备通过所述第二传输协议通道发送的所述编码视频帧。
  27. 根据权利要求24-26任一项所述的电子设备,其特征在于,所述电子设备还执行:
    响应于第二用户指令,通过所述第一传输协议通道向所述视频采集设备发送第一指示消息,所述第一指示消息用于指示所述视频采集设备停止将所述图像素材叠加到所述原始视频帧。
  28. 根据权利要求24-27任一项所述的电子设备,其特征在于,所述电子设备还执行:
    实时检测所述第一传输协议通道的数据流量;
    根据所述数据流量和所述物理链路的总带宽确定所述第二传输协议通道的可用带宽资源;
    根据所述可用带宽资源确定所述叠加视频帧的编码参数;
    通过所述第一传输协议通道将所述编码参数发送给所述视频采集设备,以使得所述视频采集设备根据所述编码参数对所述叠加视频帧进行编码,得到所述编码视频帧。
  29. 根据权利要求24-28任一项所述的电子设备,其特征在于,
    所述物理链路为通用串行总线USB链路;
    所述第一传输协议通道为远程网络驱动程序接口规范RNDIS通道;
    所述第二传输协议通道为USB视频规范UVC通道。
PCT/CN2021/079028 2020-04-24 2021-03-04 一种视频叠加方法、装置及系统 WO2021213017A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332758.0 2020-04-24
CN202010332758.0A CN113556500B (zh) 2020-04-24 2020-04-24 一种视频叠加方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2021213017A1 true WO2021213017A1 (zh) 2021-10-28

Family

ID=78101311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079028 WO2021213017A1 (zh) 2020-04-24 2021-03-04 一种视频叠加方法、装置及系统

Country Status (2)

Country Link
CN (1) CN113556500B (zh)
WO (1) WO2021213017A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023185651A1 (zh) * 2022-03-31 2023-10-05 华为技术有限公司 一种通信方法、装置及系统

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101909166A (zh) * 2009-06-05 2010-12-08 中兴通讯股份有限公司 可视通讯中的接收端/发送端图像合成方法及图像合成器
CN103220490A (zh) * 2013-03-15 2013-07-24 广东欧珀移动通信有限公司 一种在视频通信中实现特效的方法及视频用户端
US20160127508A1 (en) * 2013-06-17 2016-05-05 Square Enix Holdings Co., Ltd. Image processing apparatus, image processing system, image processing method and storage medium
CN207010877U (zh) * 2017-02-22 2018-02-13 徐文波 具备素材加载功能的视频采集设备
CN108924464A (zh) * 2018-07-10 2018-11-30 腾讯科技(深圳)有限公司 视频文件的生成方法、装置及存储介质

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
JP4332365B2 (ja) * 2003-04-04 2009-09-16 ソニー株式会社 メタデータ表示システム,映像信号記録再生装置,撮像装置,メタデータ表示方法
CN101841662A (zh) * 2010-04-16 2010-09-22 华为终端有限公司 移动终端获得相框合成图像的方法及移动终端
CN104091608B (zh) * 2014-06-13 2017-03-15 北京奇艺世纪科技有限公司 一种基于ios设备的视频编辑方法及装置
JP6459954B2 (ja) * 2015-12-24 2019-01-30 株式会社デンソー 車両に搭載される動画送信装置、車両に搭載される動画受信装置、および、車両に搭載される動画通信システム
CN107343220B (zh) * 2016-08-19 2019-12-31 北京市商汤科技开发有限公司 数据处理方法、装置和终端设备
CN108289185B (zh) * 2017-01-09 2021-08-13 腾讯科技(深圳)有限公司 一种视频通信方法、装置及终端设备
WO2018127091A1 (zh) * 2017-01-09 2018-07-12 腾讯科技(深圳)有限公司 一种图像处理的方法、装置、相关设备及服务器
CN106803909A (zh) * 2017-02-21 2017-06-06 腾讯科技(深圳)有限公司 一种视频文件的生成方法及终端
CN107277642B (zh) * 2017-07-24 2020-09-15 硕诺科技(深圳)有限公司 一种基于视频通话数据流处理实现趣味贴图的方法
CN109391792B (zh) * 2017-08-03 2021-10-29 腾讯科技(深圳)有限公司 视频通信的方法、装置、终端及计算机可读存储介质
CN108650542B (zh) * 2018-05-09 2022-02-01 腾讯科技(深圳)有限公司 生成竖屏视频流、图像处理的方法、电子设备和视频系统

Also Published As

Publication number Publication date
CN113556500A (zh) 2021-10-26
CN113556500B (zh) 2022-05-13

Similar Documents

Publication Publication Date Title
WO2019001347A1 (zh) Screen projection method for a mobile device, storage medium, terminal, and screen projection system
US8659638B2 (en) Method applied to endpoint of video conference system and associated endpoint
CN111654629 (zh) Camera switching method and apparatus, electronic device, and readable storage medium
WO2021175054A1 (zh) Image data processing method and related apparatus
US20090234919A1 (en) Method of Transmitting Data in a Communication System
US20230162324A1 (en) Projection data processing method and apparatus
WO2022161227A1 (zh) Image processing method and apparatus, image processing chip, and electronic device
CN101998051A (zh) Image display control apparatus, image processing apparatus, and imaging apparatus equipped with the same
WO2021213017A1 (zh) Video overlay method, apparatus and system
CN113301355 (zh) Video transmission, live streaming and playback methods, devices, and storage medium
CN111918098 (zh) Video processing method and apparatus, electronic device, server, and storage medium
EP3543900A1 (en) Image processing method and electronic device
WO2024051824A1 (zh) Image processing method, image processing circuit, electronic device, and readable storage medium
WO2003030529A1 (en) Television reception apparatus
EP2230834A1 (en) Apparatus and method for image processing
CN113453069 (zh) Display device and thumbnail generation method
CN107580228 (zh) Surveillance video processing method, apparatus, and device
CN107544124 (zh) Imaging apparatus, control method therefor, and storage medium
CN115278323 (zh) Display device, smart device, and data processing method
CN110798700 (zh) Video processing method, video processing apparatus, storage medium, and electronic device
US11776186B2 (en) Method for optimizing the image processing of web videos, electronic device, and storage medium applying the method
JP2003319386A (ja) System for transmitting captured images to a mobile terminal
WO2024082863A1 (zh) Image processing method and electronic device
CN109495762 (zh) Data stream processing method and apparatus, storage medium, and terminal device
WO2022267696A1 (zh) Content recognition method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21791748

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 21791748

Country of ref document: EP

Kind code of ref document: A1