CN113556500B - Video overlapping method, device and system

Info

Publication number: CN113556500B
Application number: CN202010332758.0A
Authority: CN
Inventors: 刘宗奇
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2022-05-13
Anticipated expiration: 2040-04-24
Also published as: CN113556500A; WO2021213017A1

Abstract

The application provides a video overlapping method, device and system. The method comprises the following steps: the display device, in response to a first user instruction, sends image material to the video capture device; the video capture device superimposes the received image material on an original video frame to obtain a superimposed video frame, wherein the original video frame comprises an unencoded video frame captured by the video capture device; the video capture device encodes the superimposed video frame to obtain an encoded video frame; the video capture device sends the encoded video frame to the display device. Because the superimposition of the image material is completed in the video capture device, the display device can send the encoded video frame directly to the remote device without performing any operation related to image-material superimposition on it. This saves the decoding and re-encoding time, reduces the video call delay between the display device and the remote device, and improves the user experience.

Description

Video overlapping method, device and system

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a method, an apparatus, and a system for video superimposition.

Background

At present, large-screen display devices are becoming increasingly intelligent and offer more and more functions. Some large-screen display devices are equipped with video capture devices, such as cameras, so that a user can take pictures and record videos with the large-screen display device, and can even establish a video call connection with other remote devices. In addition, some large-screen display devices allow a user to superimpose image materials such as a photo frame, a watermark, an animation and/or an emoji on the video captured by the camera during a video call, so that the image materials are displayed on the display screen of the remote device together with the captured video, which makes the video call more fun and interactive.

In general, the superimposition of image material is implemented in the large-screen display device. Specifically, the large-screen display device decodes a video frame received from the video capture device, superimposes the image material on the decoded video frame, re-encodes the video frame carrying the material, and finally sends the encoded video frame to the remote device. Thus, between the moment the large-screen display device receives a video frame and the moment it sends that frame to the remote device, one decoding pass and one encoding pass take place. Each takes a certain amount of time, that is, there is a decoding delay and an encoding delay, so this approach increases the video call delay between the large-screen display device and the remote device and degrades the user experience.

Disclosure of Invention

The application provides a video overlapping method, a video overlapping device and a video overlapping system, which can reduce video call time delay of display equipment and remote equipment in a video overlapping scene and improve user experience.

In a first aspect, the present application provides a video overlay method, including: the display equipment responds to a first user instruction and sends image materials to the video acquisition equipment; the video acquisition equipment superimposes the received image material on an original video frame to obtain a superimposed video frame, wherein the original video frame comprises an uncoded video frame acquired by the video acquisition equipment; the video acquisition equipment encodes the overlapped video frames to obtain encoded video frames; the video capture device sends the encoded video frames to a display device.

According to the method, the superposition process of the image materials is completed in the video acquisition equipment, so that the display equipment can directly send the coded video frames to the remote equipment, the coded video frames do not need to be subjected to any operation related to image material superposition, the decoding and encoding time is saved, the video call time delay of the display equipment and the remote equipment is reduced, and the user experience is improved.
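As a minimal sketch of the superimposition step performed in the video capture device, the following Python code alpha-blends an ARGB image material over an unencoded RGB frame. The source does not specify the blending rule; "source over" alpha blending is assumed here, and the names `blend_pixel` and `overlay` are hypothetical. Frames are modeled as nested lists of pixel tuples purely for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]                # (R, G, B) of an unencoded frame pixel
MaterialPixel = Tuple[int, int, int, int]   # (A, R, G, B) of an image-material pixel


def blend_pixel(frame_px: Pixel, material_px: MaterialPixel) -> Pixel:
    """Blend one material pixel over one frame pixel (assumed 'source over' rule)."""
    alpha, r, g, b = material_px

    def mix(src: int, dst: int) -> int:
        # out = (src * alpha + dst * (255 - alpha)) / 255, rounded to nearest
        return (src * alpha + dst * (255 - alpha) + 127) // 255

    fr, fg, fb = frame_px
    return (mix(r, fr), mix(g, fg), mix(b, fb))


def overlay(frame: List[List[Pixel]],
            material: List[List[MaterialPixel]]) -> List[List[Pixel]]:
    """Return the superimposed frame; material must match the frame size."""
    if len(frame) != len(material) or any(
        len(f_row) != len(m_row) for f_row, m_row in zip(frame, material)
    ):
        raise ValueError("material and frame must have the same dimensions")
    return [
        [blend_pixel(fp, mp) for fp, mp in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, material)
    ]
```

In this sketch a fully transparent material pixel (alpha 0) leaves the frame pixel unchanged, and a fully opaque one (alpha 255) replaces it. The superimposed frame would then be passed to the encoder in the video capture device.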

In one implementation, before the display device sends the image material to the video capture device in response to the first user instruction, the method further includes: the display device establishes a video connection with the remote device.

In one implementation, the method further comprises: the display device transmits the encoded video frames to the remote device.

In one implementation, the display device and the video capture device establish a data connection via a physical link, where the physical link is used to carry data transmission between the display device and the video capture device based on the first transport protocol channel and/or the second transport protocol channel.

In one implementation, a display device sends image material to a video capture device over a first transport protocol channel in response to a first user instruction.

In one implementation, the video capture device sends the encoded video frames to the display device via a second transport protocol channel.

Therefore, the image material and the coded video frame are transmitted in two different protocol channels without mutual influence, and the stability of data transmission is improved.

In one implementation, the method further comprises: the display device, in response to a second user instruction, sends a first indication message to the video capture device through the first transport protocol channel, wherein the first indication message instructs the video capture device to stop superimposing the image material on the original video frame.

In one implementation, the method further comprises: the display equipment detects the data flow of the first transmission protocol channel in real time; the display equipment determines available bandwidth resources of a second transmission protocol channel according to the data flow and the total bandwidth of the physical link; the display equipment determines the coding parameters of the superposed video frames according to the available bandwidth resources; the display device sends the encoding parameters to the video acquisition device through the first transmission protocol channel.

Therefore, the display equipment can dynamically adjust the coding parameters according to the size of the available bandwidth resources of the second transmission protocol channel, so that the available bandwidth of the second transmission protocol channel can always meet the requirement of transmitting the coded video frames, the utilization rate of the bandwidth resources is improved, and the video call is ensured not to be blocked.
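The adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the available bandwidth of the second channel is assumed to be the total link bandwidth minus the measured traffic of the first channel, and the thresholds, resolutions and bit rates in `BITRATE_LADDER` are hypothetical values chosen only to show the selection logic.

```python
from typing import Dict, Union


def available_bandwidth(total_link_bps: int, first_channel_bps: int) -> int:
    """Bandwidth left for the second channel (assumed: total minus first-channel traffic)."""
    return max(total_link_bps - first_channel_bps, 0)


# Hypothetical ladder: (minimum available bits/s, resolution label, target bit rate in bits/s),
# ordered from the most to the least demanding setting.
BITRATE_LADDER = [
    (20_000_000, "1080P", 8_000_000),
    (10_000_000, "720P", 4_000_000),
    (0, "480P", 1_500_000),
]


def choose_encoding_parameters(avail_bps: int) -> Dict[str, Union[str, int]]:
    """Pick the first ladder entry whose threshold the available bandwidth meets."""
    for min_bps, resolution, bitrate_bps in BITRATE_LADDER:
        if avail_bps >= min_bps:
            return {"resolution": resolution, "bitrate_bps": bitrate_bps}
    # Not reached while the last threshold is 0; kept for safety.
    return {"resolution": BITRATE_LADDER[-1][1], "bitrate_bps": BITRATE_LADDER[-1][2]}
```

For example, the display device could call `choose_encoding_parameters(available_bandwidth(total, measured))` each time it samples the first-channel traffic, and send the result to the video capture device over the first transport protocol channel.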

In one implementation, the video capture device encodes the superimposed video frame according to the encoding parameters to obtain an encoded video frame.

In one implementation, the physical link is a Universal Serial Bus (USB) link.

In one implementation, the first transport protocol channel is a Remote Network Driver Interface Specification (RNDIS) channel.

In one implementation, the second transport protocol channel is a USB Video Class (UVC) channel.

Therefore, video frames can be transmitted between the display device and the video capture device over the UVC channel, taking advantage of its strengths in carrying video data, while image material and other non-video data (for example, abnormal or error data that helps users locate and handle problems) can be transmitted over the RNDIS channel, taking advantage of its reliable, stable and efficient transmission. Image material is thus transmitted efficiently without affecting the transmission of video frames.

In a second aspect, the present application provides an electronic device comprising a display device and a video capture device; the display equipment is used for responding to a first user instruction and sending image materials to the video acquisition equipment; the video acquisition equipment is used for superposing the received image material to an original video frame to obtain a superposed video frame, and the original video frame comprises an uncoded video frame acquired by the video acquisition equipment; the video acquisition equipment is also used for encoding the superposed video frames to obtain encoded video frames; the video capture device is also configured to send the encoded video frames to a display device.

According to the electronic equipment, the superposition process of the image materials is completed in the video acquisition equipment, so that the display equipment can directly send the coded video frames to the far-end equipment, the coded video frames do not need to be subjected to any operation related to image material superposition, the decoding and coding time is saved, the video call time delay of the display equipment and the far-end equipment is reduced, and the user experience is improved.

In one implementation, the display device is also used to establish a video connection with a remote device.

In one implementation, the display device is further configured to transmit the encoded video frames to a remote device.

In one implementation, the display device is specifically configured to send the image material to the video capture device over the first transport protocol channel in response to a first user instruction.

In one implementation, the video capture device is specifically configured to send the encoded video frames to the display device via the second transport protocol channel.

In one implementation, the display device is further configured to send a first indication message to the video capture device over the first transport protocol channel in response to the second user instruction, the first indication message being configured to instruct the video capture device to stop overlaying image material to the original video frame.

In one implementation, the display device is further configured to detect a data traffic of the first transport protocol channel in real time; determining available bandwidth resources of a second transmission protocol channel according to the data flow and the total bandwidth of the physical link; determining the coding parameters of the superposed video frames according to the available bandwidth resources; and sending the encoding parameters to the video acquisition device through the first transport protocol channel.

In one implementation, the video capture device is further configured to encode the overlay video frame according to the encoding parameters to obtain an encoded video frame.

In one implementation, the physical link is a Universal Serial Bus (USB) link; the first transport protocol channel is a Remote Network Driver Interface Specification (RNDIS) channel; the second transport protocol channel is a USB Video Class (UVC) channel.

In a third aspect, the present application provides an electronic device comprising a memory and a processor; wherein the memory stores image material and computer instructions that, when executed by the processor, cause the electronic device to perform the following steps: in response to a first user instruction, sending image material to a video capture device; receiving an encoded video frame sent by the video capture device, wherein the encoded video frame is obtained by the video capture device encoding a superimposed video frame, the superimposed video frame is obtained by the video capture device superimposing the image material on an original video frame, and the original video frame comprises an unencoded video frame captured by the video capture device.

According to the electronic equipment, the superposition process of the image materials is completed in the video acquisition equipment, so that the electronic equipment can directly send the coded video frames to the far-end equipment, the coded video frames do not need to be subjected to any operation related to image material superposition, the decoding and coding time is saved, the video call time delay of the electronic equipment and the far-end equipment is reduced, and the user experience is improved.

In one implementation, the electronic device further performs: establishing a video connection with a remote device prior to sending image material to the video capture device in response to the first user instruction.

In one implementation, the electronic device further performs: the encoded video frames are transmitted to a remote device.

In one implementation, the electronic device and the video capture device establish a data connection via a physical link, and the physical link is used for carrying data transmission between the electronic device and the video capture device based on the first transport protocol channel and/or the second transport protocol channel.

In one implementation, an electronic device sends image material to a video capture device over a first transmission protocol channel in response to a first user instruction.

In one implementation, an electronic device receives encoded video frames sent by a video capture device over a second transport protocol channel.

In one implementation, the electronic device further performs: in response to a second user instruction, sending a first indication message to the video capture device through the first transport protocol channel, wherein the first indication message instructs the video capture device to stop superimposing the image material on the original video frame.

In one implementation, the electronic device further performs: detecting the data flow of a first transmission protocol channel in real time; determining available bandwidth resources of a second transmission protocol channel according to the data flow and the total bandwidth of the physical link; determining the coding parameters of the superposed video frames according to the available bandwidth resources; and sending the coding parameters to the video acquisition equipment through the first transmission protocol channel, so that the video acquisition equipment encodes the overlapped video frame according to the coding parameters to obtain a coded video frame.

In a fourth aspect, the present application provides a video overlay system comprising a display device and a video capture device; the display equipment is used for responding to a first user instruction and sending image materials to the video acquisition equipment; the video acquisition equipment is used for superposing the received image material to the original video frame to obtain a superposed video frame; the video acquisition equipment is also used for encoding the superposed video frames to obtain encoded video frames; the video acquisition equipment is also used for sending the encoded video frame to the display equipment; the display device is further configured to send the encoded video frame to the remote device after receiving the encoded video frame.

In a fifth aspect, the present application further provides a computer storage medium. The computer storage medium stores computer instructions that, when executed on a display device, cause the display device to perform the method of the first aspect and its implementations described above.

In a sixth aspect, the present application further provides a computer storage medium. The computer storage medium stores computer instructions that, when executed on a video capture device, cause the video capture device to perform the method of the first aspect and its implementations described above.

In a seventh aspect, the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method performed by the display device in the first aspect and its implementations.

In an eighth aspect, the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method performed by the video capture device in the first aspect and its implementations.

Drawings

Fig. 1 is a schematic structural diagram of a large-screen display device provided in an embodiment of the present application;

Fig. 2 is a schematic structural diagram of another large-screen display device provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of a video call between a large-screen display device and a remote device;

Fig. 4 is a schematic diagram of a data transmission architecture of a video capture device and a display device provided in an embodiment of the present application;

Fig. 5 is a flowchart of a video overlay method provided by an embodiment of the present application;

Fig. 6 illustrates one manner of triggering a first user instruction provided by an embodiment of the present application;

Fig. 7 illustrates another manner of triggering a first user instruction provided by an embodiment of the present application;

Fig. 8 illustrates yet another manner of triggering a first user instruction provided by an embodiment of the present application;

Fig. 9 is a schematic diagram of a video capture device overlaying picture frame material onto an original video frame;

Fig. 10 is a schematic diagram of a video capture device overlaying emoji material onto an original video frame;

Fig. 11 is a schematic diagram of obtaining an encoded video frame from an original video frame according to an embodiment of the present application;

Fig. 12 is a schematic diagram of one post-processing manner for the overlay video frame according to an embodiment of the present application;

Fig. 13 is a schematic diagram of another post-processing manner for the overlay video frame according to an embodiment of the present application;

Fig. 14 is a schematic diagram of a display device with pip displaying encoded video frames according to an embodiment of the present application;

Fig. 15 illustrates one manner of triggering a second user instruction provided by an embodiment of the present application;

Fig. 16 illustrates another manner of triggering a second user instruction provided by an embodiment of the present application;

Fig. 17 illustrates yet another manner of triggering a second user instruction provided by an embodiment of the present application;

Fig. 18 is a schematic diagram of a display device and a remote device both overlaying image material according to an embodiment of the present application;

Fig. 19 is a schematic diagram of a display device sharing image material with a remote device according to an embodiment of the present application;

Fig. 20 is a flowchart of an adaptive bandwidth adjustment mechanism provided in an embodiment of the present application.

Detailed Description

In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.

At present, large-screen display devices are becoming increasingly intelligent and offer more and more functions. Some large-screen display devices are equipped with video capture devices, such as cameras, so that a user can take pictures and record videos with the large-screen display device, and can even establish a video call connection with other remote devices.

In the embodiment of the present application, the large-screen display device may be, for example, a smart television, a smart screen, a laser television, a large-screen projection device, an interactive whiteboard device based on a display screen, and the like. The remote device may be, for example, a mobile phone, a tablet computer, a large-screen display device, a notebook computer, a desktop personal computer, a workstation, a smart speaker with a display screen, a smart alarm clock with a display screen, an electrical appliance with a display screen (e.g., a smart refrigerator), an augmented reality (AR) device, a virtual reality (VR) device, a mixed reality (MR) device, and the like. The large-screen display device and the remote device can be connected through various wired or wireless networks.

Fig. 1 is a schematic structural diagram of a large-screen display device according to an embodiment of the present application. As shown in fig. 1, the large-screen display device 110 is equipped with a video capture device 200, and the video capture device 200 may be disposed on a bezel 101 (e.g., the upper bezel) of the large-screen display device 110. Depending on the structure of the large-screen display device 110, the video capture device 200 may be fixed on the bezel 101 or may be movable relative to the bezel 101. Illustratively, as shown in fig. 1, when the video capture device 200 is movable relative to the bezel 101, it may have a sliding telescopic structure. In one implementation, the video capture device 200 may be hidden within the bezel 101 when the user is not using it, and the large-screen display device 110 may pop the video capture device 200 out of the bezel 101 when the user uses it. Scenarios that trigger the large-screen display device 110 to pop the video capture device 200 out of the bezel 101 may include, for example: the user operates the large-screen display device 110 to start the camera application (APP), the user operates the large-screen display device 110 to initiate a video call request to a remote device, an application in the large-screen display device 110 calls the video capture device 200, and the like. The user can operate the large-screen display device 110 in various ways, such as by remote control, gesture, voice, or mobile phone APP. Alternatively, the video capture device may be disposed inside the large-screen display device, for example, within the bezel or behind the screen of the large-screen display device.

Fig. 2 is a schematic structural diagram of another large-screen display device provided in an embodiment of the present application. As shown in fig. 2, the large screen display device 110 and the video capture device 200 may be two separate devices, i.e., the video capture device 200 is external to the large screen display device 110. The large screen display device 110 and the video capture device 200 may establish a communication connection in a wired or wireless manner. When a user uses the large-screen display device 110 to take a picture, take a video, or perform a video call with a remote device, the large-screen display device 110 may control the video capture device 200 to be turned on and off through the wired or wireless communication connection, and a video captured after the video capture device 200 is turned on may also be transmitted to the large-screen display device 110 through the wired or wireless communication connection.

The following is an exemplary description of a scenario in which a large-screen display device makes a video call with a remote device. Fig. 3 is a schematic diagram of a scenario in which the large-screen display device 110 performs a video call with the remote device 300. When the user at either end, the large-screen display device 110 or the remote device 300, wishes to have a video call with the user at the other end, that user can operate his or her device to initiate a video call request to the other end; if the user at the other end accepts the video call request on his or her device, the large-screen display device 110 and the remote device 300 establish a video call connection. After the video call connection is established, the large-screen display device 110 sends the video captured by its video capture device to the remote device 300 for display, and the remote device 300 likewise sends the video captured by its own video capture device to the large-screen display device 110 for display. Currently, some large-screen display devices 110 allow a user to superimpose image materials 102 such as a photo frame, a watermark, an animation, and/or an emoji on the video captured by the video capture device during a video call, so that the image materials 102 are displayed on the display screen of the remote device 300 together with the captured video, which makes the video call more fun and interactive and improves the video call experience.

At present, overlaying image materials in a video acquired by a video acquisition device can be realized by the following three ways:

In the first way, the superimposition of image material is implemented in the video capture device. This approach requires the image material used for superimposition to be built into the internal storage of the video capture device (for example, flash memory). When superimposition is needed, the video capture device first superimposes the built-in image material on the captured unencoded video frame, then encodes the video frame carrying the image material and sends it to the large-screen display device. Because the internal storage capacity of the video capture device is very limited, it cannot store a large amount of image material, so the available overlay styles are few. Illustratively, Table 1 lists the file sizes of image material at different resolutions in the ARGB8888 color format. As shown in Table 1, one image material at 720P resolution occupies 3.5 MB (megabytes); if the internal storage capacity of the video capture device is 8 MB, at most 2 such image materials can be stored, and if it is 16 MB, at most 4. Similarly, one image material at 1080P resolution occupies 7.9 MB; with 8 MB of internal storage at most 1 can be stored, and with 16 MB at most 2. As the resolution increases further, the size of the image material grows further, making it even harder to store in the internal storage of the video capture device. Besides being limited in number, the image materials stored in the video capture device cannot be replaced flexibly, so the overall user experience of the first way is poor.

Resolution	Material size (MB)	Color format
720P	3.5	ARGB8888
1080P	7.9	ARGB8888
1440P	14	ARGB8888
2560P	35	ARGB8888

TABLE 1
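The sizes in Table 1 follow from the ARGB8888 format using 4 bytes per pixel. The short Python sketch below reproduces the first three rows and the storage counts given in the text. The pixel dimensions (1280x720, 1920x1080, 2560x1440) are assumed 16:9 interpretations of the resolution labels and are not stated in the source, and 1 MB is taken as 1024 x 1024 bytes; the 2560P row is omitted because its pixel dimensions are not clear from the source.

```python
BYTES_PER_PIXEL_ARGB8888 = 4          # 8 bits each for A, R, G, B
BYTES_PER_MB = 1024 * 1024            # assumed meaning of "MB" in Table 1


def material_size_bytes(width: int, height: int) -> int:
    """Uncompressed size of one ARGB8888 image material."""
    return width * height * BYTES_PER_PIXEL_ARGB8888


def material_size_mb(width: int, height: int) -> float:
    """Same size expressed in MB."""
    return material_size_bytes(width, height) / BYTES_PER_MB


def max_materials(storage_mb: int, width: int, height: int) -> int:
    """How many whole image materials fit in the given internal storage."""
    return (storage_mb * BYTES_PER_MB) // material_size_bytes(width, height)


# Assumed pixel dimensions for the resolution labels used in Table 1.
ASSUMED_DIMENSIONS = {"720P": (1280, 720), "1080P": (1920, 1080), "1440P": (2560, 1440)}

for label, (w, h) in ASSUMED_DIMENSIONS.items():
    print(label, round(material_size_mb(w, h), 1), "MB;",
          "fits in 8 MB:", max_materials(8, w, h),
          "; fits in 16 MB:", max_materials(16, w, h))
```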

In the second way, the superimposition of image material is implemented in the large-screen display device. Specifically, after capturing a video frame, the video capture device encodes it and sends it to the large-screen display device; the large-screen display device decodes the received video frame, superimposes the image material on the decoded video frame, re-encodes the video frame carrying the material, and finally sends the encoded video frame to the remote device. Thus, between the moment the large-screen display device receives a video frame and the moment it sends that frame to the remote device, one decoding pass and one encoding pass take place. Each takes a certain amount of time, that is, there is a decoding delay and an encoding delay, so this approach increases the video call delay between the large-screen display device and the remote device. Illustratively, Table 2 lists the delays of one current large-screen display device when decoding and encoding H264 video frames at different resolutions. As shown in Table 2, for an H264 video frame at 720P resolution, the decoding delay is 5 ms (milliseconds), the encoding delay is 10 ms, and the total delay is 15 ms; for an H264 video frame at 1080P resolution, the decoding delay is 5 ms, the encoding delay is 15 ms, and the total delay is 20 ms; at 1440P or 2560P resolution, the decoding and encoding delays are longer still.

Resolution	Decoding delay (ms)	Encoding delay (ms)	Total delay (ms)
720P	5	10	15
1080P	5	15	20
1440P	15	20	35
2560P	30	30	60

TABLE 2
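To put the figures of Table 2 in perspective, the sketch below sums the decoding and encoding delays per resolution and expresses the result as a number of frame intervals. The delay values are copied from Table 2; the 30 frames-per-second rate is an assumption used only for illustration, and the function names are hypothetical.

```python
# (decoding delay, encoding delay) in milliseconds, copied from Table 2.
TABLE_2_DELAYS_MS = {
    "720P": (5, 10),
    "1080P": (5, 15),
    "1440P": (15, 20),
    "2560P": (30, 30),
}


def transcoding_delay_ms(resolution: str) -> int:
    """Extra delay added when the display device decodes and re-encodes a frame."""
    decode_ms, encode_ms = TABLE_2_DELAYS_MS[resolution]
    return decode_ms + encode_ms


def delay_in_frame_intervals(resolution: str, fps: float = 30.0) -> float:
    """The same delay expressed in frame intervals at an assumed frame rate."""
    return transcoding_delay_ms(resolution) * fps / 1000.0
```

Under the assumed 30 frames per second, one frame interval is about 33 ms, so the 60 ms total delay at 2560P corresponds to almost two frame intervals added by the second way alone.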

In the third way, the superimposition of image material is implemented in the remote device. Specifically, the large-screen display device sends the image material used for superimposition to the remote device in addition to the encoded video frames. After receiving a video frame and the image material, the remote device first decodes the video frame, then superimposes the image material on the decoded video frame, and finally displays the result to the user. Because the large-screen display device needs to send the image material to the remote device separately, additional network resources are consumed, and the superimposition also increases the power consumption of the remote device, causing it to heat up and shortening its battery life.

In order to solve the problems of various ways of superimposing image materials on a video, embodiments of the present application provide a video superimposing method. The method can be applied to a display device equipped with a video capture device, such as a large screen display device. The video capture device may be included in the display device as shown in fig. 1 and be a part of the display device, or may be formed as two independent devices with the display device as shown in fig. 2, and establish a communication connection in a wired or wireless manner.

The following specifically explains various embodiments of the video overlay method provided in the embodiments of the present application.

Fig. 4 is a schematic diagram of a data transmission architecture of a video capture device and a display device provided in an embodiment of the present application.

As shown in fig. 4, the video capture device 200 may include, for example, an optical lens 201, an image sensor 202, at least one processor 203, an interface module 204, and a memory 205. The optical lens 201 consists of a plurality of lenses and is used to collect light within its field of view and project it onto the image sensor 202. When a user of the display device 100 (i.e., a local user) is engaged in a video call with a user of a remote device (i.e., a remote user), the local user will typically be positioned in front of the video capture device 200 so that the local user is within the field of view of the optical lens 201. The image sensor 202 includes millions, tens of millions, or even hundreds of millions of light-sensitive pixels, through which the image sensor 202 converts the received light signals into electrical signals that are sent to the processor 203. The processor 203 may include, for example, an Image Signal Processor (ISP) and a Digital Signal Processor (DSP). The ISP can sample and process the electrical signal from the image sensor 202 to obtain a video frame, and can preprocess the video frame, for example by noise removal, white balance correction, color correction, gamma correction, and color space conversion. The DSP may be configured to encode the video frame and, after post-processing such as cropping and scaling, send the encoded video frame to the display device 100 through the interface module 204.

As shown in fig. 4, the display device 100 may include, for example, a memory 103, a processor 104, an interface module 105, and a network module 106. The processor 103 may be configured to store image materials, and the processor 104 may receive the video frames sent by the video capture device 200 through the interface module 105, and send the received video frames to the remote device through the network module 106.

In the embodiment of the present application, the interface module 105 may be, for example, a Universal Serial Bus (USB) interface. Physically, the interface module 105 may be, for example: USB-Type-A interface, USB-Type-B interface, USB-Type-C interface, USB-Micro-B interface and the like. In the interface standard, the interface module 105 may be, for example, a USB1.0 interface, a USB2.0 interface, a USB3.0 interface, or the like. The physical form and the interface standard of the interface module 105 are not specifically limited in the embodiments of the present application.

As shown in fig. 4, the video capture device 200 and the display device 100 may establish a data connection based on a physical link through a USB transmission line 400 connecting the two-terminal interface modules. Moreover, based on the USB connection, the embodiment of the present application establishes two virtual transmission protocol channels between the video capture device 200 and the display device 100, for example, a first transmission protocol channel 401 and a second transmission protocol channel 402, for transmitting different types of data.

The first transport protocol channel 401 may be, for example, a Remote Network Driver Interface Specification (RNDIS) channel, and is configured to transmit non-video data. RNDIS is a communication protocol that enables Ethernet connections based on USB, such as TCP/IP protocol suite (TCP/IP protocol) connections. Based on a TCP/IP protocol, the RNDIS can use a socket programming interface, and by utilizing the advantages of light weight, good portability and the like of the socket programming interface, reliable, stable and efficient transmission capability can be provided between the video acquisition equipment 200 and the display equipment 100, so that the RNDIS is suitable for transmission of non-video data.

The second transport protocol channel 402 may be, for example, a USB Video Class (UVC) channel for transmitting video data. UVC is a video transmission protocol standard that enables the video capture device 200 to connect to the display device 100 and perform video transmission without installing any driver.

Fig. 5 is a flowchart of a video overlay method provided in an embodiment of the present application, which may be implemented based on the data transmission architecture shown in fig. 4. As shown in fig. 5, the method may include the following steps S101 to S105:

step S101, the display device sends the image material to the video capture device in response to a first user instruction.

Step S101 may occur after the display device establishes a video call connection with the remote device. In a specific implementation, as shown in fig. 6, after the display device establishes the video call connection with the remote device, the display device may display the video image acquired by the remote device on its display screen and, in addition, display the video image acquired by the video capture device in a partial area of the display screen (for example, the upper left corner or the upper right corner) as a picture-in-picture 108. In the embodiment of the present application, the size of the picture-in-picture displayed by the display device may be determined according to the aspect ratio of the display screen of the remote device. For example, if the ratio of the width to the height of the display screen of the remote device is 9:18.5, the ratio of the width to the height of the picture-in-picture may be 9:18.5; if the ratio of the width to the height of the display screen of the remote device is 16:9, the ratio of the width to the height of the picture-in-picture may be 16:9. It is understood that the size of the picture-in-picture may also be determined in other manners, such as a size preset by the display device or a size set by the local user, which is not specifically limited in this embodiment of the present application.
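The aspect-ratio rule above can be illustrated with a minimal sketch. The function name `pip_size` and the choice of fixing the picture-in-picture height and deriving its width are assumptions made for illustration; the embodiment only states that the picture-in-picture may follow the aspect ratio of the remote display screen.

```python
from typing import Tuple


def pip_size(ratio_w: float, ratio_h: float, pip_height: int) -> Tuple[int, int]:
    """Return (width, height) of a picture-in-picture window.

    ratio_w:ratio_h is the width-to-height ratio of the remote display
    screen; pip_height is a height chosen by the caller (hypothetical).
    """
    if ratio_w <= 0 or ratio_h <= 0 or pip_height <= 0:
        raise ValueError("ratio and height must be positive")
    # Derive the width so that width:height equals ratio_w:ratio_h.
    return (round(pip_height * ratio_w / ratio_h), pip_height)


print(pip_size(16, 9, 270))    # landscape remote screen
print(pip_size(9, 18.5, 370))  # tall remote screen
```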

In addition, the display screen can display at least one available image material or thumbnails of the available image materials in a preset area for the user to browse and select; alternatively, after the display device establishes a video call connection with the remote device, the user may call up at least one available image material or thumbnails of the available image materials through some operation. At this time, if the user wants to superimpose image materials on the video frame acquired by the video acquisition device, the user may select one image material to be superimposed from the available image materials in a manner of remote controller operation, space gesture operation, voice control, or the like, and trigger a first user instruction for instructing the display device to transmit the image material to the video acquisition device, so that the display device transmits the image material selected by the user to the video acquisition device in response to the first user instruction.

Illustratively, as shown in fig. 6, when the user operates the display apparatus using the remote controller 501, the user can cause the display apparatus to switch to display different thumbnails 107 and select a thumbnail 107 of an image material to be superimposed by operating a key of the remote controller 501; the user may then click on the OK key of remote control 501 to trigger a first user instruction to cause the display device to send image material to the video capture device.

Illustratively, as shown in FIG. 7, when the user operates the display device using the space gesture 502, the user may cause the display device to switch to display different thumbnails 107 by a space swipe operation (e.g., space left swipe, space right swipe), and select a thumbnail 107 of an image material to be superimposed by a space click operation (e.g., space single click operation); the user may then trigger a first user instruction by a particular space click (e.g., a space double click operation) or space swipe operation (e.g., selecting the thumbnail 107 to swipe up, etc.), causing the display device to send the user-selected image material to the video capture device.

Illustratively, as shown in fig. 8, when the user controls the display apparatus using voice, the display apparatus may transmit image material directly in response to the user's voice instruction. For example, the available image material includes a plurality of frames, and when the user says "add a first frame", the display device may send the image material corresponding to the first frame to the video capture device. It should be noted that, the display device may have a voice recognition capability for recognizing a voice command of a user, and the voice recognition capability may be implemented by an artificial intelligence AI module built in the display device, an AI speaker docked with the display device, or a cloud AI server. The AI module may include a hardware module, such as a neural-Network Processing Unit (NPU), or a software module, such as a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), and the like. It should be noted that, when the user is using the display device for video communication, the AI module may be woken up by a specific voice wake-up word or a specific action to perform voice control.

In the embodiment of the present application, the image material may adopt a plurality of color formats, for example: the ARGB color format (a color format composed of a transparent channel Alpha, a Red channel Red, a Green channel Green, and a Blue channel Blue, and commonly used for a storage structure of a 32-bit bitmap), which may specifically include: ARGB8888 (8 bits are used to record A, R, G, B data for each pixel, respectively), ARGB4444 (4 bits are used to record A, R, G, B data for each pixel, respectively), and so on. Image material may also be in a variety of file formats, such as: bitmap BMP, Graphics Interchange Format (GIF), Portable Network Graphics (PNG), and the like.
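As a rough illustration of the two color formats, the sketch below packs one pixel as ARGB8888 and reduces it to ARGB4444. The helper names and the channel order inside the integer (alpha in the most significant bits) are assumptions for illustration; the embodiment does not specify a memory layout.

```python
from typing import Tuple


def pack_argb8888(a: int, r: int, g: int, b: int) -> int:
    """Pack four 8-bit channels into one 32-bit value (alpha in the top byte)."""
    for channel in (a, r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in 0..255")
    return (a << 24) | (r << 16) | (g << 8) | b


def unpack_argb8888(pixel: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit ARGB8888 value back into (a, r, g, b)."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)


def argb8888_to_argb4444(pixel: int) -> int:
    """Keep the top 4 bits of every channel, giving a 16-bit value."""
    a, r, g, b = unpack_argb8888(pixel)
    return ((a >> 4) << 12) | ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


def material_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a single-frame material in bytes."""
    return width * height * bits_per_pixel // 8


pixel = pack_argb8888(0x80, 0xFF, 0x00, 0x10)
print(hex(pixel), hex(argb8888_to_argb4444(pixel)))
print(material_bytes(1920, 1080, 32), material_bytes(1920, 1080, 16))
```

Halving the bits per pixel halves the uncompressed size of a material, which matters when the material is transferred between the two devices.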

In the embodiment of the application, the image materials can be stored locally in the display device in advance; when the display device establishes a video call connection with the remote device, the display device directly displays thumbnails of the locally stored image materials to the user. The image materials can also be stored in a cloud server; when the display device establishes a video call connection with the remote device, the display device can acquire the image materials from the cloud server and generate thumbnails of the image materials to display to the user. Alternatively, one part of the image materials can be stored locally in the display device in advance and the other part stored in the cloud server; when the display device establishes a video call connection with the remote device, the display device first displays thumbnails of the locally stored image materials to the user, and if the user wants to browse more image materials, the display device acquires the image materials from the cloud server and generates thumbnails of them to display to the user.

In the embodiment of the application, when the display device and the video capture device establish the RNDIS channel and the UVC channel over the USB connection, they can transmit video frames over the UVC channel, which is well suited to video data, and transmit image materials and other non-video data, such as dimension data (data that helps the user locate and handle anomalies or errors), over the RNDIS channel, which is reliable, stable, and efficient. In this way, the image materials are transmitted efficiently without affecting the transmission of the video frames.

Step S102, after the video acquisition equipment receives the image material, the image material is overlaid to the original video frame to obtain an overlaid video frame.

In the embodiment of the present application, the image material may be a single-frame material or a multi-frame material. Wherein, the single frame material refers to a material only containing one frame of image, such as a photo frame material; the multi-frame material refers to a material including multi-frame images, such as an expression and an animation material.

Generally, after the display device establishes a video call connection with a remote device, the video capture device continuously generates video frames according to a preset video resolution and frame rate. If the video frame rate is 30Hz, it means that the video capture device will generate 30 video frames per second, and if the video frame rate is 60Hz, it means that the video capture device will generate 60 video frames per second. In order to distinguish the video frames before encoding from the video frames after encoding, the embodiment of the present application refers to the video frames before encoding (i.e., not encoded) as original video frames and refers to the video frames after encoding as encoded video frames.

In a specific implementation, the way in which the video capture device overlays a material onto the original video frames may differ from material to material. The following describes how the video capture device superimposes the image material onto the original video frames, taking a photo frame material (generally a single-frame material) and an expression material (generally a multi-frame material) as examples.

Fig. 9 is a schematic diagram of a video capture device overlaying picture frame material onto an original video frame. As shown in fig. 9, assuming that the video capture device receives the picture frame material at time t0, the video capture device will superimpose the picture frame material on each of the original video frames captured after time t0 until the video call is over or the video capture device receives an indication message sent by the display device to instruct it to stop superimposing the image material.
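The embodiment does not state how a material pixel is combined with a video pixel. One common choice, assumed here purely for illustration, is source-over alpha blending of an ARGB material onto an RGB frame. The function names and the use of nested lists of tuples as images are hypothetical.

```python
from typing import List, Tuple

ARGB = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]


def blend_pixel(material: ARGB, frame: RGB) -> RGB:
    """Source-over blend of one ARGB material pixel onto one RGB frame pixel."""
    a, r, g, b = material

    def mix(src: int, dst: int) -> int:
        # Integer arithmetic with rounding: alpha 255 keeps src, alpha 0 keeps dst.
        return (src * a + dst * (255 - a) + 127) // 255

    return (mix(r, frame[0]), mix(g, frame[1]), mix(b, frame[2]))


def overlay(material: List[List[ARGB]], frame: List[List[RGB]]) -> List[List[RGB]]:
    """Blend a material onto a frame of the same size, pixel by pixel."""
    return [[blend_pixel(m, f) for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(material, frame)]


# A 1x2 "photo frame": an opaque red border pixel and a transparent inner pixel.
photo_frame = [[(255, 255, 0, 0), (0, 0, 0, 0)]]
video_frame = [[(10, 20, 30), (10, 20, 30)]]
print(overlay(photo_frame, video_frame))
```

With this rule, the transparent region of a photo frame leaves the captured image untouched, while the opaque border replaces it.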

The expression material is generally composed of a plurality of frames of images, and the images are arranged according to the sequence and can show a dynamic effect when being played. The file format of common emoticon materials is, for example, gif and the like. Generally, when a user wants to send an expression to the other party during a video call, the expression material usually only needs to be played once and disappears immediately without being displayed continuously, which is the difference between the display modes of the expression material and the picture frame material.

Fig. 10 is a schematic diagram of a video capture device overlaying expression material onto original video frames. As shown in fig. 10, assuming that the expression material includes 30 frames of images E1-E30, when the video capture device receives the expression material at time t0, the video capture device may superimpose the image E1 onto the first original video frame P1 captured after time t0, superimpose the image E2 onto the second original video frame P2 captured after time t0, superimpose the image E3 onto the third original video frame P3 captured after time t0, and so on, until the image E30 is superimposed onto the thirtieth original video frame P30 captured after time t0. Thus, after the video capture device receives the expression material, a total of 30 original video frames (P1-P30) are overlaid with the expression material, forming 30 superimposed video frames. The 30 superimposed video frames play one complete expression animation; if the frame rate of the video call is 30Hz, the playing time of the expression animation is 1 second. It should be noted that the number of 30 video frames is used here for illustration only, and the application does not limit the number of video frames onto which the expression material is superimposed. For example, the 30 frames of the expression material can be superimposed onto a plurality of video frames in time order, and the same expression material image can be superimposed onto one or more video frames; for instance, the image E1 may be superimposed onto the original video frames P1 and P2, or onto more original video frames.
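The mapping from expression images to original video frames can be sketched as follows. The `repeat` parameter (how many consecutive video frames carry the same expression image) and both function names are illustrative assumptions.

```python
from typing import List


def expression_schedule(num_images: int, repeat: int = 1) -> List[int]:
    """Entry k is the index of the expression image overlaid on the k-th
    original video frame captured after the material arrives."""
    if num_images < 1 or repeat < 1:
        raise ValueError("num_images and repeat must be at least 1")
    return [i for i in range(num_images) for _ in range(repeat)]


def playback_seconds(num_images: int, frame_rate_hz: float, repeat: int = 1) -> float:
    """Duration of the expression animation at the given video frame rate."""
    return num_images * repeat / frame_rate_hz


# E1..E30 onto P1..P30 at 30Hz: one image per video frame.
print(len(expression_schedule(30)), playback_seconds(30, 30))
# Each image held for two video frames: E1 on P1 and P2, E2 on P3 and P4, ...
print(expression_schedule(3, repeat=2))
```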

It can be understood that, in a video call, a user may select a plurality of image materials in the display device sequentially, in which case, the video capture device also needs to superimpose different image materials in the original video frame in different time periods according to the selection of the user. To achieve this, the video capture device detects in real time whether a new image material has been received from the display device, in the event that a previous image material has been received; if the new image material is not received, the video acquisition equipment can continuously superimpose the prior image material in the original video frame; if a new image material is received, the video capture device may stop superimposing the previous image material on the original video frame and instead superimpose the new image material on the original video frame.
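A minimal sketch of the selection logic described above follows. The class and field names are hypothetical, frame contents are represented by plain strings, and the `persistent` flag is an assumption that encodes the difference between a photo frame (kept until replaced or stopped) and an expression (played once).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Material:
    images: List[str]   # one entry for a single-frame material, several for multi-frame
    persistent: bool    # True: keep overlaying; False: play once and disappear


class OverlaySelector:
    """Decides which material image, if any, goes onto the next original frame."""

    def __init__(self) -> None:
        self._material: Optional[Material] = None
        self._position = 0

    def receive(self, material: Material) -> None:
        # A newly received material replaces the previous one immediately.
        self._material = material
        self._position = 0

    def stop(self) -> None:
        # Corresponds to an indication to stop superimposing.
        self._material = None

    def next_image(self) -> Optional[str]:
        material = self._material
        if material is None:
            return None
        if material.persistent:
            return material.images[0]
        image = material.images[self._position]
        self._position += 1
        if self._position == len(material.images):
            self._material = None  # animation finished
        return image


selector = OverlaySelector()
selector.receive(Material(["frame"], persistent=True))
print(selector.next_image(), selector.next_image())
selector.receive(Material(["E1", "E2"], persistent=False))
print(selector.next_image(), selector.next_image(), selector.next_image())
```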

And step S103, the video acquisition equipment encodes the overlapped video frame to obtain an encoded video frame.

Fig. 11 is a schematic diagram of deriving an encoded video frame from an original video frame according to an embodiment of the present application. As shown in fig. 11, assuming that the video capture device receives the image material at time t0, for the original video frames generated before time t0, the video capture device preprocesses them and then directly encodes them to obtain encoded video frames; for the original video frames generated after time t0, the video capture device preprocesses them, then superimposes the image material onto them to obtain superimposed video frames, and finally encodes the superimposed video frames to obtain encoded video frames.

The embodiment of the present application may use various encoding schemes to obtain the encoded video frames, for example, advanced video coding (AVC, also referred to as H.264) or high efficiency video coding (HEVC, also referred to as H.265), which are not described in detail here.

In addition, it should be added that, in the embodiment of the present application, before encoding the superimposed video frame, the superimposed video frame may be first subjected to post-processing. Post-processing may include, for example, cropping or scaling the superimposed video frame, and the parameters of post-processing may be determined based on the resolution of the image material or the resolution of the display screen of the remote device.

For example, as shown in fig. 12, when the image material is a photo frame having a size of 1920 × 1080 pixels, if the size of the superimposed video frame is larger than 1920 × 1080 pixels, a part of the content of the superimposed video frame may be located outside the photo frame, in which case the video capture device may cut out the part of the superimposed video frame located outside the photo frame, resulting in the superimposed video frame having a size of 1920 × 1080 pixels and the photo frame located at the peripheral edge of the content.

Illustratively, as shown in fig. 13, when the size of the overlay video frame is 3840 × 2160 pixels (2160P), if the preset video call quality is 1080P, the video capture device may scale the overlay video frame, down scaling the overlay video frame from 2160P to 1080P. The reduced overlapped video frame and the coded video frame generated by coding the overlapped video frame have smaller byte quantity, can reduce the occupancy rate of USB bandwidth resources between the display equipment and the video acquisition equipment, and can reduce the occupancy rate of network resources when the coded video frame is transmitted from the display equipment to the remote equipment in the subsequent process.
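The cropping and scaling arithmetic can be sketched as follows. Cropping around the center of the frame is an assumption made for illustration (the embodiment only says that the part outside the photo frame is cut away), the 2560 × 1440 source size is a hypothetical input, and the function names are not taken from the application.

```python
from typing import Tuple


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered dst_w x dst_h region."""
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("crop target is larger than the source frame")
    left = (src_w - dst_w) // 2
    top = (src_h - dst_h) // 2
    return (left, top, left + dst_w, top + dst_h)


def pixel_ratio(src_w: int, src_h: int, dst_w: int, dst_h: int) -> float:
    """How many times more pixels the source frame has than the target frame."""
    return (src_w * src_h) / (dst_w * dst_h)


# Crop a hypothetical 2560x1440 frame to a 1920x1080 photo frame.
print(center_crop_box(2560, 1440, 1920, 1080))
# Scaling 2160P down to 1080P leaves a quarter of the pixels to encode.
print(pixel_ratio(3840, 2160, 1920, 1080))
```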

And step S104, the video acquisition equipment sends the encoded video frame to the display equipment.

In specific implementation, the video acquisition device can send the encoded video frame to the display device through the UVC channel, so that the advantage of the UVC channel in the aspect of transmitting video data is utilized, the high-efficiency transmission of the encoded video frame can be realized, and the image material transmission of the RNDIS channel is not influenced.

Step S105, after receiving the encoded video frame, the display device sends the encoded video frame to the remote device.

After receiving the encoded video frame, the remote device may decode the encoded video frame, render a video image containing an image material, and display the video image on a display screen of the remote device. The remote user is able to see the image material added by the local user during the video call.

As further shown in fig. 14, when the display screen displays the video image captured by the video capture device in the pip 108, if the display device receives the encoded video frame containing the image material 102, the display device may decode and render the encoded video frame, in addition to transmitting the encoded video frame to the remote device 300, and display the decoded video frame in the pip 108, so that the local user can see the display effect of the image material 102 selected by the local user after being superimposed, and the local user can decide whether to continue to superimpose the image material 102 or replace another image material according to the display effect.

According to the method provided by the embodiment of the application, the superposition of the image materials is completed in the video capture device, so that the display device can directly send the encoded video frames to the remote device without performing any operation related to image material superposition on them, such as decoding and re-encoding. The time for decoding and encoding is saved, so that the video call delay between the display device and the remote device is reduced and the user experience is improved. Taking table 2 as an example, for video in the H.264 format at a resolution of 720P, the method of the embodiment of the present application can reduce the delay by about 15 ms; for video in the H.264 format at a resolution of 1080P, by about 20 ms; for video in the H.264 format at a resolution of 1440P, by about 35 ms; and for video in the H.264 format at a resolution of 2560P, by about 60 ms. It can be seen that the higher the video quality, the more the method of the embodiment of the present application reduces the video call delay. In addition, the image materials in the embodiment of the application can be stored in the display device, which makes them easy to replace flexibly and does not occupy the memory space of the video capture device. Furthermore, the display device in the embodiment of the application does not need to send the image materials to the remote device separately, so no additional network resources are consumed.

It should be added that in some video call scenarios, after a user has superimposed a photo frame on the video for a while, the user may wish to cancel the superimposition of the photo frame. At this time, the user may trigger a second user instruction to the display device through remote controller operation, space gesture operation, voice control, or the like. The display device may send an indication message to the video capture device over the RNDIS channel in response to the second user instruction, the indication message instructing the video capture device to stop overlaying the image material onto the original video frames.
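Because the RNDIS channel carries TCP/IP traffic, both the image material and the indication message could be sent over an ordinary stream socket. The sketch below assumes a hypothetical framing (a 1-byte message type followed by a 4-byte payload length); the message type values and function names are invented for illustration and are not defined by the application. A local socket pair stands in for the connection between the display device and the video capture device.

```python
import socket
import struct
from typing import Tuple

HEADER = struct.Struct("!BI")   # hypothetical: 1-byte type, 4-byte payload length
MSG_MATERIAL = 1                # payload: image material file bytes (hypothetical)
MSG_STOP_OVERLAY = 2            # indication message, no payload (hypothetical)


def send_message(sock: socket.socket, msg_type: int, payload: bytes = b"") -> None:
    """Send one framed message: header followed by the payload."""
    sock.sendall(HEADER.pack(msg_type, len(payload)) + payload)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since a stream socket may return partial data."""
    data = bytearray()
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)


def recv_message(sock: socket.socket) -> Tuple[int, bytes]:
    """Receive one framed message and return (type, payload)."""
    msg_type, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return msg_type, _recv_exact(sock, length)


display_side, capture_side = socket.socketpair()
send_message(display_side, MSG_MATERIAL, b"material-bytes")
send_message(display_side, MSG_STOP_OVERLAY)
print(recv_message(capture_side))
print(recv_message(capture_side))
display_side.close()
capture_side.close()
```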

Illustratively, as shown in fig. 15, when the user operates the display apparatus using the remote controller 501, the user can operate the direction key of the remote controller to select the "close picture frame" icon 109 in the display screen; the user may then click on the remote control's OK button to trigger a second user instruction that causes the display device to send an indication message to the video capture device.

Illustratively, as shown in fig. 16, when the user operates the display device using the space gesture 502, the user may select the "close photo frame" icon 109 in the display screen by a space swipe operation (e.g., space left swipe, space right swipe); the user may then trigger a second user instruction by a particular space click operation (e.g., a space double-click operation) or space swipe operation (e.g., swiping the selected icon up), causing the display device to send an indication message to the video capture device.

For example, as shown in fig. 17, when the user controls the display device using voice, the display device may directly transmit an instruction message to the video capture device in response to the voice instruction of the user. For example: when the user says "close the photo frame", the display device sends an indication message to the video capture device.

Therefore, after the user overlaps the image materials in the video, the user can cancel the overlapping of the image materials at any time, the method is flexible and convenient, and the user experience is improved.

In some embodiments, after the local user selects an image material in the display device and triggers the first user instruction, the display device may also send the image material to the remote device, so that the remote device may also superimpose the image material in the video image it captures and then send to the display device for display. Thus, as shown in fig. 18, both users can see the video image of the other party on which the image material is superimposed.

As an alternative implementation, as shown in fig. 19, after the local user selects an image material in the display device and triggers the first user instruction, the display device may pop up a dialog box on the display screen asking the user whether to recommend the image material to the remote user, as shown in fig. 19 (a). The user can select the options of 'yes' or 'no' through modes such as remote controller operation, air gesture operation or voice control; if the user selects 'no', the dialog box is closed, and the display equipment does not recommend image materials to the remote user; if the user selects "yes," the dialog box closes and the display device sends a message to the remote device recommending image material, which may contain, for example, thumbnails of the image material. The remote apparatus, after receiving the message of recommending image material from the display apparatus, may display a dialog box with the thumbnail of the image and ask the remote user whether to experience the image material through the dialog box, as shown in fig. 19 (b); if the user selects 'do not experience', the remote equipment closes the dialog box and does not superimpose image materials in the acquired video images; if the user selects 'experience once', the remote equipment closes the dialog box and requests the display equipment for the file of the image material, and after receiving the image material file sent by the display equipment, the remote equipment overlays the image material into the collected video image.

As another alternative implementation, after the local user selects an image material in the display device and triggers the first user instruction, the display device may also directly send a message recommending the image material to the remote device without the user performing other operations.

It is understood that the above-mentioned implementation manners for enabling both users to see the video image superimposed with the image material of the other user are only a part of the implementation manners shown in the embodiments of the present application, and not all of the implementation manners, and the design and concept that can be applied herein do not exceed the protection scope of the embodiments of the present application.

The method of the embodiment of the application also provides a self-adaptive bandwidth adjustment mechanism, which can reasonably allocate the USB bandwidth resources to the first transmission protocol channel and the second transmission protocol channel and improve the utilization rate of the USB bandwidth.

In general, the bandwidth resources of USB are limited, for example: the theoretical maximum transmission bandwidth of the USB2.0 protocol is 60MB/s, but in practical application, the maximum transmission bandwidth which can be actually achieved is only 30MB/s under the influence of a data transmission protocol and a coding mode; the USB3.0 protocol can achieve a higher transmission bandwidth, but is affected by factors such as the read/write performance of the memories of the video capture device and the display device, and the maximum bandwidth that can be actually achieved is also limited.

Generally, given the limited bandwidth resources of the USB protocol, the display device may allocate one part of the bandwidth resources to the first transport protocol channel and another part to the second transport protocol channel. Taking a maximum transmission bandwidth of 30MB/s as an example, when the video call quality is 1080P @30Hz, the display device may allocate a bandwidth resource of 4MB/s to the second transport protocol channel; when the video call quality is 1080P @60Hz, 10MB/s; when the video call quality is 1440P @30Hz, 15MB/s; and when the video call quality is 2560P @30Hz, 25MB/s. The remaining bandwidth resources may then be allocated to the first transport protocol channel. However, when the display device is not sending image material to the video capture device, there is only a small amount of non-video data (e.g., dimension data) or no data at all on the first transport protocol channel, which wastes USB bandwidth resources. In addition, when the display device sends the image material to the video capture device, if the bandwidth resource allocated to the first transport protocol channel is too small, the transmission of the image material takes too long and the display of the image material is noticeably delayed.
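The static allocation described above can be written down as a small lookup. The table values are the figures quoted in the text; the function and constant names are illustrative assumptions.

```python
from typing import Dict, Tuple

USABLE_BANDWIDTH_MB_S = 30

# Bandwidth reserved for the second (video) channel per call quality, in MB/s.
VIDEO_CHANNEL_MB_S: Dict[Tuple[str, int], int] = {
    ("1080P", 30): 4,
    ("1080P", 60): 10,
    ("1440P", 30): 15,
    ("2560P", 30): 25,
}


def static_allocation(resolution: str, frame_rate_hz: int,
                      total_mb_s: int = USABLE_BANDWIDTH_MB_S) -> Tuple[int, int]:
    """Return (first_channel, second_channel) bandwidth in MB/s."""
    video = VIDEO_CHANNEL_MB_S[(resolution, frame_rate_hz)]
    return (total_mb_s - video, video)


for quality in VIDEO_CHANNEL_MB_S:
    print(quality, static_allocation(*quality))
```

At 2560P @30Hz only 5MB/s is left for the first channel, whereas at 1080P @30Hz the 26MB/s left over mostly sits idle whenever no image material is being transferred; these are the two problems the adaptive mechanism targets.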

In order to solve the above problem, the adaptive bandwidth adjustment mechanism provided in the embodiment of the present application may include steps S201 to S205 as shown in fig. 20.

In step S201, the display device may detect the data traffic of the first transport protocol channel in real time.

In step S202, the display device determines available bandwidth resources of the first transport protocol channel and the second transport protocol channel according to the detected data traffic and the total bandwidth of the USB (i.e. the maximum transmission bandwidth that can be actually achieved).

As an alternative implementation, when the first transport protocol channel is the RNDIS channel, the display device may analyze the headers of the data traffic on the USB link, for example by parsing the protocol version number in the header, so as to identify TCP/IP traffic. The TCP/IP traffic is the data traffic of the first transport protocol channel, and the other, unidentified traffic is the data traffic of the second transport protocol channel. Header analysis may be performed at a driver layer of the display device. For example, when a packet arrives, the driver layer may attempt to parse the protocol version number from the header of the packet; if a protocol version number can be parsed, for example 0100 (binary, i.e., version 4) for IPv4, the packet belongs to the TCP/IP traffic transmitted in the first transport protocol channel.
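The version check can be sketched as below. An IP header begins with a 4-bit version field (4 for IPv4, 6 for IPv6). The sketch assumes the driver has already stripped any outer framing so that the buffer starts at the IP header; the function names and channel labels are hypothetical, and a real classifier would likely need more than this single field to avoid false matches.

```python
from typing import Optional


def ip_version(packet: bytes) -> Optional[int]:
    """Return 4 or 6 if the first nibble is a known IP version, else None."""
    if not packet:
        return None
    version = packet[0] >> 4  # upper 4 bits of the first byte
    return version if version in (4, 6) else None


def classify(packet: bytes) -> str:
    """Attribute a packet to the first channel (TCP/IP) or the second (other)."""
    return "first" if ip_version(packet) is not None else "second"


ipv4_header_start = bytes([0x45, 0x00, 0x00, 0x54])  # version 4, header length 5 words
other_payload = bytes([0x0C, 0x8D, 0x00, 0x00])      # first nibble is 0
print(classify(ipv4_header_start), classify(other_payload))
```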

As an alternative implementation, a traffic model may be constructed to identify data traffic of the first transport protocol channel and the second transport protocol channel. The flow model may be a neural network model, such as a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Support Vector Machine (SVM), and the like. For example, a convolutional neural network CNN for traffic identification may be constructed in the display device, and trained using a large number of labeled data pairs (a, Z) to have the capability of identifying unknown traffic types. A in the data pair (A, Z) represents a known type of traffic segment, Z is a labeling result of the traffic segment A, and during training, A is used as an input A of the convolutional neural network CNN, and Z is used as an input of a neural network model to train internal parameters of the convolutional neural network CNN. It should be added that, in order to obtain the normalized classification result by the convolutional neural network CNN, the convolutional neural network CNN may further be connected to the pooling layer and the normalized index function layer softmax, etc. at the output end, these connection modes belong to general modes for constructing a neural network model, and are not described herein again in this embodiment of the present application.

Further, when detecting that the data traffic of the first transport protocol channel is small, the display device may allocate most of the bandwidth resources of the USB link to the second transport protocol channel to ensure smooth transmission of the video frames. For example, assume that the maximum transmission bandwidth actually achievable on the USB link is 30MB/s. When the data traffic of the first transport protocol channel is less than 256KB/s, the display device may allocate 1MB/s of available bandwidth to the first transport protocol channel and the remaining 29MB/s to the second transport protocol channel. When the data traffic of the first transport protocol channel increases sharply within a short time, for example to 10MB/s, this indicates that the first transport protocol channel is transmitting image material; the display device may then allocate 15MB/s of available bandwidth to the first transport protocol channel and the remaining 15MB/s to the second transport protocol channel, so as to shorten the transmission time of the image material and reduce the response delay of the display device.
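The allocation in this example can be sketched as a simple two-state rule. The figures (30MB/s link, 256KB/s threshold, 1MB/s and 15MB/s shares) come from the example above; treating every traffic level at or above the threshold as a burst is an assumption made for illustration, since the text gives only the two data points, and the function name is hypothetical.

```python
from typing import Tuple

LINK_MB_S = 30.0             # achievable USB link bandwidth in the example
IDLE_THRESHOLD_MB_S = 0.25   # 256KB/s expressed in MB/s
IDLE_SHARE_MB_S = 1.0        # first-channel share when it is nearly idle
BURST_SHARE_MB_S = 15.0      # first-channel share while material is sent


def allocate_bandwidth(first_channel_traffic_mb_s: float,
                       link_mb_s: float = LINK_MB_S) -> Tuple[float, float]:
    """Return (first_channel, second_channel) bandwidth in MB/s.

    Assumption: any measured traffic at or above the idle threshold is
    treated as an image-material burst.
    """
    if first_channel_traffic_mb_s < IDLE_THRESHOLD_MB_S:
        first = IDLE_SHARE_MB_S
    else:
        first = BURST_SHARE_MB_S
    return first, link_mb_s - first


print(allocate_bandwidth(0.1))    # nearly idle first channel
print(allocate_bandwidth(10.0))   # image material being transferred
```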

In step S203, the display device may also dynamically determine the encoding parameters of the superimposed video frame according to the available bandwidth resources of the second transport protocol channel, so as to change the data traffic that the encoded video frames consume during transmission. For example, when the available bandwidth of the second transport protocol channel is 29MB/s, the display device determines that the encoding parameter of the superimposed video frame is 1080P@60Hz, or even higher, such as 1440P@30Hz or 2560P@30Hz; when the available bandwidth of the second transport protocol channel is 15MB/s, the display device determines that the encoding parameter of the superimposed video frame is 1080P@60Hz, or lower, such as 1080P@30Hz or 720P@30Hz.
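One way to express this selection is a lookup over bandwidth tiers, sketched below. The resolutions and frame rates are taken from the example, but the mapping of each bandwidth threshold to exactly one setting is an illustrative assumption, as are the names `EncodingParams`, `TIERS`, and `choose_encoding`.

```python
from typing import List, NamedTuple, Tuple


class EncodingParams(NamedTuple):
    height: int   # vertical resolution, e.g. 1080 for 1080P
    fps: int      # frame rate in Hz

    @property
    def label(self) -> str:
        return f"{self.height}P@{self.fps}Hz"


# (minimum available bandwidth in MB/s, parameters), highest tier first.
# The thresholds are illustrative assumptions.
TIERS: List[Tuple[float, EncodingParams]] = [
    (29.0, EncodingParams(1080, 60)),
    (15.0, EncodingParams(1080, 30)),
    (0.0, EncodingParams(720, 30)),
]


def choose_encoding(available_mb_s: float) -> EncodingParams:
    """Return the first tier whose bandwidth requirement is met."""
    for minimum, params in TIERS:
        if available_mb_s >= minimum:
            return params
    return TIERS[-1][1]  # fall back to the lowest tier for negative input


print(choose_encoding(29.0).label)
print(choose_encoding(15.0).label)
```

Keeping the tiers in a table makes it straightforward to add higher settings, such as 1440P@30Hz, without changing the selection logic.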

In step S204, after determining the encoding parameters, the display device may send the encoding parameters to the video capture device through the first transmission protocol channel.

In step S205, the video capture device encodes the superimposed video frame according to the encoding parameters to obtain an encoded video frame.

It can be understood that encoded video frames obtained with different encoding parameters differ in byte size, generate different data traffic during transmission, and therefore place different requirements on the available bandwidth of the second transport protocol channel. Dynamically adjusting the encoding parameters according to the available bandwidth of the second transport protocol channel ensures that this bandwidth can always meet the requirement of transmitting the encoded video frames, which improves the utilization of bandwidth resources and keeps the video call from stuttering.

In the embodiments provided in the present application, the aspects of the video overlay method are described from the perspective of the display device, of the video capture device, and of the interaction between the display device and the video capture device. It can be understood that, in order to realize the above functions, the display device and the video capture device include corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the exemplary method steps described in connection with the embodiments disclosed herein can be implemented in the present application by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The embodiment of the present application also provides an electronic device, which may include the display device 100 and the video capture device 200 shown in fig. 4, where the display device 100 is configured to send an image material to the video capture device 200 in response to a first user instruction; the video capture device 200 is configured to superimpose the received image material onto an original video frame to obtain a superimposed video frame, where the original video frame includes an uncoded video frame captured by the video capture device 200; the video capture device 200 is further configured to encode the superimposed video frame to obtain an encoded video frame; and the video capture device 200 is further configured to send the encoded video frame to the display device 100.

The embodiment of the present application also provides another electronic device, which may be, for example, the display device 100 shown in fig. 4. The interface module 105 is configured to implement data transmission with the video capture device 200, such as sending image material to the video capture device 200 and receiving encoded video frames sent by the video capture device 200. The memory 103 is used to store image material and to store computer program code, which includes computer instructions; the computer instructions, when executed by the processor 104, cause the display device to perform the methods involved in the embodiments described above, such as: sending an image material to the video capture device in response to a first user instruction; and receiving an encoded video frame sent by the video capture device, where the encoded video frame is obtained by the video capture device encoding a superimposed video frame, the superimposed video frame is obtained by the video capture device superimposing the image material onto an original video frame, and the original video frame includes an uncoded video frame captured by the video capture device.

The embodiment of the present application further provides a video capture device, such as the video capture device 200 shown in fig. 4. The interface module 204 is configured to implement data transmission with the display device 100, for example, to receive image material sent by the display device 100 and to send encoded video frames to the display device 100. The memory 205 is used to store computer program code, which includes computer instructions; the computer instructions, when executed by the processor 203, cause the video capture device 200 to perform the methods involved in the embodiments described above, such as: receiving an image material sent by the display device; superimposing the image material onto an original video frame to obtain a superimposed video frame, where the original video frame includes an uncoded video frame captured by the video capture device; encoding the superimposed video frame to obtain an encoded video frame; and sending the encoded video frame to the display device.

The embodiment of the present application also provides a video overlay system, which includes a display device and a video capture device. The display device is configured to send an image material to the video capture device in response to a first user instruction; the video capture device is configured to superimpose the image material onto an original video frame after receiving the image material, to obtain a superimposed video frame; the video capture device is further configured to encode the superimposed video frame to obtain an encoded video frame; the video capture device is further configured to send the encoded video frame to the display device; and the display device is further configured to send the encoded video frame to the remote device after receiving it.
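The division of work in this system can be illustrated with a toy model. It is a sketch under loud assumptions: frames are small grayscale grids with pixel values 0 to 255, image material uses `None` for transparent pixels, and `zlib` compression merely stands in for a real video encoder. The class and method names are hypothetical and are not taken from the embodiments.

```python
import zlib
from typing import List, Optional

Frame = List[List[int]]                  # uncoded grayscale frame, values 0..255
Material = List[List[Optional[int]]]     # None marks a transparent pixel


def superimpose(raw: Frame, material: Material) -> Frame:
    """Copy the raw frame and overwrite it with opaque material pixels."""
    out = [row[:] for row in raw]
    for y, material_row in enumerate(material):
        for x, pixel in enumerate(material_row):
            if pixel is not None and y < len(out) and x < len(out[y]):
                out[y][x] = pixel
    return out


def encode(frame: Frame) -> bytes:
    """Stand-in encoder: zlib here replaces a real video codec."""
    return zlib.compress(bytes(px for row in frame for px in row))


class CaptureDevice:
    def __init__(self) -> None:
        self.material: Optional[Material] = None

    def receive_material(self, material: Material) -> None:
        self.material = material

    def next_encoded_frame(self, raw: Frame) -> bytes:
        # Superimposition happens before encoding, on the uncoded frame.
        if self.material is not None:
            raw = superimpose(raw, self.material)
        return encode(raw)


class DisplayDevice:
    def __init__(self, camera: CaptureDevice) -> None:
        self.camera = camera
        self.sent_to_remote: List[bytes] = []

    def on_first_user_instruction(self, material: Material) -> None:
        self.camera.receive_material(material)

    def relay(self, raw: Frame) -> bytes:
        encoded = self.camera.next_encoded_frame(raw)
        self.sent_to_remote.append(encoded)  # forwarded without decoding
        return encoded


camera = CaptureDevice()
display = DisplayDevice(camera)
display.on_first_user_instruction([[255, None]])
encoded = display.relay([[0, 0], [0, 0]])
print(list(zlib.decompress(encoded)))
```

The point of the sketch is that `DisplayDevice.relay` never decodes or re-encodes the frame, which mirrors the delay saving the system aims for.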

The above embodiments are only for illustrating the embodiments of the present invention and are not to be construed as limiting the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the embodiments of the present invention shall be included in the scope of the present invention.

Claims

1. A method of video overlay, comprising:

the display equipment responds to a first user instruction and sends image materials to the video acquisition equipment;

the video acquisition equipment superimposes the received image material onto an original video frame to obtain a superimposed video frame, wherein the original video frame comprises an uncoded video frame acquired by the video acquisition equipment;

the video acquisition equipment encodes the superposed video frame to obtain an encoded video frame;

and the video acquisition equipment sends the coded video frame to the display equipment.

2. The method of claim 1, wherein before the display device sends the image material to the video capture device in response to the first user instruction, the method further comprises: the display device establishes a video connection with a remote device.

3. The method of claim 1, further comprising: the display device sends the encoded video frames to a remote device.

4. The method according to any of claims 1-3, wherein the display device and the video capture device establish a data connection over a physical link for carrying data transmission between the display device and the video capture device based on the first transport protocol channel and/or the second transport protocol channel.

5. The method of claim 4, wherein the display device, in response to the first user instruction, sends image material to the video capture device, comprising: the display device sends the image material to the video capture device over the first transport protocol channel in response to the first user instruction.

6. The method of claim 4, wherein the video capture device sending the encoded video frames to the display device comprises: and the video acquisition equipment sends the encoded video frame to the display equipment through the second transmission protocol channel.

7. The method of claim 4, further comprising:

and the display equipment responds to a second user instruction, and sends a first indication message to the video acquisition equipment through the first transmission protocol channel, wherein the first indication message is used for indicating the video acquisition equipment to stop overlaying the image material to the original video frame.

8. The method of claim 4, further comprising:

the display equipment detects the data flow of the first transmission protocol channel in real time;

the display equipment determines available bandwidth resources of the second transmission protocol channel according to the data traffic and the total bandwidth of the physical link;

the display equipment determines the coding parameters of the superposed video frame according to the available bandwidth resources;

and the display equipment sends the coding parameters to the video acquisition equipment through the first transmission protocol channel.

9. The method of claim 8, wherein the video capture device encodes the overlay video frame to obtain an encoded video frame, comprising: and the video acquisition equipment encodes the superposed video frame according to the encoding parameters to obtain the encoded video frame.

10. The method of claim 4,

the physical link is a Universal Serial Bus (USB) link;

the first transmission protocol channel is a Remote Network Driver Interface Specification (RNDIS) channel;

the second transport protocol channel is a USB video specification UVC channel.

11. An electronic device, comprising: a display device and a video capture device;

the display equipment is used for responding to a first user instruction and sending image materials to the video acquisition equipment;

the video acquisition equipment is used for overlaying the received image material to an original video frame to obtain an overlaid video frame, and the original video frame comprises an uncoded video frame acquired by the video acquisition equipment;

the video acquisition equipment is also used for encoding the superposed video frame to obtain an encoded video frame;

the video acquisition device is further configured to send the encoded video frame to the display device.

12. The electronic device of claim 11,

the display device is also used for establishing video connection with the remote device.

13. The electronic device of claim 11,

the display device is further configured to send the encoded video frame to a remote device.

14. The electronic device according to any of claims 11-13, wherein the display device and the video capture device establish a data connection via a physical link for carrying data transmission between the display device and the video capture device based on the first transport protocol channel and/or the second transport protocol channel.

15. The electronic device of claim 14,

the display device is specifically configured to send the image material to the video capture device through the first transport protocol channel in response to the first user instruction.

16. The electronic device of claim 14,

the video capture device is specifically configured to send the encoded video frame to the display device through the second transport protocol channel.

17. The electronic device of claim 14,

the display device is further configured to send, in response to a second user instruction, a first indication message to the video capture device through the first transport protocol channel, where the first indication message is used to instruct the video capture device to stop overlaying the image material onto the original video frame.

18. The electronic device of claim 14,

the display device is also used for detecting the data flow of the first transmission protocol channel in real time;

the display device is further configured to determine available bandwidth resources of the second transport protocol channel according to the data traffic and the total bandwidth of the physical link;

the display device is further configured to determine encoding parameters of the overlay video frame according to the available bandwidth resources;

the display device is further configured to send the encoding parameter to the video capture device through the first transport protocol channel.

19. The electronic device of claim 18,

the video acquisition equipment is further configured to encode the overlay video frame according to the encoding parameters to obtain the encoded video frame.

20. The electronic device of claim 14,

the physical link is a Universal Serial Bus (USB) link;

the second transport protocol channel is a USB video specification UVC channel.

21. An electronic device, comprising: a memory and a processor; the memory stores image material and computer instructions that, when executed by the processor, cause the electronic device to perform the steps of:

responding to a first user instruction, and sending an image material to video acquisition equipment;

receiving a coded video frame sent by the video acquisition equipment, wherein the coded video frame is obtained by coding an overlapped video frame by the video acquisition equipment, the overlapped video frame is obtained by overlapping the image material to an original video frame by the video acquisition equipment, and the original video frame comprises an uncoded video frame acquired by the video acquisition equipment.

22. The electronic device of claim 21, wherein the electronic device further performs:

establishing a video connection with a remote device prior to sending image material to the video capture device in response to the first user instruction.

23. The electronic device of claim 21, wherein the electronic device further performs:

and transmitting the encoded video frame to a remote device.

24. The electronic device according to any of claims 21-23, wherein the electronic device and the video capture device establish a data connection via a physical link, the physical link being configured to carry data transmissions between the electronic device and the video capture device based on the first transport protocol channel and/or the second transport protocol channel.

25. The electronic device of claim 24, wherein sending image material to a video capture device in response to the first user instruction comprises:

in response to the first user instruction, sending the image material to the video capture device over the first transport protocol channel.

26. The electronic device of claim 24, wherein receiving encoded video frames transmitted by the video capture device comprises:

and receiving the coded video frame sent by the video acquisition equipment through the second transmission protocol channel.

27. The electronic device of claim 24, wherein the electronic device further performs:

and responding to a second user instruction, and sending a first indication message to the video acquisition equipment through the first transmission protocol channel, wherein the first indication message is used for indicating the video acquisition equipment to stop overlaying the image material to the original video frame.

28. The electronic device of claim 24, wherein the electronic device further performs:

detecting the data flow of the first transmission protocol channel in real time;

determining available bandwidth resources of the second transmission protocol channel according to the data traffic and the total bandwidth of the physical link;

determining encoding parameters of the superimposed video frame according to the available bandwidth resources;

and sending the coding parameters to the video acquisition equipment through the first transmission protocol channel, so that the video acquisition equipment encodes the superposed video frame according to the coding parameters to obtain the coded video frame.

29. The electronic device of claim 24,

the physical link is a Universal Serial Bus (USB) link;

the second transport protocol channel is a USB video specification UVC channel.