WO2016150317A1 - Method, apparatus and system for synthesizing live video - Google Patents

Method, apparatus and system for synthesizing live video

Info

Publication number
WO2016150317A1
Authority
WO
WIPO (PCT)
Prior art keywords
video stream
video
terminal
play
server
Prior art date
Application number
PCT/CN2016/076374
Other languages
French (fr)
Chinese (zh)
Inventor
晏营
袁英灿
吴易明
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2016150317A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data

Definitions

  • the present application relates to the field of video processing technologies, and in particular, to a method, device, and system for synthesizing live video.
  • the existing live channel is a live broadcast directly formed by a single video stream, and the user can directly watch the live video through the network.
  • with the development of computers and networks, traditional live broadcasts have been unable to meet the diverse needs of users, and people want more interactive ways of viewing video. For example, while a user is playing a game, some expert players hope to add game commentary or motion guidance to the game in progress and synthesize it with the game screen into a vivid game walkthrough, while other users also want to see how these expert players carry out that walkthrough.
  • the existing video synthesis method usually relies on software with media material editing functions installed on the terminal, which combines captured video, pictures and recorded audio into a dynamic video with sound.
  • video synthesized on a single terminal in this way cannot be shared directly with others, and thus cannot achieve the effect of a live video broadcast.
  • the purpose of the present application is to provide a method, a device and a system for synthesizing live video, which can add a user's interactive video to the currently played screen to form a live broadcast screen; synthesizing the video stream in the server produces a better result, and the user experience is good.
  • the present application provides a method for synthesizing a live video, the method comprising:
  • the second video stream is collected by the video capture device
  • the present application further provides a method for synthesizing a live video, the method comprising:
  • when the terminal plays the first video stream, receiving a second video stream transmitted by the terminal, where the second video stream is a video stream collected by the terminal through the video collection device;
  • the present application further provides a synthesizing device for a live video, the device comprising:
  • An acquiring unit configured to collect a second video stream when playing the first video stream
  • a transmitting unit, configured to transmit the second video stream collected by the collecting unit to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream;
  • a receiving unit configured to receive the third video stream sent by the server
  • a processing unit configured to parse the third video stream received by the receiving unit, form a play screen of the third video stream, and play a play screen of the third video stream.
  • the present application further provides a synthesizing device for a live video, the device comprising:
  • a receiving unit, configured to receive, when the terminal plays the first video stream, a second video stream transmitted by the terminal, where the second video stream is a video stream collected by the terminal through the video collection device;
  • a processing unit, configured to merge the second video stream received by the receiving unit with the first video stream to form a live third video stream;
  • a transmitting unit configured to transmit the third video stream formed by the processing unit to the terminal.
  • the present application further provides a system for synthesizing live video, the system comprising: a server and a terminal with a video capture device;
  • when the terminal plays the first video stream, the terminal collects the second video stream through the video capture device;
  • the server merges the second video stream with the first video stream being played to form a live third video stream;
  • the terminal parses the third video stream to form a play screen of the third video stream, and plays a play screen of the third video stream.
  • the video capture device is used to collect the user's interaction behavior with the currently played picture, and the collected video stream is transmitted to the server, so the user's interactive video can be added to the currently played picture to form a live broadcast picture, with good real-time performance and a good user experience; and because the video stream is synthesized in the server, the live video stream is of better quality and the picture is clearer.
  • FIG. 1 is a schematic diagram of a system for synthesizing live video according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for synthesizing live video on a terminal side according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for synthesizing live video on a server side according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of a synthesized live broadcast screen according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a device for synthesizing live video according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a device for synthesizing live video according to an embodiment of the present application.
  • the method and device for synthesizing live video are applicable to a terminal that has a video capture device and is capable of network connection, or to a terminal that can be connected to an external video capture device and is capable of network connection, for example, terminal devices such as a television with a camera, a computer, a pad, or a mobile phone, which can connect to a cloud server through a network cable or a wireless network to communicate with the cloud server.
  • the system includes a terminal 1 and a server 2 with a video capture device 11.
  • the server 2 may be a cloud server, and the terminal 1 and the server 2 are connected through a network.
  • the terminal 1 plays the video stream transmitted by the server, and the user can interact with the playback screen of the terminal 1.
  • the terminal 1 collects the user's interactive video stream through the video collection device 11 and transmits it to the server 2; after video synthesis is performed in the server 2, the synthesized live video stream is played on the terminal 1.
  • a method for synthesizing a live video according to an embodiment of the present application includes:
  • when the terminal plays the first video stream, the terminal collects the second video stream through the video collection device.
  • the second video stream is a video data stream that is collected by the terminal through the video capture device in real time.
  • the video capture device collects a user's interaction behavior with the play screen of the first video stream to form a second video stream.
  • it can also be other video content collected by real-time shooting by a device such as a camera.
  • the video capture device includes a camera, a video camera, or the like.
  • the camera may be a camera on a mobile terminal such as a mobile phone or a tablet computer connected to the television, or may be the camera of a still camera, a video recorder, or a similar device.
  • the interaction behavior includes a voice interaction behavior and an action interaction behavior.
  • the collecting, by the video capture device, of the user's interaction behavior with the play screen of the first video stream includes: collecting, by a camera, the user's action interaction behavior with the play screen of the first video stream; and collecting, by a microphone, the user's voice interaction data on the first video stream.
  • the interaction behavior includes an action interaction behavior.
  • the collecting, by the video capture device, the interaction behavior of the user on the play screen of the first video stream includes: collecting, by the camera, the action interaction behavior of the user on the played screen.
  • the second video stream is transmitted to a server, so that the server merges with the first video stream being played by using the second video stream to form a live third video stream.
  • the third video stream is a video data stream after the video being played in the server is combined with the video captured by the terminal through a camera or the like. After receiving the third video stream, the terminal may form a play screen.
  • after receiving the third video stream, the terminal processes the third video stream according to an existing video codec scheme, obtains a play picture of the third video stream, and plays the play picture of the third video stream on the display of the terminal.
  • the playback screen at this time includes the original video screen and the video screen captured by the terminal.
  • the live broadcast screen can be seen immediately in the user's own terminal. When other users choose to watch the video on the network, they can also see the live broadcast screen.
  • because the video synthesis is processed on the cloud server, the user can simply select the interactive live broadcast mode through his or her own video capture device and broadcast the video live, without having to buy professional equipment, which is very simple and convenient. Moreover, the video picture synthesized on the cloud server side has a higher pixel count and a better effect.
  • the method further includes: receiving, by the input control device, an input control operation of the user on the first video stream.
  • the input control device includes a gamepad, a keyboard, a mouse or a somatosensory camera.
  • the terminal processes the received input control operation accordingly. For example, when the user uses the gamepad to move left or right, the terminal can move the video screen being played to the left or right.
  • the method further includes: storing, by the terminal, the third video stream; and, when an operation of playing the third video stream is received, parsing the third video stream, forming a play picture of the third video stream, and playing the play picture of the third video stream.
  • the terminal stores the third video stream;
  • when an operation of playing the third video stream is received, the third video stream is parsed, a play picture of the third video stream is formed, and the play picture of the third video stream is played.
  • the user may also choose to store the file formed by the third video stream on a website or in a cloud storage space so that other users can watch or order the video.
  • the file formed by the third video stream can also be stored on the server.
  • a method for synthesizing a live video according to an embodiment of the present application includes:
  • the second video stream is a video data stream collected by the terminal through the video capture device in real time.
  • the second video stream is a video stream formed by the terminal collecting, through the video collection device, the user's interaction with the first video stream.
  • after receiving the second video stream from the terminal in S201, the server combines, by means of codec technology, the second video stream with the first video stream stored in the server that is being played on the terminal, to form the third video stream.
  • merging the second video stream with the first video stream that is being played to form the live third video stream may include: embedding a play window in a play screen of the first video stream; and adding, according to a time identifier of the second video stream, the play screen of the second video stream to the play window, where the play screen of the play window and the play screen of the first video stream have the same time identifier, to form the third video stream.
  • before the third video stream is transmitted to the terminal in S203, the method further includes: compressing the formed play picture of the third video stream, and transmitting the compressed third video stream to the terminal. In this way, the amount of data transmitted in the network is reduced, and the response speed is fast.
  • the user is playing a cloud game on a local terminal and has chosen to upload the game video stream (i.e., the first video stream) in real time over the network.
  • a cloud game refers to a game whose video stream is stored on the cloud server. The user can then use a local camera to capture how he or she plays the game, collecting both actions and voice, to form a second video stream, and upload the second video stream to the cloud server in real time through the network.
  • the cloud server combines the first video stream and the second video stream by means of codec technology to form a live program of one video stream, that is, the third video stream.
  • the cloud server transmits the synthesized third video stream to the local terminal.
  • FIG. 4 is a schematic diagram of a synthesized live broadcast screen provided by an embodiment of the present application, where a user can view a synthesized live video program on a screen of a local terminal.
  • the live video synthesis method of the present application can be used in many application scenarios.
  • scenarios similar to the above example of a user playing a game may further include: a teacher may also use the live video synthesis system provided by the present application to create a live classroom, and so on; the specific processing is similar and is not repeated here.
  • in the method for synthesizing live video provided by the embodiments of the present application, a live channel is formed by combining the client and the cloud: the video stream of user interaction collected by the terminal is transmitted to the server, and synthesizing the video stream in the server produces a better result.
  • the user's interactive video can be added to the currently played screen to form a live broadcast screen, and the user experience is good.
  • FIG. 5 is a schematic diagram of a device for synthesizing a live video according to an embodiment of the present disclosure.
  • the device for synthesizing a live video of the present application includes: an acquisition unit 301, a transmission unit 302, a receiving unit 303, and a processing unit 304.
  • the collecting unit 301 is configured to collect the second video stream when the first video stream is played.
  • the second video stream is a video stream formed by collecting interaction behaviors of the user on the play screen of the first video stream.
  • the transmitting unit 302 is configured to transmit the second video stream collected by the collecting unit 301 to the server 2, so that the server 2 merges the second video stream with the first video stream being played to form a live third video stream.
  • the receiving unit 303 is configured to receive the third video stream sent by the server 2.
  • the processing unit 304 is configured to parse the third video stream received by the receiving unit 303, form a play screen of the third video stream, and play a play screen of the third video stream.
  • the interaction behavior includes a voice interaction behavior and an action interaction behavior.
  • the collecting unit 301 includes a camera and a microphone; the camera collects the action interaction behavior of the user on the play screen of the first video stream, and the microphone collects the voice interaction data of the user on the first video stream.
  • the interaction behavior includes an action interaction behavior.
  • the collecting unit 301 includes a camera that collects the action interaction behavior of the user on the playing screen of the first video stream.
  • the synthesizing device of the live video further includes: an input control unit, configured to receive an input control operation of the first video stream by the user.
  • the processing unit 304 processes the input control operations received by the input control unit accordingly. For example, when the user performs an operation of moving left or right using the game pad, the processing unit 304 moves the video screen being played to the left or right.
  • the synthesizing device of the live video further includes: a storage unit, configured to store the third video stream after the receiving unit 303 receives the third video stream sent by the server.
  • when the receiving unit 303 receives an operation of playing the third video stream, the processing unit 304 parses the third video stream to form a play screen of the third video stream, and plays the play screen of the third video stream.
  • the functions of the foregoing units may correspond to the processing steps of the method for synthesizing the live video described in detail in FIG. 2, and details are not described herein again.
  • FIG. 6 is a schematic diagram of a device for synthesizing a live video according to an embodiment of the present disclosure.
  • the device for synthesizing a live video of the present application includes: a receiving unit 401, a processing unit 402, and a transmission unit 403.
  • the receiving unit 401 is configured to receive a second video stream that is transmitted by the terminal when the terminal plays the first video stream, where the second video stream is a video stream that is collected by the terminal through the video collection device.
  • the second video stream is a video stream formed by the terminal collecting, through the video collection device, the user's interaction with the first video stream.
  • the processing unit 402 is configured to merge the second video stream received by the receiving unit 401 with the first video stream to form a live third video stream.
  • the transmitting unit 403 is configured to transmit the third video stream formed by the processing unit 402 to the terminal.
  • the processing unit 402 specifically includes: an embedded subunit and a merged subunit.
  • the embedded subunit is configured to embed a play window in a play screen of the first video stream.
  • the merging subunit is configured to add, according to a time identifier of the second video stream, a play screen of the second video stream to the play window, where the play screen of the play window and the play screen of the first video stream have the same time identifier, to form the third video stream.
  • the processing unit 402 further includes: a compression subunit.
  • the compression subunit is configured to compress the formed play picture of the third video stream before the transmission unit 403 transmits the third video stream to the terminal.
  • the transmitting unit 403 transmits the third video stream compressed by the compression subunit to the terminal.
  • the functions of the foregoing units may correspond to the processing steps of the method for synthesizing the live video described in detail in FIG. 3, and details are not described herein again.
  • the video capture device is used to collect the user's interaction behavior with the currently played picture, and the collected video stream is transmitted to the server, so the user's interactive video can be added to the currently played picture to form a live broadcast picture, with good real-time performance and a good user experience; and because the video stream is synthesized in the server, the live video stream is of better quality and the picture is clearer.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both.
  • the software module can be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present application relates to a method, apparatus and system for synthesizing a live video. The method comprises: collecting a second video stream by means of a video capture device when a first video stream is played; transmitting the second video stream to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream; receiving the third video stream sent by the server; and parsing the third video stream, forming a play picture of the third video stream, and playing the play picture of the third video stream. In the present application, a user's interactive video can be added to the currently played picture to form a live picture; synthesizing the video stream in the server produces a better result, and the user experience is good.

Description

Method, Device and System for Synthesizing Live Video
This application claims priority to Chinese Patent Application No. 201510127721.3, filed on March 23, 2015 and entitled "Method, Device and System for Synthesizing Live Video", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method, device, and system for synthesizing live video.
Background
An existing live channel is a live broadcast formed directly from a single video stream, and a user can watch the live video directly over the network. However, with the development of computers and networks, traditional live broadcasts can no longer meet users' diverse needs. People want more interactive ways of viewing video. For example, while a user is playing a game, some expert players want to add game commentary or motion guidance to the game in progress and synthesize it with the game screen into a vivid game walkthrough, while other users want to see how those expert players operate in that walkthrough.
An existing video synthesis method usually relies on software with media material editing functions installed on a terminal, which combines captured video, pictures and recorded audio into a dynamic video with sound. Video synthesized on a single terminal in this way cannot be shared directly with others, and therefore cannot achieve the effect of a live video broadcast.
Summary
The purpose of the present application is to provide a method, device and system for synthesizing live video, which can add a user's interactive video to the currently played picture to form a live broadcast picture; synthesizing the video stream in the server produces a better result and a good user experience.
The present application provides a method for synthesizing a live video, the method comprising:
when a first video stream is played, collecting a second video stream through a video capture device;
transmitting the second video stream to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream;
receiving the third video stream sent by the server;
parsing the third video stream, forming a play picture of the third video stream, and playing the play picture of the third video stream.
In another aspect, the present application further provides a method for synthesizing a live video, the method comprising:
when a terminal plays a first video stream, receiving a second video stream transmitted by the terminal, the second video stream being a video stream collected by the terminal through a video capture device;
merging the second video stream with the first video stream to form a live third video stream;
transmitting the third video stream to the terminal.
In another aspect, the present application further provides an apparatus for synthesizing a live video, the apparatus comprising:
a collecting unit, configured to collect a second video stream when a first video stream is played;
a transmitting unit, configured to transmit the second video stream collected by the collecting unit to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream;
a receiving unit, configured to receive the third video stream sent by the server;
a processing unit, configured to parse the third video stream received by the receiving unit, form a play picture of the third video stream, and play the play picture of the third video stream.
In another aspect, the present application further provides an apparatus for synthesizing a live video, the apparatus comprising:
a receiving unit, configured to receive, when a terminal plays a first video stream, a second video stream transmitted by the terminal, the second video stream being a video stream collected by the terminal through a video capture device;
a processing unit, configured to merge the second video stream received by the receiving unit with the first video stream to form a live third video stream;
a transmitting unit, configured to transmit the third video stream formed by the processing unit to the terminal.
In another aspect, the present application further provides a system for synthesizing live video, the system comprising a server and a terminal with a video capture device;
when playing a first video stream, the terminal collects a second video stream through the video capture device;
the terminal transmits the second video stream to the server;
the server merges the second video stream with the first video stream being played to form a live third video stream;
the terminal receives the third video stream sent by the server;
the terminal parses the third video stream, forms a play picture of the third video stream, and plays the play picture of the third video stream.
In the method and apparatus for synthesizing live video provided by the embodiments of the present application, a video capture device collects the user's interaction behavior with the currently played picture, and the collected video stream is transmitted to the server, so the user's interactive video can be added to the currently played picture to form a live broadcast picture, with good real-time performance and a good user experience; at the same time, because the video stream is synthesized in the server, the live video stream is of better quality and the picture is clearer.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a system for synthesizing live video according to an embodiment of the present application;
FIG. 2 is a flowchart of a terminal-side method for synthesizing live video according to an embodiment of the present application;
FIG. 3 is a flowchart of a server-side method for synthesizing live video according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a synthesized live broadcast picture according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for synthesizing live video according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an apparatus for synthesizing live video according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The method and apparatus for synthesizing live video provided by the embodiments of the present application are applicable to a terminal that has a video capture device and is capable of network connection, or to a terminal that can be connected to an external video capture device and is capable of network connection, for example, terminal devices such as a television with a camera, a computer, a pad, or a mobile phone; these terminal devices can connect to a cloud server through a network cable or a wireless network to communicate with the cloud server.
FIG. 1 is a schematic diagram of a system for synthesizing live video provided by an embodiment of the present application. As shown in FIG. 1, the system includes a terminal 1 with a video capture device 11 and a server 2. The server 2 may be a cloud server, and the terminal 1 and the server 2 are connected through a network. The terminal 1 plays a video stream transmitted by the server, and the user can interact with the play picture of the terminal 1. The terminal 1 collects the user's interactive video stream through the video capture device 11 and transmits it to the server 2; after video synthesis is performed in the server 2, the synthesized live video stream is played on the terminal 1. The method for synthesizing live video provided by the present application is described in detail below with reference to FIG. 2 and FIG. 3.
FIG. 2 is a flowchart of a method for synthesizing live video provided by an embodiment of the present application. As shown in FIG. 2, the method for synthesizing live video of this embodiment includes:
S101. When playing a first video stream, a terminal collects a second video stream through a video capture device.
The second video stream is a video data stream collected by the terminal in real time through the video capture device; for example, it includes a second video stream formed by collecting, through the video capture device, the user's interaction behavior with the play picture of the first video stream. Of course, it may also be other video content captured in real time by a device such as a camera.
The video capture device includes a camera, a video camera, or the like. The camera may be a camera on a mobile terminal such as a mobile phone or a tablet computer connected to a television, or may be the camera of a still camera, a video recorder, or a similar device.
The interaction behavior includes voice interaction behavior and action interaction behavior. Collecting, through the video capture device, the user's interaction behavior with the play picture of the first video stream includes: collecting, through a camera, the user's action interaction behavior with the play picture of the first video stream; and collecting, through a microphone, the user's voice interaction data on the first video stream.
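For illustration only, a minimal terminal-side capture sketch is given below; it is not part of the application and assumes OpenCV for the camera and the third-party sounddevice package for the microphone, with a local webcam at index 0, a default microphone, and the durations and file names all being illustrative assumptions:

```python
# Hedged sketch: capture the user's action (camera) and voice (microphone)
# interaction as the "second video stream". Libraries and parameters are
# illustrative assumptions, not details taken from the application.
import wave

import cv2
import sounddevice as sd

DURATION_S = 10        # capture length in seconds (assumption)
SAMPLE_RATE = 44100    # microphone sample rate (assumption)
FPS = 25               # camera frame rate (assumption)

def capture_second_stream(video_path="second_stream.avi",
                          audio_path="second_stream.wav"):
    cam = cv2.VideoCapture(0)  # camera of the terminal
    width = int(cam.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cam.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*"XVID"),
                             FPS, (width, height))

    # Non-blocking microphone recording covering the whole capture window.
    audio = sd.rec(int(DURATION_S * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="int16")

    for _ in range(DURATION_S * FPS):   # action interaction behavior
        ok, frame = cam.read()
        if not ok:
            break
        writer.write(frame)

    sd.wait()                           # voice interaction data
    cam.release()
    writer.release()

    with wave.open(audio_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)             # int16 samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(audio.tobytes())

if __name__ == "__main__":
    capture_second_stream()
```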
Alternatively, the interaction behavior includes action interaction behavior. In that case, collecting, through the video capture device, the user's interaction behavior with the play picture of the first video stream includes: collecting, through a camera, the user's action interaction behavior with the played picture.
S102. Transmit the second video stream to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream.
S103. Receive the third video stream sent by the server.
The third video stream is the video data stream obtained after the video that is being played, as stored in the server, is combined with the video captured by the terminal through the camera or a similar device. After receiving the third video stream, the terminal can form a play picture.
S104. Parse the third video stream, form the play picture of the third video stream, and play the play picture of the third video stream.
After receiving the third video stream, the terminal processes the third video stream according to an existing video codec scheme, obtains the play picture of the third video stream, and plays the play picture of the third video stream on the display of the terminal. The play picture at this time includes the original video picture and the video picture captured by the terminal. The live broadcast picture can be seen immediately on the user's own terminal, and when other users choose to watch this video over the network, they can also see the live broadcast picture.
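As a hedged illustration of this playback step (not taken from the application), the sketch below decodes and displays a composed stream with OpenCV; the stream URL is a hypothetical placeholder:

```python
# Hedged sketch: decode and display the composed "third video stream".
# The stream URL is a hypothetical example, not defined in the application.
import cv2

THIRD_STREAM_URL = "rtmp://example-cloud-server/live/third_stream"  # assumption

def play_third_stream(url=THIRD_STREAM_URL):
    """Parse the third video stream and render its play picture frame by frame."""
    cap = cv2.VideoCapture(url)          # OpenCV delegates decoding to FFmpeg
    while True:
        ok, frame = cap.read()           # one decoded play picture
        if not ok:
            break
        cv2.imshow("live broadcast", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # let the user quit
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    play_third_stream()
```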
In this embodiment of the present application, because the video synthesis is processed on the cloud server, the user can simply select the interactive live broadcast mode through his or her own video capture device and go live, without having to buy professional equipment, which is very simple and convenient. Moreover, the video picture synthesized on the cloud server side has a higher pixel count and a better effect.
Optionally, after the first video stream is played, the method further includes: receiving, through an input control device, the user's input control operation on the first video stream. The input control device includes a gamepad, a keyboard, a mouse, or a somatosensory camera.
The terminal processes the received input control operation accordingly. For example, when the user uses the gamepad to move left or right, the terminal can move the video picture being played to the left or to the right.
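A small illustrative sketch of handling such an input control operation follows; it is not from the application, and keyboard keys read through OpenCV stand in for the gamepad it names:

```python
# Hedged sketch: map a left/right input control operation onto the picture
# being played. Keyboard keys substitute for the gamepad for brevity.
import cv2
import numpy as np

def pan(frame, offset_x):
    """Shift the play picture horizontally by offset_x pixels."""
    h, w = frame.shape[:2]
    shift = np.float32([[1, 0, offset_x], [0, 1, 0]])  # affine translation
    return cv2.warpAffine(frame, shift, (w, h))

def play_with_input_control(video_path="first_stream.mp4"):  # hypothetical file
    cap = cv2.VideoCapture(video_path)
    offset_x = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("play picture", pan(frame, offset_x))
        key = cv2.waitKey(30) & 0xFF
        if key == ord("a"):        # "move left" control operation
            offset_x -= 20
        elif key == ord("d"):      # "move right" control operation
            offset_x += 20
        elif key == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```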
Optionally, after receiving the third video stream sent by the server, the method further includes: storing, by the terminal, the third video stream; and, when an operation of playing the third video stream is received, parsing the third video stream, forming the play picture of the third video stream, and playing the play picture of the third video stream. In this way, the user can replay or play this video on demand on the terminal, which is flexible and convenient.
The user may also choose to store the file formed by the third video stream on a website or in a cloud storage space, so that other users can replay the video or play it on demand. Of course, the file formed by the third video stream may also be stored on the server.
FIG. 3 is a flowchart of a method for synthesizing live video provided by an embodiment of the present application. As shown in FIG. 3, the method for synthesizing live video of this embodiment includes:
S201. When a terminal plays a first video stream, receive a second video stream transmitted by the terminal.
The second video stream is a video data stream collected by the terminal in real time through a video capture device. For example, the second video stream is a video stream formed by the terminal collecting, through the video capture device, the user's interaction with the first video stream.
S202. Merge the second video stream with the first video stream to form a live third video stream.
After receiving the second video stream from the terminal in S201, the server combines, by means of codec technology, the second video stream with the first video stream stored in the server that is being played on the terminal, to form the third video stream.
Specifically, merging the second video stream with the first video stream being played to form the live third video stream may include: embedding a play window in the play picture of the first video stream; and adding, according to a time identifier of the second video stream, the play picture of the second video stream to the play window, where the play picture of the play window and the play picture of the first video stream have the same time identifier, to form the third video stream.
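For illustration, a minimal sketch of this picture-in-picture merge is shown below; it assumes both streams are available to the server as sequences of (time identifier, frame) pairs and uses OpenCV, all of which are assumptions rather than details of the application:

```python
# Hedged sketch: compose the "third video stream" by embedding the second
# stream's play picture into a play window of the first stream's play picture,
# pairing frames that carry the same time identifier.
import cv2

WINDOW_SCALE = 0.25   # play window is ~1/4 of the picture width (assumption)
MARGIN = 16           # offset of the play window from the corner (assumption)

def embed_play_window(first_frame, second_frame):
    """Return one composed frame of the third video stream."""
    h, w = first_frame.shape[:2]
    win_w = int(w * WINDOW_SCALE)
    win_h = int(win_w * second_frame.shape[0] / second_frame.shape[1])
    small = cv2.resize(second_frame, (win_w, win_h))
    out = first_frame.copy()
    # Embed the play window in the bottom-right corner of the play picture
    # (assumes the window fits inside the first-stream picture).
    out[h - win_h - MARGIN:h - MARGIN, w - win_w - MARGIN:w - MARGIN] = small
    return out

def merge_streams(first_frames, second_frames):
    """first_frames / second_frames: iterables of (time identifier, frame)."""
    second_by_ts = dict(second_frames)
    for ts, frame in first_frames:
        overlay = second_by_ts.get(ts)   # frame with the same time identifier
        yield (ts, embed_play_window(frame, overlay)
               if overlay is not None else frame)
```

The point of the sketch is that a second-stream frame is embedded only into the first-stream frame carrying the same time identifier, so the two play pictures stay synchronized.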
S203. Transmit the third video stream to the terminal.
Optionally, before the third video stream is transmitted to the terminal in S203, the method further includes: compressing the formed play picture of the third video stream, and transmitting the compressed third video stream to the terminal. In this way, the amount of data transmitted in the network is reduced, and the response speed is fast.
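As an illustrative sketch only, per-frame JPEG encoding below stands in for whatever compression the server might apply before transmission; the application does not specify a codec:

```python
# Hedged sketch: compress each composed play picture before sending it to the
# terminal, and recover it on the terminal side. Per-frame JPEG is a stand-in
# for a real video encoder, which the application does not specify.
import cv2
import numpy as np

JPEG_QUALITY = 70  # lower quality means less data on the network (assumption)

def compress_play_picture(frame):
    """Server side: return the compressed bytes of one play picture."""
    ok, buf = cv2.imencode(".jpg", frame,
                           [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY])
    if not ok:
        raise RuntimeError("failed to compress play picture")
    return buf.tobytes()

def decompress_play_picture(data):
    """Terminal side: recover the play picture from the compressed bytes."""
    return cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
```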
As an example, a user is playing a cloud game on a local terminal and has chosen to upload the game video stream (that is, the first video stream) in real time over the network. Here, a cloud game refers to a game whose video stream is stored on a cloud server. The user can then use a local camera to capture how he or she plays the game, collecting both actions and voice, to form a second video stream, and upload the second video stream to the cloud server in real time through the network. The cloud server combines the first video stream and the second video stream by means of codec technology to form a live program of one video stream, that is, the third video stream. The cloud server then transmits the synthesized third video stream to the local terminal. The user can watch the synthesized live video program on the screen of the local terminal, and other users in the network can also choose to watch this user's live video program on demand. FIG. 4 is a schematic diagram of a synthesized live broadcast picture provided by an embodiment of the present application; the user can watch the synthesized live video program on the screen of the local terminal.
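The application does not prescribe a transport for this real-time upload; as an assumption-laden sketch, the code below pipes captured camera frames into an FFmpeg process that pushes them to a hypothetical RTMP ingest URL on the cloud server:

```python
# Hedged sketch: upload the captured second video stream to the cloud server
# in real time. Assumes the ffmpeg binary is installed and the server exposes
# a hypothetical RTMP ingest URL; neither is specified by the application.
import subprocess

import cv2

INGEST_URL = "rtmp://example-cloud-server/live/second_stream"  # assumption
FPS = 25

def upload_second_stream(url=INGEST_URL):
    cam = cv2.VideoCapture(0)
    width = int(cam.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cam.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # ffmpeg reads raw BGR frames from stdin, encodes them, and pushes to RTMP.
    ffmpeg = subprocess.Popen(
        ["ffmpeg", "-y",
         "-f", "rawvideo", "-pix_fmt", "bgr24",
         "-s", f"{width}x{height}", "-r", str(FPS), "-i", "-",
         "-c:v", "libx264", "-preset", "veryfast", "-f", "flv", url],
        stdin=subprocess.PIPE)

    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            ffmpeg.stdin.write(frame.tobytes())   # one raw frame upstream
    finally:
        cam.release()
        ffmpeg.stdin.close()
        ffmpeg.wait()
```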
The live video synthesis method of the present application can be used in many application scenarios. Scenarios similar to the above example of a user playing a game may further include: a teacher may also use the live video synthesis system provided by the present application to create a live classroom, and so on; the specific processing is similar and is not repeated here.
In the method for synthesizing live video provided by the embodiments of the present application, a live channel is formed by combining the client and the cloud: the video stream of user interaction collected by the terminal is transmitted to the server, synthesizing the video stream in the server produces a better result, and the user's interactive video can be added to the currently played picture to form a live broadcast picture, giving a good user experience.
The above is a detailed description of the method for synthesizing live video provided by the embodiments of the present application. The apparatus for synthesizing live video provided by the present application is described in detail below.
FIG. 5 is a schematic diagram of an apparatus for synthesizing live video provided by an embodiment of the present application. As shown in FIG. 5, the apparatus for synthesizing live video of the present application includes: a collecting unit 301, a transmitting unit 302, a receiving unit 303 and a processing unit 304.
The collecting unit 301 is configured to collect a second video stream when a first video stream is played.
The second video stream is a video stream formed by collecting the user's interaction behavior with the play picture of the first video stream.
The transmitting unit 302 is configured to transmit the second video stream collected by the collecting unit 301 to the server 2, so that the server 2 merges the second video stream with the first video stream being played to form a live third video stream.
The receiving unit 303 is configured to receive the third video stream sent by the server 2.
The processing unit 304 is configured to parse the third video stream received by the receiving unit 303, form the play picture of the third video stream, and play the play picture of the third video stream.
Optionally, the interaction behavior includes voice interaction behavior and action interaction behavior. The collecting unit 301 includes a camera and a microphone; the camera collects the user's action interaction behavior with the play picture of the first video stream, and the microphone collects the user's voice interaction data on the first video stream.
Optionally, the interaction behavior includes action interaction behavior. The collecting unit 301 includes a camera, and the camera collects the user's action interaction behavior with the play picture of the first video stream.
Optionally, the apparatus for synthesizing live video further includes an input control unit configured to receive the user's input control operation on the first video stream. The processing unit 304 processes the input control operation received by the input control unit accordingly. For example, when the user performs an operation of moving left or right using the gamepad, the processing unit 304 moves the video picture being played to the left or to the right.
Optionally, the apparatus for synthesizing live video further includes a storage unit configured to store the third video stream after the receiving unit 303 receives the third video stream sent by the server. When the receiving unit 303 receives an operation of playing the third video stream, the processing unit 304 parses the third video stream to form the play picture of the third video stream, and plays the play picture of the third video stream.
The functions of the above units may correspond to the processing steps of the method for synthesizing live video described in detail with reference to FIG. 2, and are not repeated here.
FIG. 6 is a schematic diagram of an apparatus for synthesizing live video provided by an embodiment of the present application. As shown in FIG. 6, the apparatus for synthesizing live video of the present application includes: a receiving unit 401, a processing unit 402 and a transmitting unit 403.
The receiving unit 401 is configured to receive, when a terminal plays a first video stream, a second video stream transmitted by the terminal, the second video stream being a video stream collected by the terminal through a video capture device.
The second video stream is a video stream formed by the terminal collecting, through the video capture device, the user's interaction with the first video stream.
The processing unit 402 is configured to merge the second video stream received by the receiving unit 401 with the first video stream to form a live third video stream.
The transmitting unit 403 is configured to transmit the third video stream formed by the processing unit 402 to the terminal.
Optionally, the processing unit 402 specifically includes an embedding subunit and a merging subunit.
The embedding subunit is configured to embed a play window in the play picture of the first video stream.
The merging subunit is configured to add, according to a time identifier of the second video stream, the play picture of the second video stream to the play window, where the play picture of the play window and the play picture of the first video stream have the same time identifier, to form the third video stream.
Optionally, the processing unit 402 further includes a compression subunit. The compression subunit is configured to compress the formed play picture of the third video stream before the transmitting unit 403 transmits the third video stream to the terminal. The transmitting unit 403 transmits the third video stream compressed by the compression subunit to the terminal.
The functions of the above units may correspond to the processing steps of the method for synthesizing live video described in detail with reference to FIG. 3, and are not repeated here.
In the method, apparatus and system for synthesizing live video provided by the embodiments of the present application, a video capture device collects the user's interaction behavior with the currently played picture, and the collected video stream is transmitted to the server, so the user's interactive video can be added to the currently played picture to form a live broadcast picture, with good real-time performance and a good user experience; and because the video stream is synthesized in the server, the live video stream is of better quality and the picture is clearer.
A person skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module can be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further describe the purpose, technical solutions and beneficial effects of the present application in detail. It should be understood that the above description covers only specific embodiments of the present application and is not intended to limit the scope of protection of the present application; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (17)

  1. A method for synthesizing a live video, wherein the method comprises:
    when a first video stream is played, collecting a second video stream through a video capture device;
    transmitting the second video stream to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream;
    receiving the third video stream sent by the server;
    parsing the third video stream, forming a play picture of the third video stream, and playing the play picture of the third video stream.
  2. The method according to claim 1, wherein the second video stream comprises a second video stream formed by collecting, through the video capture device, a user's interaction behavior with a play picture of the first video stream.
  3. The method according to claim 2, wherein the interaction behavior comprises voice interaction behavior and action interaction behavior;
    the collecting, through the video capture device, of the user's interaction behavior with the play picture of the first video stream comprises:
    collecting, through a camera, the user's action interaction behavior with the play picture of the first video stream; and collecting, through a microphone, the user's voice interaction data on the first video stream.
  4. The method according to claim 2, characterized in that the interaction behavior comprises an action interaction behavior;
    the collecting, through the video capture device, of the user's interaction behavior with the play screen of the first video stream comprises:
    collecting, through a camera, the user's action interaction behavior with the played screen.
  5. The method according to claim 1, characterized in that, after receiving the third video stream sent by the server, the method further comprises:
    storing the third video stream;
    when an operation of playing the third video stream is received, parsing the third video stream to form a play screen of the third video stream, and playing the play screen of the third video stream.
  6. A method for synthesizing live video, characterized in that the method comprises:
    when a terminal plays a first video stream, receiving a second video stream transmitted by the terminal, the second video stream being a video stream collected by the terminal through a video capture device;
    merging the second video stream with the first video stream to form a live third video stream;
    transmitting the third video stream to the terminal.
  7. The method according to claim 6, characterized in that the second video stream is a video stream formed by a user interacting with the first video stream, as collected by the terminal through the video capture device.
  8. The method according to claim 6, characterized in that merging the second video stream with the first video stream being played to form the live third video stream specifically comprises:
    embedding a play window in the play screen of the first video stream;
    adding the play screen of the second video stream to the play window according to the time identifier of the second video stream, where the play screen in the play window and the play screen of the first video stream have the same time identifier, to form the third video stream.
  9. The method according to claim 6, characterized in that, before transmitting the third video stream to the terminal, the method further comprises:
    compressing the formed play screen of the third video stream, and transmitting the compressed third video stream to the terminal.
  10. A device for synthesizing live video, characterized in that the device comprises:
    a collecting unit, configured to collect a second video stream while a first video stream is being played;
    a transmitting unit, configured to transmit the second video stream collected by the collecting unit to a server, so that the server merges the second video stream with the first video stream being played to form a live third video stream;
    a receiving unit, configured to receive the third video stream sent by the server;
    a processing unit, configured to parse the third video stream received by the receiving unit, form a play screen of the third video stream, and play the play screen of the third video stream.
  11. The device according to claim 10, characterized in that the second video stream comprises a second video stream formed from the collected interaction behavior of a user with the play screen of the first video stream;
    the interaction behavior comprises a voice interaction behavior and an action interaction behavior; the collecting unit comprises a camera and a microphone, the camera collecting the user's action interaction behavior with the play screen of the first video stream, and the microphone collecting the user's voice interaction data with respect to the first video stream;
    or, the interaction behavior comprises an action interaction behavior; the collecting unit comprises a camera, the camera collecting the user's action interaction behavior with the play screen of the first video stream.
  12. The device according to claim 10, characterized in that the device further comprises:
    a storage unit, configured to store the third video stream after the receiving unit receives the third video stream sent by the server;
    when the receiving unit receives an operation of playing the third video stream, the processing unit parses the third video stream, forms a play screen of the third video stream, and plays the play screen of the third video stream.
  13. A device for synthesizing live video, characterized in that the device comprises:
    a receiving unit, configured to receive, when a terminal plays a first video stream, a second video stream transmitted by the terminal;
    a processing unit, configured to merge the second video stream received by the receiving unit with the first video stream to form a live third video stream;
    a transmitting unit, configured to transmit the third video stream formed by the processing unit to the terminal.
  14. The device according to claim 13, characterized in that the second video stream is a video stream formed by a user interacting with the first video stream, as collected by the terminal through a video capture device.
  15. The device according to claim 13, characterized in that the processing unit specifically comprises:
    an embedding subunit, configured to embed a play window in the play screen of the first video stream;
    a merging subunit, configured to add the play screen of the second video stream to the play window according to the time identifier of the second video stream, where the play screen in the play window and the play screen of the first video stream have the same time identifier, to form the third video stream.
  16. The device according to claim 13, characterized in that the processing unit further comprises:
    a compression subunit, configured to compress the formed play screen of the third video stream before the transmitting unit transmits the third video stream to the terminal;
    the transmitting unit transmitting the third video stream compressed by the compression subunit to the terminal.
  17. A system for synthesizing live video, characterized in that the system comprises: a server and a terminal with a video capture device;
    when playing a first video stream, the terminal collects a second video stream through the video capture device;
    the terminal transmits the second video stream to the server;
    the server merges the second video stream with the first video stream being played to form a live third video stream;
    the terminal receives the third video stream sent by the server;
    the terminal parses the third video stream to form a play screen of the third video stream, and plays the play screen of the third video stream.
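For illustration only, the terminal-side steps recited in claim 1 (collect the second stream, upload it, then receive and play the merged third stream) could look roughly like the Python sketch below. The HTTP transport, the server address, and the endpoint paths `/upload_second_stream` and `/third_stream` are assumptions made for the sketch; the claim does not prescribe any particular protocol or API.

```python
# Illustrative terminal-side flow for claim 1 (not part of the claims).
# Assumed libraries: opencv-python, requests. Assumed server API: hypothetical
# HTTP endpoints /upload_second_stream and /third_stream.
import cv2
import requests

SERVER = "http://example-server:8080"  # hypothetical server address

def capture_and_upload_second_stream(n_frames=100):
    """While the first video stream plays locally, collect the second video
    stream from the video capture device and transmit it to the server."""
    cam = cv2.VideoCapture(0)  # camera as the video capture device
    for _ in range(n_frames):
        ok, frame = cam.read()
        if not ok:
            break
        ok, jpg = cv2.imencode(".jpg", frame)  # encode one captured frame
        if ok:
            requests.post(f"{SERVER}/upload_second_stream",
                          data=jpg.tobytes(),
                          headers={"Content-Type": "image/jpeg"})
    cam.release()

def receive_and_play_third_stream(url=f"{SERVER}/third_stream"):
    """Receive the merged (third) video stream from the server, parse it into
    play screens, and play them."""
    merged = cv2.VideoCapture(url)  # OpenCV can open network stream URLs
    while True:
        ok, frame = merged.read()
        if not ok:
            break
        cv2.imshow("live third video stream", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    merged.release()
    cv2.destroyAllWindows()
```

A production system would more likely push the second stream over a streaming protocol such as RTMP rather than per-frame HTTP posts; the sketch only mirrors the order of operations recited in the claim.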
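Claim 3 collects the user's action interaction through a camera and the voice interaction through a microphone. A minimal sketch, assuming the `opencv-python` and `sounddevice` libraries (a choice not made by the claim), might be:

```python
# Illustrative capture of claim 3's two interaction channels (not part of the claims).
# Assumed libraries: opencv-python for the camera, sounddevice for the microphone.
import cv2
import sounddevice as sd

def capture_interaction(seconds=5, fps=25, samplerate=44100):
    """Collect the user's action interaction (camera) and voice interaction
    (microphone) over the same time span."""
    # Non-blocking microphone recording: voice interaction data.
    audio = sd.rec(int(seconds * samplerate), samplerate=samplerate, channels=1)

    # Meanwhile, grab camera frames: action interaction behavior.
    cam = cv2.VideoCapture(0)
    frames = []
    for _ in range(int(seconds * fps)):
        ok, frame = cam.read()
        if not ok:
            break
        frames.append(frame)
    cam.release()

    sd.wait()  # wait for the audio recording to finish
    return frames, audio  # both feed into the second video stream
```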
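The merging step of claim 8 (embedding a play window into the first stream's play screen and pairing frames that share a time identifier) can be illustrated as a picture-in-picture overlay. In the sketch below, the window size, the top-right placement, and the dictionary-based frame lookup are all assumptions; the claim fixes none of these details.

```python
# Illustrative merge for claim 8 (not part of the claims): embed a play window
# for the second stream into the first stream's play screen, pairing frames
# that carry the same time identifier.
# Assumed library: opencv-python. Window size/position are arbitrary choices.
import cv2

def embed_play_window(first_frame, second_frame, win_w=320, win_h=180, margin=10):
    """Overlay the second stream's frame as a small play window in the
    top-right corner of the first stream's frame (picture-in-picture)."""
    small = cv2.resize(second_frame, (win_w, win_h))
    h, w = first_frame.shape[:2]
    x, y = w - win_w - margin, margin  # assumed top-right placement
    merged = first_frame.copy()
    merged[y:y + win_h, x:x + win_w] = small  # embed the play window
    return merged

def form_third_stream(first_frames, second_frames):
    """first_frames / second_frames: lists of (time_identifier, frame) pairs.
    Frames with identical time identifiers are merged; the result is the
    frame sequence of the live third video stream."""
    second_by_time = {t: f for t, f in second_frames}
    third = []
    for t, frame in first_frames:
        if t in second_by_time:  # same time identifier on both streams
            third.append((t, embed_play_window(frame, second_by_time[t])))
        else:
            third.append((t, frame))  # no interaction frame at this instant
    return third
```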
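Claim 9 compresses the third stream's play screen before it is transmitted to the terminal. Per-frame JPEG encoding is shown below purely as one possible realization; the claim does not name a codec, and a real deployment would more likely use an inter-frame codec such as H.264.

```python
# Illustrative compression for claim 9 (not part of the claims): compress each
# play screen of the third video stream before transmitting it to the terminal.
# Per-frame JPEG encoding is an assumed codec choice; the claim names none.
import cv2

def compress_play_screen(frame, quality=70):
    """Return the compressed bytes of one play-screen frame of the third
    video stream, ready for transmission to the terminal."""
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("frame compression failed")
    return buf.tobytes()
```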
PCT/CN2016/076374 2015-03-23 2016-03-15 Method, apparatus and system for synthesizing live video WO2016150317A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510127721.3 2015-03-23
CN201510127721.3A CN106162221A (en) 2015-03-23 2015-03-23 Method, apparatus and system for synthesizing live video

Publications (1)

Publication Number Publication Date
WO2016150317A1 (en) 2016-09-29

Family

ID=56977808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/076374 WO2016150317A1 (en) 2015-03-23 2016-03-15 Method, apparatus and system for synthesizing live video

Country Status (2)

Country Link
CN (1) CN106162221A (en)
WO (1) WO2016150317A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534618B (en) * 2016-11-24 2020-05-12 广州爱九游信息技术有限公司 Method, device and system for realizing pseudo field explanation
CN107278374B (en) * 2016-12-01 2020-01-03 深圳前海达闼云端智能科技有限公司 Interactive advertisement display method, terminal and smart city interactive system
CN106604047A (en) * 2016-12-13 2017-04-26 天脉聚源(北京)传媒科技有限公司 Multi-video-stream video direct broadcasting method and device
CN106658037A (en) * 2016-12-13 2017-05-10 天脉聚源(北京)传媒科技有限公司 Live video method and apparatus of multiple video streams
CN107317815A (en) * 2017-07-04 2017-11-03 上海鋆创信息技术有限公司 A kind of method and device, storage medium and the terminal of video superposition
CN109660853B (en) * 2017-10-10 2022-12-30 腾讯科技(北京)有限公司 Interaction method, device and system in live video
CN107920274B (en) * 2017-10-27 2020-08-04 优酷网络技术(北京)有限公司 Video processing method, client and server
CN108055577A (en) * 2017-12-18 2018-05-18 北京奇艺世纪科技有限公司 A kind of live streaming exchange method, system, device and electronic equipment
CN108449632B (en) * 2018-05-09 2021-04-02 福建星网视易信息系统有限公司 Method and terminal for real-time synthesis of singing video
CN108965746A (en) * 2018-07-26 2018-12-07 北京竞业达数码科技股份有限公司 Image synthesizing method and system
CN109327731B (en) * 2018-11-20 2021-05-11 福建海媚数码科技有限公司 Method and system for synthesizing DIY video in real time based on karaoke
CN113115108A (en) * 2018-12-20 2021-07-13 聚好看科技股份有限公司 Video processing method and computing device
CN109618178A (en) * 2019-01-21 2019-04-12 北京奇艺世纪科技有限公司 A kind of live broadcasting method, apparatus and system
CN110636321A (en) * 2019-09-30 2019-12-31 北京达佳互联信息技术有限公司 Data processing method, device, system, mobile terminal and storage medium
CN110662082A (en) * 2019-09-30 2020-01-07 北京达佳互联信息技术有限公司 Data processing method, device, system, mobile terminal and storage medium
CN113038287B (en) * 2019-12-09 2022-04-01 上海幻电信息科技有限公司 Method and device for realizing multi-user video live broadcast service and computer equipment
CN114449179B (en) * 2020-10-19 2024-05-28 海信视像科技股份有限公司 Display device and image mixing method
CN112291579A (en) * 2020-10-26 2021-01-29 北京字节跳动网络技术有限公司 Data processing method, device, equipment and storage medium
CN112788358B (en) * 2020-12-31 2022-02-18 腾讯科技(深圳)有限公司 Video live broadcast method, video sending method, device and equipment for game match
CN113115073B (en) * 2021-04-12 2022-09-02 宜宾市天珑通讯有限公司 Image coding method, system, device and medium capable of reducing interference
CN116033189B (en) * 2023-03-31 2023-06-30 卓望数码技术(深圳)有限公司 Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation
CN117082461A (en) * 2023-08-09 2023-11-17 中移互联网有限公司 Method, device and storage medium for transmitting 5G message in audio/video call

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1696923A (en) * 2004-05-10 2005-11-16 北京大学 Networked, multimedia synchronous composed storage and issuance system, and method for implementing the system
US8301596B2 (en) * 2010-01-15 2012-10-30 Hulu Llc Method and apparatus for providing supplemental video content for third party websites
US8667054B2 (en) * 2010-07-12 2014-03-04 Opus Medicus, Inc. Systems and methods for networked, in-context, composed, high resolution image viewing
CN103856787A (en) * 2012-12-04 2014-06-11 上海文广科技(集团)有限公司 Commentary video passing-back live system based on public network and live method of commentary video passing-back live system based on public network

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107968948A (en) * 2016-10-19 2018-04-27 北京新唐思创教育科技有限公司 Online Video playback method and system
CN106375347A (en) * 2016-11-18 2017-02-01 上海悦野健康科技有限公司 Tourism live broadcast platform based on virtual reality
CN106658205A (en) * 2016-11-22 2017-05-10 广州华多网络科技有限公司 Studio video streaming synthesis control method, device and terminal equipment
CN107005721A (en) * 2016-11-22 2017-08-01 广州市百果园信息技术有限公司 Direct broadcasting room pushing video streaming control method and corresponding server and mobile terminal
WO2018094556A1 (en) * 2016-11-22 2018-05-31 广州市百果园信息技术有限公司 Live-broadcasting room video stream pushing control method, and corresponding server and mobile terminal
CN107005721B (en) * 2016-11-22 2020-07-24 广州市百果园信息技术有限公司 Live broadcast room video stream push control method, corresponding server and mobile terminal
CN106658205B (en) * 2016-11-22 2020-09-04 广州华多网络科技有限公司 Live broadcast room video stream synthesis control method and device and terminal equipment
US11616990B2 (en) 2016-11-22 2023-03-28 Guangzhou Baiguoyuan Information Technology Co., Ltd. Method for controlling delivery of a video stream of a live-stream room, and corresponding server and mobile terminal
CN113949895A (en) * 2017-02-16 2022-01-18 脸谱公司 Method and system for transmitting video clips of viewer response
CN112929681A (en) * 2021-01-19 2021-06-08 广州虎牙科技有限公司 Video stream image rendering method and device, computer equipment and storage medium
CN112929681B (en) * 2021-01-19 2023-09-05 广州虎牙科技有限公司 Video stream image rendering method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN106162221A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2016150317A1 (en) Method, apparatus and system for synthesizing live video
WO2019205872A1 (en) Video stream processing method and apparatus, computer device and storage medium
CN105025327B (en) A kind of method and system of mobile terminal live broadcast
CN109327741B (en) Game live broadcast method, device and system
US20190124371A1 (en) Systems, methods and computer software for live video/audio broadcasting
WO2019114330A1 (en) Video playback method and apparatus, and terminal device
CN106358050A (en) Android based audio and video streaming push method and device as well as Android based audio and video streaming playing method and device
CN112019905A (en) Live broadcast playback method, computer equipment and readable storage medium
JP7290260B1 (en) Servers, terminals and computer programs
CN110113621A (en) Playing method and device, storage medium, the electronic device of media information
CN111147911A (en) Video clipping method and device, electronic equipment and storage medium
CN105704399A (en) Playing method and system for multi-picture television program
EP3099069B1 (en) Method for processing video, terminal and server
CN109040818B (en) Audio and video synchronization method, storage medium, electronic equipment and system during live broadcasting
WO2017092433A1 (en) Method and device for video real-time playback
US10721500B2 (en) Systems and methods for live multimedia information collection, presentation, and standardization
CN106060609B Method and device for obtaining pictures
CN109862385B (en) Live broadcast method and device, computer readable storage medium and terminal equipment
CN108124183A Method for one-to-many video streaming with synchronized audio and video acquisition
CN107426487A (en) A kind of panoramic picture recorded broadcast method and system
JP5205900B2 (en) Video conference system, server terminal, and client terminal
WO2017092435A1 (en) Method and device for audio/video real-time transmission, transmission stream packing method, and multiplexer
CN116708867B (en) Live broadcast data processing method, device, equipment and storage medium
KR20170083422A (en) Method for broadcast contents directly using mobile device and the system thereof
US11317035B1 (en) Method and system for synchronized playback of multiple video streams over a computer network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16767694

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16767694

Country of ref document: EP

Kind code of ref document: A1