CN112235238B

CN112235238B - MCU system and method based on WebRTC

Info

Publication number: CN112235238B
Application number: CN202010909199.5A
Authority: CN
Inventors: 方辉
Original assignee: Wuhan Fiberhome Digtal Technology Co Ltd
Current assignee: Wuhan Fiberhome Digtal Technology Co Ltd
Priority date: 2020-09-02
Filing date: 2020-09-02
Publication date: 2022-11-01
Anticipated expiration: 2040-09-02
Also published as: CN112235238A

Abstract

An MCU system and method based on WebRTC. The MCU system includes a WebRTC server, a Mixer server and a STUN/TURN server. The WebRTC server receives a room join request from an end user, adds the end user to a room, receives the audio and video streams published by each participant in the room, and forwards them to the Mixer server. The Mixer server is implemented on the GStreamer framework; it mixes the multiple audio and video streams obtained from the WebRTC server into a single mixed stream, republishes the mixed stream to the WebRTC server, and manages the position of each mixing window. The STUN/TURN server is built on an open source TURN server and is used by ICE to achieve NAT traversal (intranet penetration). The invention can provide richer communication modes for systems built on the traditional P2P mode.

Description

MCU system and method based on WebRTC

Technical Field

The invention relates to the field of information and communication technology, and in particular to a WebRTC-based MCU system and a WebRTC-based MCU method.

Background

With the rapid development of mobile internet technology, the use of intelligent mobile terminals in industries such as public security and transportation is maturing. Traditional systems such as digital and analog intercom still have a place in command and dispatch work, but the voice-only dispatch function they provide cannot meet the diverse information needs of command work. After Google open-sourced WebRTC in 2010, the technical threshold for video calls dropped sharply, and more people became willing to develop such functions. New problems also arose: WebRTC is a real-time communication technology built mainly on P2P and in principle needs no centralized node, but in some large-scale multi-party communication scenarios, direct end-to-end connections cause bandwidth and performance problems at the endpoints.

Disclosure of Invention

In view of the defects and drawbacks of the prior art, embodiments of the present invention provide a WebRTC-based MCU system and method that overcome or at least partially solve the above problems. The specific scheme is as follows:

as a first aspect of the present invention, there is provided a WebRTC-based MCU system, the MCU system including a WebRTC server, a Mixer server, and a STUN/TURN server;

the WebRTC server is used for receiving a room join request from an end user, adding the end user to a room, receiving the audio and video streams published by each participant in the room, and forwarding those streams to the Mixer server;

the Mixer server is implemented on the GStreamer framework and is used for mixing the multiple audio and video streams obtained from the WebRTC server into a single mixed stream, republishing the mixed stream to the WebRTC server, and managing the positions of the mixing windows;

the STUN/TURN server is built on an open source TURN server and is used by ICE to achieve NAT traversal (intranet penetration).

Further, the WebRTC server is specifically configured to:

receive a room join request from an end user, wherein the room join request comprises authentication information, and the authentication information comprises a room ID, a user name and password, and a STUN/TURN address;

judge whether the room ID matches the user name and password; if not, prompt that the user has no permission and end the process; if the match succeeds, perform the ICE interaction; if the ICE interaction fails, end the process; otherwise, start receiving audio and video streams from the participant, forward them to the Mixer server, and open a stream receiving port for receiving the mixed stream from the Mixer server;

and receive the mixed stream from the Mixer server and notify the participants to subscribe to it, wherein the mixed stream contains the picture and sound information of all members in the room.
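The join handling described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Room` class, the request keys (including the hypothetical `participant` identifier) and the `ice_succeeds` callable that stands in for the real ICE interaction are not taken from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """A room created in advance by a manager (field names are illustrative)."""
    room_id: str
    username: str
    password: str
    participants: list = field(default_factory=list)


def handle_join(rooms, request, ice_succeeds):
    """Return 'no_permission', 'ice_failed' or 'joined' for one join request."""
    room = rooms.get(request["room_id"])
    # Authentication: the room must exist and the credentials must match it.
    if (room is None
            or request["username"] != room.username
            or request["password"] != room.password):
        return "no_permission"
    # The ICE interaction is stubbed as a callable reporting success or failure.
    if not ice_succeeds(request["stun_turn_address"]):
        return "ice_failed"
    # At this point a real server would start receiving the participant's
    # stream, forward it to a Mixer server and open a stream receiving port.
    room.participants.append(request["participant"])
    return "joined"
```

The three return values correspond to the three ways the described flow can end: no permission, ICE failure, or a participant whose stream is now being received.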

Further, the Mixer server is specifically configured to:

receive a mixing request from the WebRTC server, wherein the mixing request comprises a room ID and the number of the stream receiving port on which the WebRTC server receives the mixed stream;

confirm whether a mixing process named after the room ID exists; if so, add the received audio and video stream to the existing mixing process; otherwise, create a new mixing process named after the room ID and add the received audio and video stream to it;

allocate a mixing window to the newly added audio and video stream to determine where that stream is displayed, create a mixer based on the GStreamer framework, execute the mixing algorithm to mix the multiple audio and video streams into a single mixed stream, and then forward the mixed stream to the stream receiving port of the WebRTC server.
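A minimal sketch of the "one mixing process per room ID" bookkeeping described above. The class and method names are hypothetical, and `MixProcess` is a plain stand-in: a real implementation would wrap a GStreamer pipeline rather than a list.

```python
class MixProcess:
    """Stand-in for one mixing process, named after its room ID."""

    def __init__(self, room_id, return_port):
        self.room_id = room_id
        self.return_port = return_port  # WebRTC server port for the mixed stream
        self.windows = []               # stream IDs in window order

    def add_stream(self, stream_id):
        """Allocate the next mixing window to the stream and return its index."""
        self.windows.append(stream_id)
        return len(self.windows) - 1


class MixerServer:
    def __init__(self):
        self.processes = {}  # room ID -> MixProcess

    def handle_mix_request(self, room_id, return_port, stream_id):
        """Reuse the room's mixing process if it exists, otherwise create it."""
        process = self.processes.get(room_id)
        if process is None:
            process = MixProcess(room_id, return_port)
            self.processes[room_id] = process
        return process.add_stream(stream_id)
```

Keying the dictionary by room ID is what guarantees that all streams of one room end up in the same mixing process.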

Furthermore, there are multiple Mixer servers, and the WebRTC server comprises a Room module, a Streaming module and a WebRTC protocol layer;

the Room module is used for receiving a room creation request from a manager, creating a room for the manager, receiving room join requests from end users, authenticating each joining end user, and adding authenticated end users to the room;

the Streaming module is used for obtaining the audio and video stream when a participant publishes it in a room, forwarding it to an available or lightly loaded Mixer server, which performs the mixing, receiving the mixed stream from the Mixer server, and notifying each participant to subscribe;

the WebRTC protocol layer is used for implementing the underlying WebRTC protocol.
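The text does not say how the Streaming module decides which Mixer server is "available or lightly loaded". One simple possibility, shown here purely as an assumption, is to pick the available server with the lowest reported load; the `available` and `load` fields are hypothetical.

```python
def pick_mixer(mixers):
    """Return the name of the available Mixer server with the lowest load.

    `mixers` maps a server name to a dict with hypothetical keys
    'available' (bool) and 'load' (float). Returns None if none is available.
    """
    available = {name: info for name, info in mixers.items() if info["available"]}
    if not available:
        return None
    return min(available, key=lambda name: available[name]["load"])
```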

Further, when a participant leaves the room, the Room module also notifies the other participants and the Streaming module, and the Streaming module in turn notifies the Mixer server to delete the corresponding mixing window.

As a second aspect of the present invention, there is provided a WebRTC-based MCU method, the method comprising:

step 1, a WebRTC server receives a room join request from an end user, adds the end user to a room, receives the audio and video streams published by each participant in the room, and forwards those streams to a Mixer server;

step 2, the Mixer server mixes the multiple audio and video streams obtained from the WebRTC server into a single mixed stream, republishes the mixed stream to the WebRTC server, and manages the positions of the mixing windows.

Further, in step 1, the WebRTC server receiving a room join request from an end user, adding the end user to a room, receiving the audio and video streams published by each participant in the room, and forwarding those streams to the Mixer server specifically includes:

step 301, receiving a room join request from an end user, wherein the room join request comprises authentication information, and the authentication information comprises a room ID, a user name and password, and a STUN/TURN address;

step 302, judging whether the room ID matches the user name and password; if not, entering step 303; if the match succeeds, entering step 304;

step 303, prompting that the user has no permission, and ending the process;

step 304, performing the ICE interaction; if the ICE interaction fails, entering step 305; otherwise, entering step 306;

step 305, the ICE interaction has failed, and the process ends;

step 306, starting to receive audio and video stream data from the participant, and entering step 309;

step 309, sending a mixing request to the Mixer server, forwarding the audio and video stream to the Mixer server, opening a stream receiving port for receiving the mixed stream from the Mixer server, and entering step 310, wherein the mixing request includes the room ID and the number of the stream receiving port on which the WebRTC server receives the mixed stream;

step 310, receiving the mixed stream from the Mixer server, wherein the mixed stream contains the picture and sound information of all members in the room;

step 311, after the mixed stream is obtained, notifying the participants to subscribe to the mixed stream.

Further, step 306 further comprises:

step 307, judging the number of participants currently in the room; if the number is less than 6, entering step 309; otherwise, entering step 308;

step 308, notifying the other participants in the room to subscribe to the video stream;

step 309, forwarding the audio and video stream to the Mixer server, opening a stream receiving port for receiving the mixed stream from the Mixer server, and entering step 310.
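The branch in step 307 amounts to a threshold check on the room size. A minimal sketch, with the constant and function names chosen here for illustration only:

```python
MIX_PARTICIPANT_LIMIT = 6  # threshold stated in step 307


def next_step(participant_count, limit=MIX_PARTICIPANT_LIMIT):
    """Step 307: go to step 309 (mix) below the limit, else to step 308."""
    return 309 if participant_count < limit else 308
```

With this reading, a room of up to 5 participants is always mixed, and a sixth participant triggers the direct-subscription path of step 308.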

Further, in step 2, the Mixer server mixing the multiple audio and video streams obtained from the WebRTC server into a single mixed stream and republishing the mixed stream to the WebRTC server specifically includes:

step 401, receiving a mixing request from the WebRTC server, wherein the mixing request includes a room ID and the number of the stream receiving port on which the WebRTC server receives the mixed stream;

step 402, confirming whether a mixing process named after the room ID exists; if so, entering step 403; otherwise, entering step 404;

step 403, adding the received audio and video stream to the existing mixing process, so that the audio and video streams of the same room ID are handled by the same mixing process, and entering step 405;

step 404, creating a new mixing process named after the room ID, so that the audio and video streams of the same room are handled by the same mixing process, adding the received stream to the new mixing process, and entering step 405;

step 405, allocating a mixing window to the newly added audio and video stream to determine where that stream is displayed, and entering step 406;

step 406, creating a mixer based on the GStreamer framework, executing the mixing algorithm, and mixing the multiple audio and video streams into a single mixed stream;

step 407, after the mixing algorithm has been executed, forwarding the mixed stream to the stream receiving port of the WebRTC server.
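The patent does not name the GStreamer elements used by the mixer in steps 405 to 407. As one possible illustration, the sketch below builds a `gst-launch-1.0` style pipeline description around GStreamer's `compositor` element, whose sink pads take `xpos` and `ypos` properties. Everything else is an assumption made for the example: `videotestsrc` stands in for the participant streams, VP8 over RTP to a `udpsink` stands in for delivery to the stream receiving port, the grid geometry is arbitrary, and audio mixing is left out.

```python
def build_mix_pipeline(num_streams, host, port, cell_w=320, cell_h=240, columns=3):
    """Build a gst-launch-1.0 style description tiling `num_streams` sources."""
    pad_props = []
    branches = []
    for i in range(num_streams):
        # Grid position of window i (step 405: allocate a window per stream).
        x = (i % columns) * cell_w
        y = (i // columns) * cell_h
        pad_props.append(f"sink_{i}::xpos={x} sink_{i}::ypos={y}")
        # videotestsrc is a placeholder for a stream from the WebRTC server.
        branches.append(
            f"videotestsrc is-live=true ! "
            f"video/x-raw,width={cell_w},height={cell_h} ! mix.sink_{i}"
        )
    # Step 406: composite all inputs; step 407: send the result to host:port.
    head = (
        f"compositor name=mix {' '.join(pad_props)} ! videoconvert ! "
        f"vp8enc ! rtpvp8pay ! udpsink host={host} port={port}"
    )
    return " ".join([head] + branches)
```

The function only produces a string; passing it to `gst-launch-1.0` or to a GStreamer parse call would be the step that actually runs the pipeline.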

Further, step 407 is followed by:

step 408, after receiving a window enlargement request from an end user, forwarding only the single stream of that window;

step 409, after receiving a window reduction request from the end user, re-executing step 406, until the whole process ends.

The invention has the following beneficial effects:

An embodiment of the invention provides a WebRTC-based MCU system and method. The system comprises a WebRTC server, a Mixer server and a STUN/TURN server. The WebRTC server handles room management and stream management and includes the underlying WebRTC protocol implementation. The Mixer server is implemented on the GStreamer framework; it mixes the multiple audio and video streams obtained from the WebRTC server into one stream, republishes the mixed stream to the WebRTC server, and manages the positions of the mixing windows, including but not limited to enlarging and reducing a single window. The STUN/TURN server is built on an open source TURN server and is used by ICE to achieve NAT traversal (intranet penetration). The embodiment provides a complete solution in terms of both program and architecture. If 5 users are connected to a room, each video capture terminal connects to the Streaming module, and the mixing process takes on all the complex logic of video encoding, transcoding, decoding and mixing. Each terminal needs only 1 connection, so the whole application consumes only 5 connections and the bandwidth used (uplink plus downlink) is 10m. The load on the browser side is much smaller, more people can take part in simultaneous audio and video communication, and the approach is better suited to multi-party video conferences. When a room has too many people and the mixing load on the Mixer server is high, the system can automatically switch to SFU mode, that is, turn the MCU method off. The Mixer server and the STUN/TURN server both support cluster deployment, so the system's concurrency can be scaled horizontally when the number of users grows too large.
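The 5-user figures above can be reproduced with simple counting. The sketch assumes one unit of bandwidth per stream (for example 1 Mbit/s, which is not stated in the text) and adds a full-mesh P2P count purely for comparison; the mesh figure does not appear in the source.

```python
def mcu_connections(n):
    """Each of n terminals keeps a single connection to the server."""
    return n


def mcu_bandwidth(n, per_stream=1.0):
    """Total uplink plus downlink: one published and one mixed stream each."""
    return 2 * n * per_stream


def mesh_connections(n):
    """Directed media connections in a full P2P mesh (comparison only)."""
    return n * (n - 1)
```

For n = 5 this gives 5 connections and 10 bandwidth units for the MCU case, against 20 directed connections in a full mesh.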

Drawings

Fig. 1 is a network topology diagram of a WebRTC-based MCU system according to an embodiment of the present invention;

Fig. 2 is a structural block diagram of a WebRTC-based MCU system according to an embodiment of the present invention;

Fig. 3 is a flowchart of the operation of the WebRTC server in a WebRTC-based MCU method according to an embodiment of the present invention;

Fig. 4 is a flowchart of the operation of the Mixer server in a WebRTC-based MCU method according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.

As shown in Figs. 1 and 2, as a first embodiment of the present invention, a WebRTC-based MCU system is provided, which includes a WebRTC server, a Mixer server and a STUN/TURN server. There may be multiple Mixer servers and multiple STUN/TURN servers: multiple Mixer servers form a mixing server cluster, and multiple STUN/TURN servers form an ICE server cluster. Fig. 2 illustrates a mixing server cluster containing two Mixer servers. The system communicates with end users through HTTP/JSON-RPC messages;

the WebRTC server handles room management and stream management and includes the underlying WebRTC protocol implementation; this comprises receiving a room join request from an end user, adding the end user to a room, receiving the audio and video streams published by the participants in the room, and forwarding those streams to a Mixer server, wherein rooms are created in advance by a manager, each room has a room ID and its own user name and password, and the room join request contains the ID of the room to be joined and the corresponding user name and password;

the Mixer server is implemented on the GStreamer framework and is used for mixing the multiple audio and video streams obtained from the WebRTC server into a single mixed stream and republishing the mixed stream to the WebRTC server; the mixed stream is named after the room ID. The Mixer server also manages the mixing windows: by default, mixing starts with 6 mixing windows open and participants are placed in them in order; when a user leaves or stops publishing audio and video, the window that user occupied is released and turns black. When a user double-clicks a window to enlarge it, the Mixer server suspends mixing and distributes only that window's video stream;

the STUN/TURN server is built on an open source TURN server and is used by ICE to achieve NAT traversal (intranet penetration).
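The window management described for the Mixer server (6 default windows, sequential placement, release on leave, single-window enlargement) can be sketched as a small state holder. The class and its method names are hypothetical, and reusing a released window for the next participant is an assumption the text does not confirm.

```python
class MixWindows:
    def __init__(self, size=6):
        self.slots = [None] * size  # None means an empty window, shown black
        self.zoomed = None          # stream currently enlarged, if any

    def assign(self, stream_id):
        """Place a stream in the first free window and return its index."""
        index = self.slots.index(None)  # raises ValueError if all are taken
        self.slots[index] = stream_id
        return index

    def release(self, stream_id):
        """Free the window of a participant who left or stopped publishing."""
        self.slots[self.slots.index(stream_id)] = None
        if self.zoomed == stream_id:
            self.zoomed = None

    def zoom(self, stream_id):
        """Enlarge one window: only its stream is distributed."""
        if stream_id not in self.slots:
            raise ValueError("unknown stream")
        self.zoomed = stream_id

    def unzoom(self):
        """Return to the mixed layout."""
        self.zoomed = None

    def output(self):
        """Streams currently distributed: the zoomed one, or all occupied windows."""
        if self.zoomed is not None:
            return [self.zoomed]
        return [s for s in self.slots if s is not None]
```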

Preferably, the WebRTC server is specifically configured to:

judge whether the room ID and the user name and password in the room join request match those of the room to be joined; if not, prompt that the user has no permission and end the process; if the match succeeds, perform the ICE interaction; if the ICE interaction fails, end the process; otherwise, start receiving audio and video streams from the participant, forward them to the Mixer server, and open a stream receiving port for receiving the mixed stream from the Mixer server.
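This embodiment states that the system talks to end users through HTTP/JSON-RPC messages but gives no message format. Purely as an illustration, a room join request carrying the authentication information could look like the following, assuming a JSON-RPC 2.0 envelope; the method name `joinRoom` and all parameter keys are hypothetical.

```python
import json


def build_join_request(request_id, room_id, username, password, stun_turn_address):
    """Serialize a hypothetical JSON-RPC 2.0 room join request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "joinRoom",  # hypothetical method name
        "params": {
            "room_id": room_id,
            "username": username,
            "password": password,
            "stun_turn_address": stun_turn_address,
        },
    })
```

A client would send the resulting string as the body of an HTTP request; the server would answer with a JSON-RPC response carrying the same `id`.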

Preferably, the Mixer server is specifically configured to:

Preferably, the WebRTC server includes a Room module, a Streaming module, and a WebRTC protocol layer;

the Room module is used for receiving a room creation request from a manager, creating a room for the manager, receiving room join requests from end users, authenticating each joining end user, and adding authenticated end users to the corresponding room; in addition, the Room module receives the audio and video streams published by each participant and notifies the other participants in the room to subscribe to the video streams;

the Streaming module is used for obtaining the audio and video stream when a participant publishes it in a room, forwarding it to an available or lightly loaded Mixer server, which performs the mixing, receiving the mixed stream from the Mixer server, and notifying each participant to subscribe; when a user leaves the room, the Room module notifies the other participants and the Streaming module, and the Streaming module in turn notifies the Mixer server to delete the corresponding mixing window;

the WebRTC protocol layer is used for implementing the underlying WebRTC protocol.

The embodiment of the invention provides a complete solution in terms of both program and architecture. If 5 users are connected to a room, each video capture terminal connects to the Streaming module, and the mixing process takes on all the complex logic of video encoding, transcoding, decoding and mixing. Each terminal needs only 1 connection, so the whole application consumes only 5 connections and the bandwidth used (uplink plus downlink) is 10m. The load on the browser side is much smaller, more people can take part in simultaneous audio and video communication, and the approach is better suited to multi-party video conferences. When a room has too many people and the mixing load on the Mixer server is high, the system can automatically switch to SFU mode, that is, turn the MCU method off. The Mixer server and the STUN/TURN server both support cluster deployment, so the system's concurrency can be scaled horizontally when the number of users grows too large.

As a second embodiment of the present invention, there is provided a WebRTC-based MCU method, including:

Preferably, as shown in fig. 3, in step 1, the WebRTC server receives a room joining request from an end user, joins the end user into a room, receives audio and video streams issued from each participant in the room, and forwards the audio and video streams issued by each participant to the Mixer server specifically includes:

step 303, prompting no authority, and ending the process;

step 305, ICE interaction fails, and the process ends;

step 306, starting to receive audio and video stream data from a participant;

step 309, forwarding the audio and video stream to the Mixer server, opening a stream receiving port for receiving the mixed stream from the Mixer server, and entering step 310;

step 311, after the mixed stream is obtained, notifying the participants to subscribe to the mixed stream. Note that the mixed stream is subscribed to only once in the whole flow, unlike step 308, in which participants are notified to subscribe multiple times.

Preferably, as shown in fig. 4, in step 2, the step of mixing the multiple audio and video streams acquired from the WebRTC server by the Mixer server, mixing the multiple audio and video streams into one audio and video stream to form a mixed stream, and reissuing the mixed stream to the WebRTC server specifically includes:

step 404, creating a new mixing process named after the room ID, to ensure that the audio and video streams of the same room are handled by the same mixing process, adding the received stream to the new mixing process, and entering step 405;

step 407, after the mixing algorithm has been executed, forwarding the mixed stream to the stream receiving port of the WebRTC server.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A WebRTC-based MCU system, comprising a WebRTC server, a Mixer server and a STUN/TURN server;

the Mixer server is used for mixing the multi-path audio and video stream acquired from the WebRTC server, mixing the multi-path audio and video stream into one path of audio and video stream to form a mixed stream, re-issuing the mixed stream into the WebRTC server, and managing the positions of mixed stream windows;

the STUN/TURN server is built based on a coturn open source server and is used for ICE to realize an intranet penetration function;

wherein the Mixer server is specifically configured to:

allocating a mixing window to the newly added audio and video stream to determine where that stream is displayed, creating a mixer based on the GStreamer framework, mixing the multiple audio and video streams into a single mixed stream, and forwarding the mixed stream to the stream receiving port of the WebRTC server;

wherein the WebRTC server is specifically configured to:

judging whether the room ID is matched with the user name and the password, if not, prompting no authority, ending the process, if successfully matched, carrying out ICE interaction operation, if the ICE interaction is failed, ending the process, otherwise, starting to receive the audio and video stream from the participant, forwarding the audio and video stream to a Mixer server, and starting a stream receiving port for receiving a mixed stream from the Mixer server;

receiving a mixed stream from a Mixer server, and informing a participant to subscribe the mixed stream, wherein the mixed stream comprises picture information and sound information of all members in a room;

the number of the Mixer servers is multiple, and the WebRTC server comprises a Room module, a Streaming module and a WebRTC protocol layer;

the WebRTC protocol layer is used for realizing a bottom WebRTC protocol;

the Room module is used for informing other participants when a participant leaves the Room and informing the Streaming module at the same time, and the Streaming module is also used for informing the Mixer server to delete the corresponding mixed flow window when the participant leaves the Room.

2. An MCU method based on WebRTC, the method comprising:

step 2, the Mixer server mixes the multiple audio and video streams obtained from the WebRTC server into a single mixed stream and republishes the mixed stream to the WebRTC server;

the WebRTC server receives a room joining request from a terminal user, joins the terminal user into a room, receives audio and video streams issued by each participant in the room, and forwards the audio and video streams issued by each participant to the Mixer server specifically includes:

step 301, receiving a room joining request from a terminal user, wherein the room joining request comprises authentication information, and the authentication information comprises a room ID, a username and password and an STUN/TURN address;

step 302, judging whether the room ID is matched with the user name and the password, and if not, entering step 303; if the matching is successful, go to step 304;

step 303, prompting no authority, and ending the process;

step 305, ICE interaction fails, and the flow ends;

step 306, starting to receive audio and video stream data from a participant;

step 310, receiving a mixed stream from a Mixer server, wherein the mixed stream comprises picture information and sound information of all members in a room;

step 311, after the mixed stream is obtained, notifying the participants to subscribe the mixed stream;

in step 2, the step of mixing the multiple audio and video streams acquired from the WebRTC server by the Mixer server, mixing the multiple audio and video streams into one audio and video stream to form a mixed stream, and re-issuing the mixed stream to the WebRTC server specifically includes:

step 403, adding the received audio and video stream into the existing mixed flow process, so that the audio and video stream in the same room ID performs the same mixed flow process, and entering step 405;

step 404, a new mixed flow process named by the room ID is created, the same mixed flow process is executed on the audio and video flows in the same room, and after the new mixed flow process is added, the step 405 is executed;

step 406, creating a mixer based on the GStreamer frame, and mixing the multi-path audio and video stream into a single-path mixed stream;

step 407, forwarding the mixed audio and video stream to a stream receiving port of the WebRTC server;

step 408, after receiving a window enlargement request from an end user, forwarding only the single stream of that window;