CN114615236A

CN114615236A - WebRTC-based multi-user audio and video communication method

Info

Publication number: CN114615236A
Application number: CN202210232841.XA
Authority: CN
Inventors: 束静; 谢忠敏
Original assignee: Lingan Technology Hangzhou Co ltd
Current assignee: Lingan Technology Hangzhou Co ltd
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2022-06-10

Abstract

Embodiments of the present disclosure provide a WebRTC-based multi-person audio-video communication method, apparatus, device, and computer-readable storage medium. The method comprises the steps of obtaining network information of each client; based on the network information, connecting each client through a signaling server; and after the connection is established, completing audio and video communication among a plurality of clients through WebRTC. In this way, server cost is reduced while supporting browsers.

Description

WebRTC-based multi-user audio and video communication method

Technical Field

Embodiments of the present disclosure relate generally to the field of video communication, and more particularly, to a WebRTC-based multi-person audio-video communication method, apparatus, device, and computer-readable storage medium.

Background

At present, multi-person real-time audio and video solutions on the market mainly use video push-streaming technology. In the mainstream push protocol, RTMP, video must be encoded with H.264, audio must be encoded with AAC or MP3, and the stream is mostly packaged in FLV format.

RTMP is currently the most widely used streaming media transmission protocol. It is well supported by CDNs, is easy to implement, and is the choice of most live broadcast platforms. However, RTMP is not supported by browsers and is no longer updated by Adobe, so a live broadcast service needs an additional push protocol in order to support browsers.

Disclosure of Invention

According to an embodiment of the disclosure, a WebRTC-based multi-user audio-video communication scheme is provided.

In a first aspect of the disclosure, a WebRTC-based multi-person audio and video communication method is provided. The method comprises the following steps:

acquiring network information of each client;

based on the network information, connecting each client through a signaling server;

and after the connection is established, completing audio and video communication among a plurality of clients through WebRTC.

Further, acquiring the network information of each client includes:

each client establishes an RTCPeerConnection object respectively;

and acquiring corresponding network information through a STUN server based on the RTCPeerConnection object.

Further, the method also includes:

the signaling server is constructed by a duplex communication technology.

Further, the completing audio-video communication among a plurality of clients through WebRTC includes:

the calling client acquires local audio and video coding and resolution information through the createOffer method of the RTCPeerConnection and sends the information to the called client through the signaling server;

after receiving the information sent by the calling client, the called client acquires local audio and video coding and resolution information through a createAnswer method of the RTCPeerConnection, and sends the local audio and video coding and resolution information to the calling client through a signaling server to complete audio and video communication among a plurality of clients.

Further, the network information includes the external network IP and port information.

Further, the method also includes:

setting the audio and video device information through the setRemoteDescription method of the RTCPeerConnection.

Further, the method also includes:

adding the local audio and video coding and resolution information to the RTCPeerConnection through the setLocalDescription method.

In a second aspect of the disclosure, a WebRTC-based multi-person audio-video communication device is provided. The device includes:

the acquisition module is used for acquiring the network information of each client;

the connection module is used for connecting each client through a signaling server based on the network information;

and the communication module is used for completing audio and video communication among a plurality of clients through WebRTC after the connection is established.

In a third aspect of the disclosure, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.

In a fourth aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, implements a method as in accordance with the first aspect of the present disclosure.

The WebRTC-based multi-user audio and video communication method provided by the embodiments of the application comprises: acquiring network information of each client; based on the network information, connecting each client through a signaling server; and, after the connection is established, completing audio and video communication among a plurality of clients through WebRTC. The method is suitable for video conferences, live-broadcast microphone connection, intercoms, and voice calls, supports browsers, and has low delay.

It should be understood that what is described in this summary section is not intended to define key or essential features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters denote like or similar elements, and wherein:

FIG. 1 illustrates a schematic diagram of an exemplary operating environment in which embodiments of the present disclosure can be implemented;

FIG. 2 shows a flow diagram of a WebRTC-based multi-person audio-video communication method according to an embodiment of the present disclosure;

FIG. 3 shows a block diagram of a WebRTC-based multi-person audio-video communication device according to an embodiment of the present disclosure;

FIG. 4 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without inventive step, are intended to be within the scope of the present disclosure.

In addition, the term "and/or" herein merely describes an association between associated objects and means that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

FIG. 1 illustrates a schematic diagram of an exemplary operating environment 100 in which embodiments of the present disclosure can be implemented. The operating environment 100 includes a client 101, a signaling server 102, a client 103, and a STUN (TURN) server 104.

The STUN server 104 is used to acquire the external network addresses and port information of the clients 101 and 103.

The signaling server 102 is used to relay information transmitted between the clients. (STUN servers are not stable in China, so to ensure the stability of the service, the signaling server 102 is required to act as a relay while communication is being established.)

In some embodiments, the signaling server 102 may be built on a duplex communication technology, such as WebSocket.
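As a rough illustration of this relay role, the following TypeScript sketch routes a message either to every other member of a room or to a single recipient. It is transport-agnostic: the `send` callback stands in for a WebSocket write. The `Relay` class, its method names, and the routing rule (a `type` of `"room"` fans out to the room named by `toId`, while `"single"` goes to one client) are assumptions made for illustration, not details stated in the disclosure.

```typescript
// Hypothetical message envelope, modelled on the JSON messages in this description.
interface Envelope {
  type: "room" | "single";
  content: { event: string; data?: unknown };
  toId?: string;
  fromId: string;
  classify: number;
}

// Callback standing in for a WebSocket write to one client (assumption).
type Send = (clientId: string, message: Envelope) => void;

class Relay {
  // roomId -> ids of the clients currently in that room
  private rooms = new Map<string, Set<string>>();
  private send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  join(roomId: string, clientId: string): void {
    const members = this.rooms.get(roomId) ?? new Set<string>();
    members.add(clientId);
    this.rooms.set(roomId, members);
  }

  // Forward a message and return how many clients received it.
  // "single" is delivered only to the client named by toId;
  // "room" fans out to everyone in the room except the sender.
  route(message: Envelope): number {
    const target = message.toId;
    if (target === undefined) return 0;
    if (message.type === "single") {
      this.send(target, message);
      return 1;
    }
    let delivered = 0;
    const members = this.rooms.get(target) ?? new Set<string>();
    members.forEach((member) => {
      if (member === message.fromId) return;
      this.send(member, message);
      delivered += 1;
    });
    return delivered;
  }
}
```

In a real deployment the `send` callback would write to the WebSocket associated with each client id; keeping it as a parameter lets the routing rule be exercised without a network.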

Combining the signaling server with WebRTC gives better extensibility: the signaling server acts as a bridge for WebRTC data transmission and, besides media data, can also carry data such as files and messages, which makes the system easy to extend.

It should be understood that the number of clients and servers in FIG. 1 is illustrative only. There may be any number of clients and servers, as desired for implementation. In particular, the system architecture described above may not include a network, but only a client or server, in the event that the target data does not need to be obtained remotely.

FIG. 2 shows a flowchart of a WebRTC-based multi-person audio-video communication method 200 according to an embodiment of the present disclosure. The method comprises the following steps:

S210: network information of each client is obtained.

Wherein the network information includes an external network address and port information.

In some embodiments, the external network address of each client is obtained through NAT traversal. In practice, most computers sit behind a NAT, and only a few hosts have an external network address of their own. The present disclosure therefore uses NAT traversal techniques (STUN, TURN) to obtain the external network address and port of the current host; that is, WebRTC is used to obtain them (by default, WebRTC obtains the external network address and port of the current host from a STUN server).
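In browser implementations, the STUN and TURN servers are typically supplied to the RTCPeerConnection constructor through an `iceServers` list. The sketch below builds such a configuration object. The `buildIceConfig` helper and the `example.org` server addresses and credentials are hypothetical placeholders; only the `iceServers`, `urls`, `username`, and `credential` field names come from the WebRTC API.

```typescript
// Subset of the configuration shape accepted by the RTCPeerConnection constructor.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
}

// Build a configuration with one STUN server and, optionally, one TURN relay.
function buildIceConfig(
  stunUrl: string,
  turn?: { url: string; username: string; credential: string }
): IceConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn !== undefined) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Hypothetical server addresses; replace them with a real deployment.
const config = buildIceConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-pass",
});
// In a browser, this object would be passed as: new RTCPeerConnection(config)
```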

Specifically, each client between which communication is to be established creates an RTCPeerConnection object, and the RTCPeerConnection object acquires its own external network IP and port from the STUN server.
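The address and port learned from the STUN server appear in ICE candidates of type `srflx` (server reflexive). As a minimal sketch, the external endpoint can be read out of the candidate strings that an RTCPeerConnection reports. The helper names `parseCandidate` and `externalEndpoint` are hypothetical, and the sample addresses in the usage note are documentation placeholders.

```typescript
interface ParsedCandidate {
  transport: string;
  address: string;
  port: number;
  candidateType: string; // "host", "srflx" (learned via STUN), "prflx", or "relay" (via TURN)
}

// Parse the "candidate:..." string carried in RTCIceCandidate.candidate.
// Layout: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): ParsedCandidate | null {
  const [head = "", , transport = "", , address = "", portText = "", typ = "", candidateType = ""] =
    line.trim().split(/\s+/);
  if (head.indexOf("candidate:") !== 0 || typ !== "typ" || candidateType === "") {
    return null;
  }
  const port = Number(portText);
  if (!Number.isInteger(port) || port < 0 || port > 65535) return null;
  return { transport: transport.toLowerCase(), address, port, candidateType };
}

// Return the first server-reflexive endpoint, i.e. the external address and port.
function externalEndpoint(lines: string[]): { address: string; port: number } | null {
  for (const line of lines) {
    const parsed = parseCandidate(line);
    if (parsed !== null && parsed.candidateType === "srflx") {
      return { address: parsed.address, port: parsed.port };
    }
  }
  return null;
}
```

For example, given a host candidate for `192.168.1.2` and the candidate `candidate:842163049 1 udp 1677729535 203.0.113.7 54321 typ srflx raddr 192.168.1.2 rport 54321`, `externalEndpoint` returns the address `203.0.113.7` and the port `54321`.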

S220, based on the network information, connecting each client through a signaling server; and after the connection is established, completing audio and video communication among a plurality of clients through WebRTC.

In some embodiments, the calling client (the client that initiates the communication) sends its own network information to the called client, relayed through the signaling server.

After receiving the network information of the calling client, the called client creates an RTCPeerConnection object, adds the received information to the object through addIceCandidate, and at the same time forwards its own external network address and port to the calling client through the signaling server. This completes the connection between the calling client and the called client; that is, each client is connected through the signaling server.

The calling client acquires local audio and video coding and resolution information through the createOffer method of the RTCPeerConnection, adds this information to the RTCPeerConnection through setLocalDescription, and transmits it to the called client, relayed through the signaling server.

After receiving the information sent by the calling client, the called client saves the information by using setRemoteDescription() of the RTCPeerConnection object.

Specifically, the calling client sends a create room message to the signaling server:

{
  "type":"room",
  "content":{
    "event":"_create_room",
    "data":
  },
  "fromId":"xxx",
  "classify":1
}

The called client sends a join-room message to the signaling server:

{
  "type":"room",
  "content":{
    "event":"_join_room"
  },
  "toId":
  "fromId":"xxx",
  "classify":1
}

The clients already in the room (the calling client and any clients that have joined the room) all receive the join-room message sent by the called client. At this time, each client in the room executes the following steps:

(1) Acquire the media stream

(2) Create a peerConnection

(3) Add the stream to the peerConnection

(4) Generate an Offer

(5) Set the local description

(6) Send the Offer to the new called client
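The fan-out implied by these steps can be sketched as follows: when a newcomer joins, every client already in the room prepares one Offer addressed to it. The function names are hypothetical, and the connection count assumes a full mesh in which every pair of clients shares one peer connection, which is an interpretation of the end-to-end transmission described here rather than something the disclosure states explicitly.

```typescript
interface OfferPlan {
  fromId: string; // existing room member that creates the Offer
  toId: string;   // newly joined (called) client that receives it
}

// When a newcomer joins, every existing member creates a connection to it
// and sends it an Offer.
function offersOnJoin(members: string[], newcomerId: string): OfferPlan[] {
  return members
    .filter((id) => id !== newcomerId)
    .map((id) => ({ fromId: id, toId: newcomerId }));
}

// In a full mesh of n clients, each pair shares one connection: n * (n - 1) / 2.
function meshConnectionCount(n: number): number {
  return (n * (n - 1)) / 2;
}
```

Under this assumption a room of 4 clients needs 6 peer connections in total, and each client maintains 3 of them, so the per-client upload cost grows with the room size.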

The offer message may be sent to the called client as follows:

{
  "type":"single",
  "content":{
    "event":"_offer",
    "data":{
      "sdp":sdp
    }
  },
  "toId":"cIid",
  "fromId":"xxx",
  "classify":1
}
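A typed builder can keep these envelopes consistent across message kinds. The sketch below is one possible encoding: the `SignalMessage` type and the `buildOffer` and `buildIceCandidate` helpers are hypothetical names, `sdp` and `candidate` are assumed to be plain strings, and `classify` is kept at 1 as in the messages shown here because its meaning is not stated.

```typescript
// Envelope shape shared by the signaling messages in this description.
interface SignalMessage<T> {
  type: "room" | "single";
  content: { event: string; data: T };
  toId: string;
  fromId: string;
  classify: number;
}

// Offer addressed to one client ("single").
function buildOffer(fromId: string, toId: string, sdp: string): SignalMessage<{ sdp: string }> {
  return {
    type: "single",
    content: { event: "_offer", data: { sdp } },
    toId,
    fromId,
    classify: 1,
  };
}

// ICE candidate addressed to a room ("room"), where toId carries the room ID.
function buildIceCandidate(
  fromId: string,
  roomId: string,
  candidate: string
): SignalMessage<{ candidate: string }> {
  return {
    type: "room",
    content: { event: "_ice_candidate", data: { candidate } },
    toId: roomId,
    fromId,
    classify: 1,
  };
}
```

Serializing the result of `buildOffer("xxx", "cIid", sdp)` with `JSON.stringify` yields an object with the same fields, in the same order, as the Offer message above.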

After receiving the offer message, the called client executes the following steps:

(1) Acquire the media stream

(2) Create a peerConnection

(3) Set the remote description

(4) Generate an Answer

(5) Send the Answer to client A in the room

(6) Set the local description
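In the WebRTC API, the called client applies the received Offer as its remote description and its own Answer as its local description, while the calling client does the reverse. The sketch below checks such call sequences against a simplified model of the RTCPeerConnection signaling states. It covers only the `stable`, `have-local-offer`, and `have-remote-offer` states and omits provisional answers and rollback; the `Step` labels and the `run` function are hypothetical names introduced for illustration.

```typescript
// Subset of the RTCPeerConnection signalingState values.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

// Hypothetical labels for the description-setting calls on each side.
type Step =
  | "setLocal(offer)"
  | "setRemote(offer)"
  | "setLocal(answer)"
  | "setRemote(answer)";

// Allowed transitions in this simplified model.
const transitions: Record<SignalingState, Partial<Record<Step, SignalingState>>> = {
  "stable": {
    "setLocal(offer)": "have-local-offer",
    "setRemote(offer)": "have-remote-offer",
  },
  "have-local-offer": { "setRemote(answer)": "stable" },
  "have-remote-offer": { "setLocal(answer)": "stable" },
};

// Replay a sequence of calls and return the final state; throw on an invalid order.
function run(steps: Step[]): SignalingState {
  let state: SignalingState = "stable";
  for (const step of steps) {
    const next: SignalingState | undefined = transitions[state][step];
    if (next === undefined) {
      throw new Error(`"${step}" is not valid in state "${state}"`);
    }
    state = next;
  }
  return state;
}

// Calling side: local Offer first, then the remote Answer.
const callerFinal = run(["setLocal(offer)", "setRemote(answer)"]);
// Called side: remote Offer first, then the local Answer.
const calleeFinal = run(["setRemote(offer)", "setLocal(answer)"]);
```

Both sequences end in the `stable` state, whereas applying an Answer before any Offer has been set is rejected by the model.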

The Answer may be sent to the client in the room as follows:

{
  "type":"single",
  "content":{
    "event":"_answer",
    "data":{
      "sdp":sdp
    }
  },
  "toId":"cIid",
  "fromId":"xxx",
  "classify":1
}

Further, the calling client sends its local Candidate to the called client as follows:

{
  "type":"room",
  "content":{
    "event":"_ice_candidate",
    "data":{
      "candidate":candidate
    }
  },
  "toId":"room ID",
  "fromId":"xxx",
  "classify":1
}

The called client sends its local Candidate to the calling client as follows:

{
  "type":"room",
  "content":{
    "event":"_ice_candidate",
    "data":{
      "candidate":candidate
    }
  },
  "toId":"room ID",
  "fromId":"xxx",
  "classify":1
}

At this point, the calling client and the called client each add the Candidate sent by the remote end, and the connection between the calling client and the called client is completed. That is, the calling client acquires local audio and video coding and resolution information through the createOffer method of the RTCPeerConnection and sends it to the called client through the signaling server.

In some embodiments, the device information of the audio and video is set through the setRemoteDescription method of the RTCPeerConnection.

In some embodiments, the local audio-video coding, resolution information is added to the RTCPeerConnection by the setLocalDescription method.

According to the embodiment of the disclosure, the following technical effects are achieved:

The invention uses WebRTC as its transmission architecture, which gives relatively good performance and latency. Because WebRTC transmits media data end to end, it depends relatively little on server performance and bandwidth. The system is suitable not only for video conferences, live-broadcast microphone connection, and intercoms, but also for voice calls, and it supports browsers.

It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules are not necessarily required for the disclosure.

The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.

FIG. 3 shows a block diagram of a WebRTC-based multi-person audio-video communication device 300 according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus 300 includes:

an obtaining module 310, configured to obtain network information of each client;

a connection module 320, configured to connect each client through a signaling server based on the network information;

and the communication module 330 is configured to complete audio and video communication between multiple clients through WebRTC after the connection is established.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.

FIG. 4 shows a schematic block diagram of an electronic device 400 that may be used to implement embodiments of the present disclosure. As shown, device 400 includes a Central Processing Unit (CPU) 401 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 402 or loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The CPU 401 performs the various methods and processes described above, such as method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by CPU 401, one or more steps of method 200 described above may be performed. Alternatively, in other embodiments, the CPU 401 may be configured to perform the method 200 in any other suitable manner (e.g., by way of firmware).

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the application referred to in the present application is not limited to the embodiments with a particular combination of the above-mentioned features, but also encompasses other embodiments with any combination of the above-mentioned features or their equivalents without departing from the spirit of the application. For example, the above features may be replaced with (but not limited to) features having similar functions as those described in this application.

Claims

1. A multi-person audio and video communication method based on WebRTC is characterized by comprising the following steps:

acquiring network information of each client;

2. The method of claim 1, wherein acquiring the network information of each client comprises:

each client establishes an RTCPeerConnection object respectively;

3. The method of claim 2, further comprising:

the signaling server is constructed by a duplex communication technology.

4. The method of claim 3, wherein the performing audio-video communication between the plurality of clients through WebRTC comprises:

a calling client acquires local audio and video coding and resolution information through a createOffer method of RTCPeerConnection and sends the information to a called client through a signaling server;

5. The method of claim 4, wherein the network information comprises external network IP and port information.

6. The method of claim 5, further comprising:

7. The method of claim 6, further comprising:

and adding local audio and video coding and resolution information into the RTCPeerConnection through a setLocalDescription method.

8. A WebRTC-based multi-user audio-video communication device, comprising:

9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the program, implements the method of any of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.