CN117560358A

CN117560358A - Method, device and system for universal live broadcast voice intercom

Info

Publication number: CN117560358A
Application number: CN202311635105.XA
Authority: CN
Inventors: 易启鹏; 黄代羲; 王俭; 徐敬晓; 曾泽君; 吉东; 张斌; 易露
Original assignee: Beijing Liujinnian Technology Co ltd
Current assignee: Beijing Liujinnian Technology Co ltd
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-02-13

Abstract

The application discloses a method, a device and a system for universal live voice intercom, relating to the technical field of signal processing. The method comprises: receiving a voice call demand sent by a user; creating a voice call room; establishing information flow links between each terminal in the system and the voice call room; acquiring voice call information pushed by a terminal; processing the voice call information to obtain voice mixed stream information; and pushing the voice mixed stream information to each terminal. The method achieves low-delay live voice intercom with better real-time performance, adapts effectively to bandwidth changes, supports point-to-point direct communication, and makes firewall traversal easier.

Description

Method, device and system for universal live broadcast voice intercom

Technical Field

The present disclosure relates to the field of signal processing technologies, and in particular, to a method, an apparatus, and a system for universal live voice intercom.

Background

In traditional voice intercom scenarios, data is generally transmitted over TCP. Although TCP is one of the more reliable transport protocols, its acknowledgement and retransmission mechanisms introduce delay during transmission; it also places high demands on bandwidth and is restricted by firewalls.

Therefore, how to provide a method with higher transmission efficiency and better real-time performance is a problem to be solved.

Disclosure of Invention

To solve the above problems, the present application provides a method, a device and a system for universal live voice intercom.

In a first aspect, the present application provides a method for universal live voice intercom, applied to a universal live voice intercom system, the method including:

receiving a voice call demand sent by a user;

creating a voice call room;

establishing information flow links between each terminal in the system and the voice call room;

acquiring voice call information pushed by a terminal;

processing the voice call information to obtain voice mixed stream information;

and pushing the voice mixed stream information to each terminal.

Optionally, the step of establishing the information flow links between each terminal in the system and the voice call room specifically includes:

applying for two RTMP stream addresses for the created voice call room, one of which is used by the MRS to transmit the voice call information pushed by the terminal and the other of which is used to transmit the voice mixed stream information;

setting the two RTMP stream addresses on the terminal.

Optionally, the step of processing the voice call information to obtain voice mixed stream information specifically includes:

for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;

and mixing the streams of all terminals other than a target terminal to obtain the voice mixed stream information for that target terminal.

Optionally, the step of mixing the streams of all terminals other than the target terminal to obtain the voice mixed stream information for the target terminal specifically includes:

converting all voice call information into the OWT format;

mixing the data in the OWT format;

and converting the mixed OWT-format data into voice mixed stream information.

In a second aspect, the present application provides a universal live voice intercom device, applied to a universal live voice intercom system, the device includes:

the demand receiving unit is used for receiving the voice call demand sent by the user;

a room creation unit for creating a voice call room;

the link establishment unit is used for establishing information flow links between each terminal in the system and the voice call room;

the information acquisition unit is used for acquiring voice call information pushed by the terminal;

the mixed stream processing unit is used for processing the voice call information to obtain voice mixed stream information;

and the information pushing unit is used for pushing the voice mixed stream information to each terminal.

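Optionally, the link establishment unit is specifically configured to:

apply for two RTMP stream addresses for the created voice call room, one of which is used by the MRS to transmit the voice call information pushed by the terminal and the other of which is used to transmit the voice mixed stream information;

and set the two RTMP stream addresses on the terminal.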

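Optionally, the mixed stream processing unit is specifically configured to:

acquire, for the terminals that have entered the voice call room, the voice call information pushed by each terminal;

and mix the streams of all terminals other than a target terminal to obtain the voice mixed stream information for that target terminal.

Optionally, the mixed stream processing unit is further specifically configured to:

convert all voice call information into the OWT format;

mix the data in the OWT format;

and convert the mixed OWT-format data into voice mixed stream information.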

In a third aspect, the present application provides a universal live voice intercom system, including:

a PC browser, through which the user accesses the MRC page to issue a voice call demand;

an MRC, used for controlling the OWT to create the voice call room and for establishing information flow links between each terminal in the system and the voice call room;

an MRS, used for acquiring the voice call information pushed by the terminal and forwarding it to the OWT, and for pushing the voice mixed stream information generated by the OWT to each terminal;

an OWT, used for creating the voice call room and for processing the voice call information to obtain voice mixed stream information;

and the terminal, used for pushing voice call information to the MRS and receiving the voice mixed stream information pushed by the MRS.

In summary, the present application includes the following beneficial technical effects:

The universal live voice intercom system achieves low-delay live voice intercom with better real-time performance, adapts to bandwidth changes more efficiently, supports point-to-point direct communication, and makes firewall traversal easier.

Drawings

Fig. 1 is a schematic diagram of a universal live voice intercom system architecture according to an embodiment of the present application.

Fig. 2 is a flowchart of a universal live voice intercom method according to an embodiment of the present application.

Fig. 3 is a functional block diagram of a universal live voice intercom device according to an embodiment of the present application.

Reference numerals:

a demand receiving unit 110; a room creation unit 120; a link establishment unit 130; an information acquisition unit 140; a mixed stream processing unit 150; an information pushing unit 160.

Detailed Description

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments.

In the description of embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "such as" or "for example" is intended to present related concepts in a concrete fashion.

In the description of the embodiments of the present application, the term "plurality" means two or more unless otherwise indicated. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined by "a first" or "a second" may explicitly or implicitly include one or more such features. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

As shown in Fig. 1, the universal live voice intercom system according to an embodiment of the present invention consists of five parts: a PC browser, an MRC (Media Reactor Center), an MRS (Media Reactor Switch, a media reactor exchange service), an OWT (OpenWeb Text), and terminals. Through the MRC, the control side can hold voice calls with any one or more terminals in the live broadcast service, and voice calls between terminals can also be realized.

The information exchanged between the five parts is of two types: data streams, which carry device information, status information, live room information and the like, and media streams, which carry audio and video media information. Data streams are centered on the MRC, which interacts with each of the other parts. Media streams are generated by a terminal or the PC browser, processed by the MRS and the OWT, and then pushed to the terminals or the PC browser.
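The two kinds of traffic can be pictured as a small routing table. The following Python sketch is illustrative only: the message-type names and the route() helper are hypothetical and merely restate the paths described above (data streams via the MRC; media streams via the MRS and the OWT).

```python
from enum import Enum


class StreamKind(Enum):
    DATA = "data"    # device, status and live-room information
    MEDIA = "media"  # audio/video media information


# Hypothetical message types mapped to the stream kind that carries them.
MESSAGE_KINDS = {
    "device_info": StreamKind.DATA,
    "status_info": StreamKind.DATA,
    "live_room_info": StreamKind.DATA,
    "audio": StreamKind.MEDIA,
    "video": StreamKind.MEDIA,
}


def route(message_type: str, source: str) -> list[str]:
    """Return the hops a message takes from its source (illustrative only)."""
    kind = MESSAGE_KINDS[message_type]
    if kind is StreamKind.DATA:
        # Data streams are centred on the MRC, which talks to every other part.
        return [source, "MRC"]
    # Media streams from a terminal or PC browser are received by the MRS,
    # processed by the OWT, and pushed back out by the MRS.
    return [source, "MRS", "OWT", "MRS", "terminals/PC browsers"]
```

For example, route("audio", "terminal") yields the media path through the MRS and the OWT, while route("status_info", "terminal") stops at the MRC.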

The PC browser side is used by back-office operators, who access the MRC page in the cloud through the PC browser. The MRC, MRS and OWT are deployed on cloud servers, and the terminals are carried and used by field personnel.

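Specifically, the PC browser is used for enabling the user to access the MRC page to issue a voice call demand;

the MRC is used for controlling the OWT to create the voice call room and for establishing information flow links between each terminal in the system and the voice call room;

the MRS is used for acquiring the voice call information pushed by the terminal and forwarding it to the OWT, and for pushing the voice mixed stream information generated by the OWT to each terminal;

the OWT is used for creating the voice call room and for processing the voice call information to obtain voice mixed stream information;

and the terminal is used for pushing voice call information to the MRS and receiving the voice mixed stream information pushed by the MRS.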

The universal live voice intercom method is described in detail below in combination with the architecture of the universal live voice intercom system.

As shown in fig. 2, the live voice intercom method according to an embodiment of the present invention includes:

Step S101, receiving a voice call request sent by a user.

The voice call request is typically initiated by the user accessing the MRC page through the PC browser, after which the terminal displays the status of the call in progress. This status indicates that the terminal has successfully connected to the MRC server and is waiting to receive and transmit voice call data. The prompt on the terminal lets its user know the current call state during the call and thus manage the voice call better.
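The call status shown on a terminal can be sketched as a simple state machine. The states, transitions and display strings below are assumptions made for illustration; the embodiment only states that the terminal shows the in-call status once it is connected to the MRC.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()        # no call requested
    CONNECTING = auto()  # request issued from the MRC page, terminal linking up
    IN_CALL = auto()     # connected to the MRC, sending/receiving voice data
    ENDED = auto()       # call finished


# Allowed transitions (hypothetical).
TRANSITIONS = {
    CallState.IDLE: {CallState.CONNECTING},
    CallState.CONNECTING: {CallState.IN_CALL, CallState.ENDED},
    CallState.IN_CALL: {CallState.ENDED},
    CallState.ENDED: set(),
}


class TerminalCallStatus:
    def __init__(self) -> None:
        self.state = CallState.IDLE

    def advance(self, new_state: CallState) -> None:
        # Reject transitions the state machine does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def display_text(self) -> str:
        # Text a terminal might show so its user knows the current call state.
        return {
            CallState.IDLE: "Idle",
            CallState.CONNECTING: "Connecting...",
            CallState.IN_CALL: "In call",
            CallState.ENDED: "Call ended",
        }[self.state]
```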

Step S102, creating a voice call room.

After receiving the voice call request, the MRC creates a voice call room (chat room) according to the content of the request, and the terminals participating in the call join the voice call room.
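A minimal sketch of room creation and joining on the MRC side, assuming an in-memory registry; the class names, the room-ID format and the member representation are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class VoiceCallRoom:
    room_id: str
    requested_by: str                               # MRC user who issued the request
    members: set[str] = field(default_factory=set)  # terminal IDs in the room


class RoomRegistry:
    """Hypothetical in-memory registry of voice call rooms kept by the MRC."""

    def __init__(self) -> None:
        self._rooms: dict[str, VoiceCallRoom] = {}
        self._ids = itertools.count(1)

    def create_room(self, requested_by: str, terminals: list[str]) -> VoiceCallRoom:
        # Create the room from the request, then let the participating terminals join.
        room = VoiceCallRoom(f"room-{next(self._ids)}", requested_by)
        self._rooms[room.room_id] = room
        for terminal_id in terminals:
            self.join(room.room_id, terminal_id)
        return room

    def join(self, room_id: str, terminal_id: str) -> None:
        self._rooms[room_id].members.add(terminal_id)

    def get(self, room_id: str) -> VoiceCallRoom:
        return self._rooms[room_id]
```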

Step S103, establishing information flow links between each terminal and the voice call room in the system.

For a terminal to join the call, the MRC needs to establish an information flow link between the terminal and the voice call room. Specifically, the MRC applies for two RTMP stream addresses for the created voice call room: one is used by the MRS to transmit the voice call information pushed by the terminal, and the other is used to transmit the voice mixed stream information. The two RTMP stream addresses are then set on the terminal device.
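A minimal sketch of how the MRC might build the two RTMP addresses and hand them to a terminal. The host name media.example.com, the application name live, and the _up/_mix stream-name suffixes are hypothetical. Because the mixing step produces a different mix for each terminal, this sketch also assumes that each terminal gets its own pair of stream names under the room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLinks:
    push_url: str  # terminal -> MRS: the terminal's own voice call information
    pull_url: str  # MRS -> terminal: the voice mixed stream for this terminal


def allocate_stream_links(room_id: str, terminal_id: str,
                          host: str = "media.example.com",
                          app: str = "live") -> StreamLinks:
    """Build the two RTMP addresses for one terminal in one room (hypothetical scheme)."""
    base = f"rtmp://{host}/{app}/{room_id}_{terminal_id}"
    return StreamLinks(push_url=f"{base}_up", pull_url=f"{base}_mix")


def link_room(room_id: str, terminal_ids: list[str]) -> dict[str, StreamLinks]:
    # The MRC would then send each pair to its terminal over the data stream.
    return {t: allocate_stream_links(room_id, t) for t in terminal_ids}
```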

Here, RTMP (Real-Time Messaging Protocol) is a protocol for transmitting audio, video and data streams; under RTMP, data is carried over TCP.
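Because RTMP runs over TCP, a terminal given an RTMP address ultimately opens a TCP connection to the host and port named in that URL, with port 1935 as the conventional default. The helper below is an illustrative sketch that extracts those pieces using only the Python standard library; it does not implement the RTMP handshake itself.

```python
from urllib.parse import urlsplit

RTMP_DEFAULT_PORT = 1935  # well-known TCP port for RTMP


def rtmp_endpoint(url: str) -> tuple[str, int, str]:
    """Split an rtmp:// URL into (host, TCP port, app/stream path)."""
    parts = urlsplit(url)
    if parts.scheme != "rtmp":
        raise ValueError(f"not an RTMP URL: {url}")
    # Fall back to the default RTMP port when the URL does not name one.
    port = parts.port or RTMP_DEFAULT_PORT
    return parts.hostname, port, parts.path.lstrip("/")
```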

It should be noted that, in other embodiments of the present invention, access via various other protocols is also supported, including WebRTC, RTSP, HLS and MPEG-DASH.

After the information flow link is established, the voice call can be started.

Step S104, obtaining voice call information pushed by the terminal.

After the terminal collects the voice call information, it transmits the media stream to the cloud server through the corresponding stream address, where the media stream is then processed by the OWT.

Step S105, processing the voice call information to obtain voice mixed stream information.

To process the voice call information, the voice call information pushed by each terminal that has entered the voice call room is first acquired.

Note in particular that the mixed content differs for each terminal in the room: the voice mixed stream information pushed to each target terminal is obtained by mixing the voice call information of all terminals other than that target terminal.

For example, if there are four terminals A, B, C and D in the room, the voice mixed stream information pushed to terminal A is obtained by mixing the voice call information of terminals B, C and D.
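The per-target mixing rule (each terminal hears everyone except itself, often called a mix-minus) can be sketched on raw 16-bit PCM frames. This illustrates the mixing rule only: it operates on decoded audio samples rather than on the OWT format used in this embodiment, and the frame representation (one equal-length list of integer samples per terminal) is an assumption.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip16(value: int) -> int:
    """Saturate a mixed sample to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For every terminal, mix the frames of all *other* terminals.

    frames maps terminal ID -> one frame of 16-bit PCM samples; all frames
    are assumed to share the same length and sample rate.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum every terminal once, then subtract each target's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    return {
        target: [clip16(total[i] - own[i]) for i in range(length)]
        for target, own in frames.items()
    }
```

With terminals A, B, C and D as in the example, the output for A equals the clipped sample-wise sum of B, C and D. Computing the full sum once and subtracting each target's own samples avoids re-mixing N-1 streams separately for each of N terminals.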

The mixing is performed by the OWT: all voice call information is first converted into the OWT format; the OWT-format data is then mixed; finally, the mixed OWT-format data is converted into voice mixed stream information.

Specifically, the OWT processes the voice call information as follows: the voice information is first converted into text using speech recognition technology, the resulting text is represented in the OWT format, the text information is then mixed, and finally the mixed OWT-format data is converted into voice mixed stream information. The OWT format is an open standard for describing and sharing text; it is based on Web technologies such as HTML, CSS and JavaScript and uses the JSON-LD format for data exchange.
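A minimal sketch of the text-side mixing step, assuming speech recognition has already produced timestamped utterances for each terminal. The record fields (speaker, startMs, text), the Utterance type and the urn:example:owt# vocabulary are hypothetical; the embodiment states only that the OWT format uses JSON-LD for data exchange. Speech recognition and the final conversion back into audio are outside this sketch.

```python
import json

# Placeholder JSON-LD context; the real OWT vocabulary is not specified here.
CONTEXT = {"@vocab": "urn:example:owt#"}


def to_record(terminal_id: str, start_ms: int, text: str) -> dict:
    """Wrap one recognized utterance as a JSON-LD-style node (hypothetical fields)."""
    return {"@type": "Utterance", "speaker": terminal_id,
            "startMs": start_ms, "text": text}


def mix_records_for(target: str, records: list[dict]) -> str:
    """Merge every other terminal's utterances in time order and serialize them."""
    others = [r for r in records if r["speaker"] != target]
    others.sort(key=lambda r: r["startMs"])
    return json.dumps({"@context": CONTEXT, "@graph": others}, ensure_ascii=False)
```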

The advantage of processing voice call information in the OWT format is that, once the voice has been converted into text and represented in the OWT format, voice information can be shared by sharing the OWT-format information, without transmitting audio files directly. Furthermore, OWT is based on the WebRTC protocol and uses real-time communication technology, which gives lower latency and better real-time performance. Because streaming data transmission is used, the method also adapts to bandwidth changes more efficiently, supports point-to-point direct communication, and makes firewall traversal easier.

In an embodiment of the present application, the supported audio formats of the voice call information include AAC, AC3, OPUS, G722, iSAC, iLBC, PCMU, PCMA, Nellymoser, etc.
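Compressed or companded formats in this list must be decoded to linear PCM before samples can be mixed. As one concrete case, PCMU (G.711 μ-law) expansion can be sketched as follows. It follows the widely used reference expansion with a bias of 0x84 and is shown for illustration; it is not presented as the system's actual decoder.

```python
BIAS = 0x84  # 132, the bias used in G.711 mu-law companding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law (PCMU) code to a 16-bit linear sample."""
    code = ~code & 0xFF             # mu-law codes are stored bit-inverted
    exponent = (code >> 4) & 0x07   # segment number
    mantissa = code & 0x0F          # quantization step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def decode_pcmu(payload: bytes) -> list[int]:
    """Decode a PCMU payload (one byte per sample) into linear PCM samples."""
    return [ulaw_to_linear(b) for b in payload]
```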

Step S106, pushing the voice mixed stream information to each terminal.

The voice mixed stream information is pushed to each terminal through the other stream address applied for earlier, and each terminal pulls the voice mixed stream information through the stream address set on it.

In the above process, besides voice calls between terminals, the PC browser side can also acquire the voice call information and the voice mixed stream information, in the same way as the terminals, namely through two stream addresses. In this way, the user of the PC browser can also take part in the online call in the voice call room.

The above describes the principle of the live voice call. Building on this scheme, a method for managing users and terminals in an established voice call room (chat room) is described below. Here, "user" refers only to a user who accesses the MRC through the PC browser.

Typically, the user who applies to establish the chat room acts as its administrator. The administrator has the right to invite other users: different users in the same organization can be invited into the same chat room, and an invited user enters only after agreeing to the invitation. When a terminal is invited into the chat room, no confirmation from the terminal is required. For a terminal that is already in a chat room, if another user invites it into a different chat room, the administrator of the current chat room decides whether to agree; if the administrator agrees, the current chat room immediately disconnects the terminal.
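The invitation rules above can be sketched as follows. This is a hypothetical model: the class and method names are not from the embodiment, and the current administrator's decision is passed in as a callback so that the sketch stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChatRoom:
    room_id: str
    admin: str          # user who applied to create the room
    organization: str
    users: set[str] = field(default_factory=set)
    terminals: set[str] = field(default_factory=set)


class InvitationPolicy:
    """Hypothetical sketch of the chat-room invitation rules."""

    def __init__(self) -> None:
        self.terminal_room: dict[str, ChatRoom] = {}  # terminal -> room it is in

    def invite_user(self, room: ChatRoom, inviter: str, user: str,
                    user_org: str, user_accepts: bool) -> bool:
        # Only the admin invites, only within the same organization,
        # and the invited user must agree before entering.
        if inviter != room.admin or user_org != room.organization or not user_accepts:
            return False
        room.users.add(user)
        return True

    def invite_terminal(self, room: ChatRoom, terminal: str,
                        current_admin_approves: Callable[[ChatRoom], bool]) -> bool:
        current = self.terminal_room.get(terminal)
        if current is not None and current is not room:
            # The terminal is already in another room: that room's admin decides.
            if not current_admin_approves(current):
                return False
            current.terminals.discard(terminal)  # disconnect it immediately
        # The terminal itself does not need to confirm.
        room.terminals.add(terminal)
        self.terminal_room[terminal] = room
        return True
```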

Meanwhile, based on the information about the terminals in the chat room acquired by the MRC, the user can view the chat room member list and perform functions such as muting (disabling the microphone) and removing each member.

When the conversation in a chat room is finished, the established chat room is destroyed.
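A minimal sketch of the member-list, mute, remove and destroy operations. It assumes, hypothetically, that muting a member simply excludes that member's pushed audio from the mixes; the data model and method names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str
    kind: str            # "user" or "terminal"
    muted: bool = False  # microphone disabled: audio not fed into the mixes (assumed)


@dataclass
class ManagedRoom:
    room_id: str
    members: dict[str, Member] = field(default_factory=dict)
    destroyed: bool = False

    def member_list(self) -> list[tuple[str, str, bool]]:
        return [(m.member_id, m.kind, m.muted) for m in self.members.values()]

    def mute(self, member_id: str, muted: bool = True) -> None:
        self.members[member_id].muted = muted

    def remove(self, member_id: str) -> None:
        self.members.pop(member_id, None)

    def mixing_sources(self) -> list[str]:
        # Only unmuted members contribute audio to the mixes.
        return [m.member_id for m in self.members.values() if not m.muted]

    def destroy(self) -> None:
        # Called once the conversation is finished; releases every member.
        self.members.clear()
        self.destroyed = True
```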

As shown in fig. 3, the apparatus for universal live voice intercom provided by the embodiment of the present invention includes:

a demand receiving unit 110, configured to receive a voice call demand issued by a user;

a room creation unit 120 for creating a voice call room;

a link establishment unit 130, configured to establish information flow links between each terminal in the system and the voice call room;

an information obtaining unit 140, configured to obtain voice call information pushed by the terminal;

a mixed stream processing unit 150, configured to process the voice call information to obtain voice mixed stream information;

and an information pushing unit 160 for pushing the voice mixed stream information to each terminal.

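As a preferred embodiment of the present invention, the link establishment unit 130 is specifically configured to:

apply for two RTMP stream addresses for the created voice call room, one of which is used by the MRS to transmit the voice call information pushed by the terminal and the other of which is used to transmit the voice mixed stream information;

and set the two RTMP stream addresses on the terminal.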

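As a preferred embodiment of the present invention, the mixed stream processing unit 150 is specifically configured to:

acquire, for the terminals that have entered the voice call room, the voice call information pushed by each terminal;

and mix the streams of all terminals other than a target terminal to obtain the voice mixed stream information for that target terminal.

As a preferred embodiment of the present invention, the mixed stream processing unit 150 is further specifically configured to:

convert all voice call information into the OWT format;

mix the data in the OWT format;

and convert the mixed OWT-format data into voice mixed stream information.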

The universal live voice intercom device provided by the embodiment of the present invention is used to implement the universal live voice intercom method described above; its specific implementation is therefore the same as that of the method and is not repeated here.

In summary, the method, the device and the system for universal live voice intercom provided by the embodiments of the present invention achieve live voice intercom with lower latency and better real-time performance, adapt to bandwidth changes more efficiently, support point-to-point direct communication, and make firewall traversal easier.

In several embodiments disclosed in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.

The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure.

This application is intended to cover any variations, uses, or adaptations of the disclosure that follow, in general, the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.

Claims

1. A universal live voice intercom method, characterized in that it is applied to a universal live voice intercom system and comprises the following steps:

receiving a voice call demand sent by a user;

creating a voice call room;

establishing information flow links between each terminal in the system and the voice call room;

acquiring voice call information pushed by a terminal;

processing the voice call information to obtain voice mixed stream information;

and pushing the voice mixed stream information to each terminal.

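2. The universal live voice intercom method according to claim 1, wherein the step of establishing the information flow links between each terminal in the system and the voice call room specifically comprises:

applying for two RTMP stream addresses for the created voice call room, one of which is used by the MRS to transmit the voice call information pushed by the terminal and the other of which is used to transmit the voice mixed stream information;

setting the two RTMP stream addresses on the terminal.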

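3. The universal live voice intercom method according to claim 2, wherein the step of processing the voice call information to obtain voice mixed stream information specifically comprises:

for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;

and mixing the streams of all terminals other than a target terminal to obtain the voice mixed stream information for that target terminal.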

4. The universal live voice intercom method according to claim 3, wherein the step of mixing the streams of all terminals other than the target terminal to obtain the voice mixed stream information for the target terminal specifically comprises:

converting all voice call information into the OWT format;

mixing the data in the OWT format;

and converting the mixed OWT-format data into voice mixed stream information.

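5. A universal live voice intercom device, characterized in that it is applied to a universal live voice intercom system, the device comprising:

a demand receiving unit for receiving a voice call demand sent by a user;

a room creation unit for creating a voice call room;

a link establishment unit for establishing information flow links between each terminal in the system and the voice call room;

an information acquisition unit for acquiring voice call information pushed by a terminal;

a mixed stream processing unit for processing the voice call information to obtain voice mixed stream information;

and an information pushing unit for pushing the voice mixed stream information to each terminal.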

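6. The universal live voice intercom device according to claim 5, wherein the link establishment unit is specifically configured to:

apply for two RTMP stream addresses for the created voice call room, one of which is used by the MRS to transmit the voice call information pushed by the terminal and the other of which is used to transmit the voice mixed stream information;

and set the two RTMP stream addresses on the terminal.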

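7. The universal live voice intercom device according to claim 6, wherein the mixed stream processing unit is specifically configured to:

acquire, for the terminals that have entered the voice call room, the voice call information pushed by each terminal;

and mix the streams of all terminals other than a target terminal to obtain the voice mixed stream information for that target terminal.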

8. The universal live voice intercom device according to claim 6, wherein the mixed stream processing unit is further specifically configured to:

convert all voice call information into the OWT format;

mix the data in the OWT format;

and convert the mixed OWT-format data into voice mixed stream information.

9. A universal live voice intercom system, comprising: