CN117560358A - Method, device and system for universal live broadcast voice intercom - Google Patents


Info

Publication number
CN117560358A
Authority
CN
China
Legal status
Pending
Application number
CN202311635105.XA
Other languages
Chinese (zh)
Inventor
易启鹏
黄代羲
王俭
徐敬晓
曾泽君
吉东
张斌
易露
Current Assignee
Beijing Liujinnian Technology Co ltd
Original Assignee
Beijing Liujinnian Technology Co ltd
Priority date
2023-11-30
Filing date
2023-11-30
Publication date
2024-02-13
Application filed by Beijing Liujinnian Technology Co ltd
Priority to CN202311635105.XA
Publication of CN117560358A
Current legal status: Pending

Abstract

The application discloses a method, a device and a system for universal live voice intercom, relating to the technical field of signal processing. The method comprises: receiving a voice call request issued by a user; creating a voice call room; establishing information flow links between each terminal in the system and the voice call room; acquiring voice call information pushed by a terminal; processing the voice call information to obtain voice mixed-stream information; and pushing the voice mixed-stream information to each terminal. The method enables low-latency live voice intercom with better real-time performance, adapts effectively to bandwidth changes, supports direct point-to-point communication, and traverses firewalls more easily.

Description

Method, device and system for universal live broadcast voice intercom
Technical Field
The present disclosure relates to the field of signal processing technologies, and in particular, to a method, an apparatus, and a system for universal live voice intercom.
Background
In conventional voice intercom scenarios, data is generally transmitted over TCP. Although TCP is one of the more reliable transport protocols, its acknowledgement and retransmission mechanisms inevitably introduce some delay during transmission; it also places high demands on bandwidth and is restricted by firewalls.
Therefore, how to provide a method with higher transmission efficiency and better real-time performance is a problem to be solved.
Disclosure of Invention
To solve the above problems, the present application provides a method, a device and a system for universal live voice intercom.
In a first aspect, the present application provides a universal live voice intercom method, applied to a universal live voice intercom system, the method including:
receiving a voice call request issued by a user;
creating a voice call room;
establishing information flow links between each terminal in the system and the voice call room;
acquiring voice call information pushed by a terminal;
processing the voice call information to obtain voice mixed-stream information;
and pushing the voice mixed-stream information to each terminal.
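The six steps above can be traced end to end in a short in-memory sketch. Everything in it is hypothetical: the request is a plain dict, "linking" a terminal only records it in the room, each terminal's voice call information is a list of integer audio samples, and "pushing" means storing the result per terminal. The mixing rule (each terminal receives everyone else's audio) follows the detailed description; no MRC, MRS, OWT or RTMP interface is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """Hypothetical in-memory stand-in for a voice call room."""
    room_id: str
    terminals: list[str] = field(default_factory=list)
    uplink: dict[str, list[int]] = field(default_factory=dict)    # pushed by terminals
    downlink: dict[str, list[int]] = field(default_factory=dict)  # pushed to terminals


def run_intercom(request: dict, pushed: dict[str, list[int]]) -> Room:
    # Step 1: receive the voice call request (a plain dict in this sketch).
    room_id, terminals = request["room_id"], request["terminals"]
    # Step 2: create the voice call room.
    room = Room(room_id)
    # Step 3: link each terminal to the room (here: just record membership).
    room.terminals.extend(terminals)
    # Step 4: acquire the voice call information pushed by linked terminals.
    room.uplink = {t: pushed[t] for t in room.terminals if t in pushed}
    length = max((len(s) for s in room.uplink.values()), default=0)
    for target in room.terminals:
        # Step 5: mix the audio of every terminal except the target.
        mixed = [0] * length
        for source, samples in room.uplink.items():
            if source != target:
                for i, sample in enumerate(samples):
                    mixed[i] += sample
        # Step 6: "push" the mixed result to the target terminal.
        room.downlink[target] = mixed
    return room
```

For three terminals pushing `[1, 2]`, `[10, 20]` and `[100, 200]`, terminal A receives `[110, 220]`, the sum of B and C only.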
Optionally, the step of establishing the information flow links between each terminal in the system and the voice call room specifically includes:
requesting two RTMP stream addresses for the created voice call room, one of which is used by the MRS to carry the voice call information pushed by the terminal and the other of which is used to carry the voice mixed-stream information;
configuring the terminal with the two RTMP stream addresses.
Optionally, the step of processing the voice call information to obtain voice mixed-stream information specifically includes:
for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;
and, for a target terminal, mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for that target terminal.
Optionally, the step of mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for the target terminal specifically includes:
converting all voice call information into the OWT format;
mixing the data in the OWT format;
and converting the mixed OWT-format data into voice mixed-stream information.
In a second aspect, the present application provides a universal live voice intercom device, applied to a universal live voice intercom system, the device including:
a demand receiving unit, configured to receive the voice call request issued by the user;
a room creation unit, configured to create a voice call room;
a link establishment unit, configured to establish information flow links between each terminal in the system and the voice call room;
an information acquisition unit, configured to acquire the voice call information pushed by a terminal;
a mixed-stream processing unit, configured to process the voice call information to obtain voice mixed-stream information;
and an information pushing unit, configured to push the voice mixed-stream information to each terminal.
Optionally, the link establishment unit is specifically configured for:
requesting two RTMP stream addresses for the created voice call room, one of which is used by the MRS to carry the voice call information pushed by the terminal and the other of which is used to carry the voice mixed-stream information;
configuring the terminal with the two RTMP stream addresses.
Optionally, the mixed-stream processing unit is specifically configured for:
for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;
and, for a target terminal, mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for that target terminal.
Optionally, the mixed-stream processing unit is further specifically configured for:
converting all voice call information into the OWT format;
mixing the data in the OWT format;
and converting the mixed OWT-format data into voice mixed-stream information.
In a third aspect, the present application provides a universal live voice intercom system, including:
a PC browser, through which the user accesses the MRC page to issue a voice call request;
an MRC, used for controlling the OWT to create the voice call room and for establishing information flow links between each terminal in the system and the voice call room;
an MRS, used for acquiring the voice call information pushed by the terminals and forwarding it to the OWT, and for pushing the voice mixed-stream information generated by the OWT to each terminal;
an OWT, used for creating the voice call room and for processing the voice call information to obtain the voice mixed-stream information;
and terminals, used for pushing voice call information to the MRS and for receiving the voice mixed-stream information pushed by the MRS.
In summary, the present application has the following beneficial technical effects:
the universal live voice intercom system enables low-latency live voice intercom with better real-time performance, adapts more efficiently to bandwidth changes, supports direct point-to-point communication, and traverses firewalls more easily.
Drawings
Fig. 1 is a schematic diagram of the architecture of a universal live voice intercom system according to an embodiment of the present application.
Fig. 2 is a flowchart of a universal live voice intercom method according to an embodiment of the present application.
Fig. 3 is a functional block diagram of a universal live voice intercom device according to an embodiment of the present application.
Reference numerals:
a demand receiving unit 110; a room creation unit 120; a link establishment unit 130; an information acquisition unit 140; a mixed-stream processing unit 150; an information pushing unit 160.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments.
In the description of embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "illustrative," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "illustratively," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "plurality" means two or more unless otherwise indicated. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
As shown in Fig. 1, the universal live voice intercom system according to an embodiment of the present invention consists of five parts: a PC browser, an MRC (Media Reactor Center), an MRS (Media Reactor Switch, a media reactor switching service), an OWT (OpenWeb Text), and terminals. The MRC control side enables voice communication with any one or more terminals in the live broadcast service, and also enables voice communication between terminals.
Information exchanged among the five parts is of two types: data streams, which carry device information, status information, live room information and the like; and media streams, which carry audio and video media. Data streams are centered on the MRC, which exchanges them with each of the other parts. Media streams are generated by a terminal or the PC browser, processed by the MRS and the OWT, and then pushed to the terminals or the PC browser.
The PC browser side is used by a back-office operator, who accesses the MRC page in the cloud through the PC browser. The MRC, MRS and OWT are deployed on cloud servers, and the terminals are carried and used by field service personnel.
Specifically, the PC browser is used by the user to access the MRC page and issue a voice call request;
the MRC is used for controlling the OWT to create the voice call room and for establishing information flow links between each terminal in the system and the voice call room;
the MRS is used for acquiring the voice call information pushed by the terminals and forwarding it to the OWT, and for pushing the voice mixed-stream information generated by the OWT to each terminal;
the OWT is used for creating the voice call room and for processing the voice call information to obtain the voice mixed-stream information.
The universal live voice intercom method is described in detail below with reference to the architecture of the universal live voice intercom system.
As shown in Fig. 2, the live voice intercom method according to an embodiment of the present invention includes:
Step S101: receiving a voice call request issued by a user.
The voice call request is typically initiated by the user accessing the MRC page through the PC browser, and the terminal then displays a call-in-progress status. This status indicates that the terminal has successfully connected to the MRC server and is ready to receive and transmit voice call data. The prompt on the terminal lets its user know the current call state and manage the voice call better.
Step S102: creating a voice call room.
After receiving the voice call request, the MRC creates a voice call room (chat room) according to the content of the request. The terminals participating in the call join this voice call room.
Step S103: establishing information flow links between each terminal in the system and the voice call room.
For a terminal to join the call, the MRC needs to establish an information flow link between that terminal and the voice call room. Specifically, the MRC requests two RTMP stream addresses for the created voice call room: one is used by the MRS to carry the voice call information pushed by the terminal, and the other is used to carry the voice mixed-stream information. The two RTMP stream addresses are then configured on the terminal device.
RTMP (Real-Time Messaging Protocol) is a protocol for transmitting audio, video and data streams; it carries data over TCP.
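Because RTMP runs over TCP, a session begins with a TCP connection (conventionally to port 1935) followed by a fixed-size handshake. The sketch below only builds the client's first handshake bytes, C0 and C1, as laid out in Adobe's RTMP specification; it opens no socket, and the function name `build_c0_c1` is illustrative.

```python
import os
import struct

RTMP_VERSION = 3            # C0 carries a single protocol-version byte
HANDSHAKE_BODY_SIZE = 1536  # size of C1 (and of S1, C2, S2)


def build_c0_c1(time_ms: int = 0) -> bytes:
    """Build the client's opening handshake bytes: C0 followed by C1."""
    c0 = bytes([RTMP_VERSION])
    # C1: 4-byte big-endian timestamp, 4 zero bytes, then 1528 random bytes.
    c1 = struct.pack(">II", time_ms & 0xFFFFFFFF, 0) + os.urandom(HANDSHAKE_BODY_SIZE - 8)
    return c0 + c1
```

A real client would write these 1537 bytes to the TCP socket, read S0 and S1 from the server, reply with C2, and start sending RTMP messages only after S2 has arrived.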
It should be noted that, in other embodiments of the present invention, access via various other protocols is also supported, including WebRTC, RTSP, HLS and MPEG-DASH.
After the information flow link is established, the voice call can be started.
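Step S103 can be sketched as follows under a hypothetical address scheme. The patent does not say how the MRC forms its two RTMP stream addresses, so the host `mrs.example.com`, the application name `intercom`, the random token and the `-up`/`-mix` suffixes are all illustrative; only port 1935, the conventional RTMP port, is a standard value.

```python
import secrets
from dataclasses import dataclass

RTMP_DEFAULT_PORT = 1935  # conventional RTMP port


@dataclass(frozen=True)
class StreamAddresses:
    push_url: str  # uplink: the terminal pushes its voice call information (via MRS)
    mix_url: str   # downlink: the terminal pulls the voice mixed-stream information


def allocate_stream_addresses(room_id: str, host: str = "mrs.example.com",
                              app: str = "intercom") -> StreamAddresses:
    # Hypothetical scheme: a random token keeps stream keys hard to guess.
    token = secrets.token_hex(4)
    base = f"rtmp://{host}:{RTMP_DEFAULT_PORT}/{app}/{room_id}-{token}"
    return StreamAddresses(push_url=f"{base}-up", mix_url=f"{base}-mix")


def configure_terminal(config: dict, addrs: StreamAddresses) -> dict:
    # "Configuring the terminal with the two addresses": return an updated copy.
    return {**config, "push_url": addrs.push_url, "pull_url": addrs.mix_url}
```

Returning a new config dict rather than mutating the caller's keeps the allocation step and the terminal-configuration step independent, which mirrors the two separate sub-steps in the text.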
Step S104: acquiring the voice call information pushed by the terminal.
After capturing the voice call information, the terminal transmits the media stream to the cloud server through its stream address, where the stream is then processed by the OWT.
Step S105: processing the voice call information to obtain voice mixed-stream information.
To process the voice call information, the voice call information pushed by each terminal that has entered the voice call room is acquired;
then, for each target terminal, the streams of all terminals other than the target terminal are mixed to obtain the voice mixed-stream information for that target terminal.
Note in particular that the mixed content differs for each terminal in the room: the voice mixed-stream information pushed to a given target terminal is obtained by mixing the voice call information of all terminals except that target terminal.
For example, if there are four terminals A, B, C and D in the room, the voice mixed-stream information pushed to terminal A is obtained by mixing the voice call information of terminals B, C and D.
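The per-terminal rule in this example (each terminal hears everyone except itself, commonly called mix-minus or N-1 mixing) can be sketched on decoded 16-bit audio samples. The sample representation is an assumption, and this is not the OWT processing the patent describes. Instead of summing N-1 streams separately for each of N terminals, the sketch computes one total, subtracts each terminal's own contribution, and clips the result to the 16-bit range.

```python
def clip16(value: int) -> int:
    """Saturate a summed sample to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Return, for each terminal, the mix of all *other* terminals' frames."""
    length = max((len(f) for f in frames.values()), default=0)
    # One pass: total of all terminals (shorter frames are padded with silence).
    total = [0] * length
    for frame in frames.values():
        for i, sample in enumerate(frame):
            total[i] += sample
    mixes = {}
    for terminal, frame in frames.items():
        own = frame + [0] * (length - len(frame))
        # Remove the terminal's own voice first, then clip the remainder.
        mixes[terminal] = [clip16(t - o) for t, o in zip(total, own)]
    return mixes
```

With A, B, C and D pushing `[1, 1]`, `[2, 2]`, `[4, 4]` and `[8, 8]`, terminal A receives `[14, 14]`, the mix of B, C and D. Clipping is applied only after the subtraction, so a loud terminal's own voice never distorts what the others hear.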
This processing is performed by the OWT: all voice call information is first converted into the OWT format; the OWT-format data is then mixed; finally, the mixed OWT-format data is converted into voice mixed-stream information.
The OWT processes voice call information as follows: speech recognition is first used to convert the voice information into text, which is represented in the OWT format; the converted text information is then mixed; and finally the mixed OWT-format data is converted into voice mixed-stream information. The OWT format is an open standard for describing and sharing text. It is based on Web technologies such as HTML, CSS and JavaScript and uses the JSON-LD format for data exchange.
The advantage of using the OWT format to process voice call information is that, once speech has been converted into text and represented in the OWT format, voice information can be shared by sharing the OWT-format information rather than by transmitting audio files directly. Furthermore, the OWT is based on the WebRTC protocol and uses real-time communication technology, which achieves lower latency and better real-time performance. Because it uses streaming data transmission, it also adapts more efficiently to bandwidth changes, supports direct point-to-point communication, and traverses firewalls more easily.
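The text above describes the OWT format as text-based. As a swapped-in illustration of the same three-stage sequence (convert to a common format, mix, convert back), the sketch below works at the audio-sample level instead, assuming purely for illustration that the common intermediate format is 16-bit little-endian linear PCM. It is not the patent's speech-to-text method, and the function names are hypothetical.

```python
import struct


def pcm16le_to_samples(data: bytes) -> list[int]:
    """Stage 1: convert raw 16-bit little-endian PCM bytes to integer samples."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}h", data[: count * 2]))


def samples_to_pcm16le(samples: list[int]) -> bytes:
    """Stage 3: convert samples back to PCM bytes, saturating to 16 bits."""
    clipped = [max(-32768, min(32767, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *clipped)


def mix_pcm_frames(frames: list[bytes]) -> bytes:
    """Convert each frame, sum them sample by sample (stage 2), convert back."""
    decoded = [pcm16le_to_samples(f) for f in frames]
    length = max((len(d) for d in decoded), default=0)
    mixed = [0] * length
    for samples in decoded:
        for i, s in enumerate(samples):
            mixed[i] += s
    return samples_to_pcm16le(mixed)
```

Summing in Python integers and clipping only when converting back avoids intermediate overflow; a production mixer would typically also apply gain control rather than hard clipping.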
In an embodiment of the present application, the supported audio formats of the voice call information include AAC, AC3, OPUS, G722, iSAC, iLBC, PCMU, PCMA, Nellymoser, etc.
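Several of the listed formats must be decoded before any sample-level mixing. PCMU is ITU-T G.711 μ-law, which expands each 8-bit code to a 16-bit linear sample; the sketch below is the standard bit-manipulation decoder. The text does not say whether or how the described system decodes PCMU, so treat this as background only.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    code = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the magnitude; 0x84 (132) is the bias added by the encoder.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_pcmu(payload: bytes) -> list[int]:
    """Decode a PCMU payload (one byte per sample) to 16-bit samples."""
    return [ulaw_to_linear(b) for b in payload]
```

Code 0xFF decodes to silence (0), while 0x00 and 0x80 decode to the extreme values -32124 and 32124.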
Step S106: pushing the voice mixed-stream information to each terminal.
The voice mixed-stream information is pushed to each terminal through the other stream address requested earlier, and the terminal pulls the voice mixed-stream information from that configured stream address.
In the above process, in addition to voice calls between terminals, the PC browser side can likewise push voice call information and pull voice mixed-stream information; this is implemented in the same way as for the terminals, through two stream addresses. In this way, the user of the PC browser can also take part in the online call in the voice call room.
The above describes the principle of the live voice call. Building on this scheme, a method for managing users and terminals in an established voice call room (chat room) is described below. Here, "user" refers only to a user accessing the MRC through the PC browser.
Typically, the user who requests the creation of the chat room acts as its administrator. The administrator has the right to invite other users: different users within the same organization can be invited into the same chat room, and an invited user enters only after accepting the invitation. When the administrator invites a terminal into the chat room, no confirmation from the terminal is required. For a terminal that has already entered the chat room, if another user invites that terminal into a different chat room, the administrator decides whether to agree; if the administrator agrees, the chat room immediately disconnects the terminal.
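The invitation rules above can be modeled as a small state object. The names `ChatRoom` and `request_transfer` and the `org` field are hypothetical. The rules follow the paragraph above: only the administrator invites, invited users must belong to the room's organization and join only after accepting, invited terminals join without confirmation, and a terminal invited elsewhere by another user leaves only if this room's administrator agrees. That the terminal then joins the inviting room is an additional assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChatRoom:
    """Hypothetical model of one chat room and its invitation rules."""
    room_id: str
    admin: str  # the user who requested the room
    org: str    # organization that invited users must belong to
    users: set[str] = field(default_factory=set)
    pending_users: set[str] = field(default_factory=set)
    terminals: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.users.add(self.admin)

    def invite_user(self, inviter: str, user: str, user_org: str) -> None:
        if inviter != self.admin:
            raise PermissionError("only the administrator may invite")
        if user_org != self.org:
            raise ValueError("invited users must be in the same organization")
        self.pending_users.add(user)  # the user enters only after accepting

    def accept(self, user: str) -> None:
        if user not in self.pending_users:
            raise ValueError("no pending invitation")
        self.pending_users.remove(user)
        self.users.add(user)

    def invite_terminal(self, inviter: str, terminal: str) -> None:
        if inviter != self.admin:
            raise PermissionError("only the administrator may invite")
        self.terminals.add(terminal)  # no confirmation from the terminal


def request_transfer(source: ChatRoom, dest: ChatRoom, terminal: str,
                     source_admin_agrees: bool) -> bool:
    """Another user invites a terminal already in `source` into `dest`."""
    if terminal not in source.terminals:
        raise ValueError("terminal is not in the source room")
    if not source_admin_agrees:
        return False
    source.terminals.remove(terminal)  # source room disconnects it immediately
    dest.terminals.add(terminal)       # assumption: it then joins the new room
    return True
```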
Meanwhile, based on the information about the terminals in the chat room acquired by the MRC, the user can view the chat room member list and can perform functions such as muting the microphone and removing members for each member.
After the call in a chat room ends, the chat room is destroyed.
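Viewing the member list, muting a member's microphone, removing members and destroying the room can be sketched with a hypothetical registry; `RoomRegistry` and its methods are illustrative names, not an MRC interface. One further assumption, not stated in the text: a muted member's audio is simply left out of the set of audible sources that would be mixed.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    kind: str  # "user" (PC browser) or "terminal"
    muted: bool = False


class RoomRegistry:
    """Hypothetical MRC-side bookkeeping for chat rooms and their members."""

    def __init__(self) -> None:
        self.rooms: dict[str, dict[str, Member]] = {}

    def create(self, room_id: str) -> None:
        self.rooms[room_id] = {}

    def join(self, room_id: str, member_id: str, kind: str) -> None:
        self.rooms[room_id][member_id] = Member(member_id, kind)

    def member_list(self, room_id: str) -> list[Member]:
        # The list a PC-browser user would view, sorted for stable display.
        return sorted(self.rooms[room_id].values(), key=lambda m: m.member_id)

    def mute(self, room_id: str, member_id: str) -> None:
        self.rooms[room_id][member_id].muted = True

    def remove(self, room_id: str, member_id: str) -> None:
        del self.rooms[room_id][member_id]

    def audible(self, room_id: str) -> set[str]:
        # Assumption: muted members are excluded from the mix.
        return {m.member_id for m in self.rooms[room_id].values() if not m.muted}

    def destroy(self, room_id: str) -> None:
        # Called once the call in this chat room has ended.
        del self.rooms[room_id]
```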
As shown in Fig. 3, the universal live voice intercom device provided by an embodiment of the present invention includes:
a demand receiving unit 110, configured to receive a voice call request issued by a user;
a room creation unit 120, configured to create a voice call room;
a link establishment unit 130, configured to establish information flow links between each terminal in the system and the voice call room;
an information acquisition unit 140, configured to acquire the voice call information pushed by the terminal;
a mixed-stream processing unit 150, configured to process the voice call information to obtain voice mixed-stream information;
and an information pushing unit 160, configured to push the voice mixed-stream information to each terminal.
As a preferred embodiment of the present invention, the link establishment unit 130 is specifically configured for:
requesting two RTMP stream addresses for the created voice call room, one of which is used by the MRS to carry the voice call information pushed by the terminal and the other of which is used to carry the voice mixed-stream information;
configuring the terminal with the two RTMP stream addresses.
As a preferred embodiment of the present invention, the mixed-stream processing unit 150 is specifically configured for:
for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;
and, for a target terminal, mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for that target terminal.
As a preferred embodiment of the present invention, the mixed-stream processing unit 150 is further specifically configured for:
converting all voice call information into the OWT format;
mixing the data in the OWT format;
and converting the mixed OWT-format data into voice mixed-stream information.
The universal live voice intercom device provided by the embodiment of the present invention is used to implement the universal live voice intercom method; its specific implementation is therefore the same as that of the method and is not repeated here.
In summary, the method, device and system for universal live voice intercom provided by the embodiments of the present invention enable live voice intercom with lower latency and better real-time performance, adapt more efficiently to bandwidth changes, support direct point-to-point communication, and traverse firewalls more easily.
In several embodiments disclosed in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure.
This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.

Claims (9)

1. A universal live voice intercom method, characterized in that it is applied to a universal live voice intercom system and comprises the following steps:
receiving a voice call request issued by a user;
creating a voice call room;
establishing information flow links between each terminal in the system and the voice call room;
acquiring voice call information pushed by a terminal;
processing the voice call information to obtain voice mixed-stream information;
and pushing the voice mixed-stream information to each terminal.
2. The universal live voice intercom method according to claim 1, wherein the step of establishing the information flow links between each terminal in the system and the voice call room specifically comprises:
requesting two RTMP stream addresses for the created voice call room, one of which is used by the MRS to carry the voice call information pushed by the terminal and the other of which is used to carry the voice mixed-stream information;
configuring the terminal with the two RTMP stream addresses.
3. The universal live voice intercom method according to claim 2, wherein the step of processing the voice call information to obtain voice mixed-stream information specifically comprises:
for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;
and, for a target terminal, mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for that target terminal.
4. The universal live voice intercom method according to claim 3, wherein the step of mixing the streams of all terminals other than the target terminal to obtain the voice mixed-stream information for the target terminal specifically comprises:
converting all voice call information into the OWT format;
mixing the data in the OWT format;
and converting the mixed OWT-format data into voice mixed-stream information.
5. A universal live voice intercom device, characterized in that it is applied to a universal live voice intercom system, the device comprising:
a demand receiving unit, configured to receive the voice call request issued by the user;
a room creation unit, configured to create a voice call room;
a link establishment unit, configured to establish information flow links between each terminal in the system and the voice call room;
an information acquisition unit, configured to acquire the voice call information pushed by a terminal;
a mixed-stream processing unit, configured to process the voice call information to obtain voice mixed-stream information;
and an information pushing unit, configured to push the voice mixed-stream information to each terminal.
6. The universal live voice intercom device according to claim 5, wherein the link establishment unit is specifically configured for:
requesting two RTMP stream addresses for the created voice call room, one of which is used by the MRS to carry the voice call information pushed by the terminal and the other of which is used to carry the voice mixed-stream information;
configuring the terminal with the two RTMP stream addresses.
7. The universal live voice intercom device according to claim 6, wherein the mixed-stream processing unit is specifically configured for:
for the terminals that have entered the voice call room, acquiring the voice call information pushed by each terminal;
and, for a target terminal, mixing the streams of all terminals other than the target terminal to obtain voice mixed-stream information for that target terminal.
8. The universal live voice intercom device according to claim 6, wherein the mixed-stream processing unit is further specifically configured for:
converting all voice call information into the OWT format;
mixing the data in the OWT format;
and converting the mixed OWT-format data into voice mixed-stream information.
9. A universal live voice intercom system, comprising:
a PC browser, through which the user accesses the MRC page to issue a voice call request;
an MRC, configured to control the OWT to create the voice call room and to establish information flow links between each terminal in the system and the voice call room;
an MRS, configured to acquire the voice call information pushed by the terminals and forward it to the OWT, and to push the voice mixed-stream information generated by the OWT to each terminal;
an OWT, configured to create the voice call room and to process the voice call information to obtain the voice mixed-stream information;
and terminals, configured to push voice call information to the MRS and to receive the voice mixed-stream information pushed by the MRS.
CN202311635105.XA 2023-11-30 2023-11-30 Method, device and system for universal live broadcast voice intercom Pending CN117560358A (en)

Priority Applications (1)

Application number: CN202311635105.XA; publication: CN117560358A (en); priority date: 2023-11-30; filing date: 2023-11-30; title: Method, device and system for universal live broadcast voice intercom

Publications (1)

Publication number: CN117560358A (en); publication date: 2024-02-13

Family

ID=89814632

Country Status (1)

Country Link
CN (1) CN117560358A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination