CN117615169A

CN117615169A - Audio media distribution method, device and system and electronic equipment

Info

Publication number: CN117615169A
Application number: CN202311578446.8A
Authority: CN
Inventors: 高少良; 孙浩; 黄伟胜; 王鹏; 黄文华; 邓衍煜; 梁苑文; 仇国祥; 丘凌; 王刚; 邵亚红; 叶冉
Original assignee: Tianyi Digital Life Technology Co Ltd
Current assignee: Tianyi Digital Life Technology Co Ltd
Priority date: 2023-11-23
Filing date: 2023-11-23
Publication date: 2024-02-27

Abstract

The application discloses an audio media distribution method, device and system, and electronic equipment. The method comprises the following steps: obtaining a push address; establishing a connection with a client, receiving a control instruction sent by the client, and setting push parameters according to the control instruction; receiving audio data sent by the client; transcoding the audio data according to the audio format in the push parameters; and distributing the transcoded audio data in its various formats to the push address, so that the transcoded audio data is pushed to each downstream voice gateway address corresponding to the push address. The application can uniformly distribute audio data to audio devices that use different transmission protocols and transcoding protocols, and can be widely applied in the technical field of audio transmission and transcoding.

Description

Audio media distribution method, device and system and electronic equipment

Technical Field

The application relates to the technical field of audio transmission and transcoding, in particular to an audio media distribution method, an audio media distribution device, an audio media distribution system and electronic equipment.

Background

Current broadcasting systems use audio devices from different manufacturers, and each manufacturer's devices have their own transcoding protocols and audio transmission protocols. As a result, the audio devices are difficult to manage uniformly when audio is played, and broadcasting efficiency is low.

Disclosure of Invention

In view of this, the present application provides a method, apparatus, system and electronic device for distributing audio media, so as to uniformly distribute audio data to each audio device with different transmission protocols and transcoding protocols, thereby improving broadcasting efficiency.

An aspect of the present application provides an audio media distribution method, including:

obtaining a push address;

establishing a connection with a client, receiving a control instruction sent by the client, and setting push parameters according to the control instruction;

receiving audio data sent by the client;

transcoding the audio data according to the audio format in the push parameters;

and correspondingly distributing the audio data in various formats obtained by transcoding to the push address so as to push the transcoded audio data to each downstream voice gateway address corresponding to the push address.

Optionally, the obtaining the push address includes:

responding to a broadcasting task request of a first server, and matching according to a preset scheduling strategy to obtain a push address; the broadcasting task request is obtained by the first server according to the request of the client.

Optionally, before the connection is established with the client, the method further includes:

generating session information and authentication information according to a plurality of the downstream voice gateway addresses;

returning the push address, the session information and the authentication information to the first server, so that the first server returns the push address, the session information and the authentication information to the client;

the establishing connection with the client comprises the following steps:

responding to a connection handshake request of the client, receiving and verifying the push address, the session information and the authentication information which are transmitted by the client;

and after the verification passes, establishing a long-lived TCP connection with the client and maintaining the session information.

Optionally, the matching according to a preset scheduling policy to obtain the push address includes:

acquiring the IP address of the client;

searching the streaming media service with the shortest transmission link and the utilization rate lower than a set threshold according to the IP address, the current TCP connection number, the service bandwidth, the CPU utilization rate and the memory utilization rate;

and obtaining the address of the streaming media service as the push address.

Optionally, the receiving the audio data sent by the client includes:

receiving a start-push instruction sent by the client, and then requesting to establish a connection with the gateway corresponding to each downstream voice gateway address and starting the pull stream;

and after receiving the start-push instruction, receiving the audio data sent by the client.

Optionally, after the receiving the start-push instruction sent by the client, the method further includes:

receiving a stop-push instruction sent by the client and stopping the push stream.

Another aspect of the present application also provides an audio media distribution apparatus, including:

the first unit is used for acquiring a push address;

the second unit is used for establishing a connection with the client, receiving a control instruction sent by the client and setting push parameters according to the control instruction;

a third unit, configured to receive audio data sent by the client;

a fourth unit, configured to transcode the audio data according to an audio format in the push parameters;

and a fifth unit, configured to correspondingly distribute the audio data in various formats obtained by transcoding to the push address, so as to push the transcoded audio data to each downstream voice gateway address corresponding to the push address.

Another aspect of the present application also provides an audio media distribution system, comprising: a client, a first server, a second server, a plurality of downstream voice gateways and a plurality of audio devices;

the client is used for sending push parameters and audio data to the second server;

the first server is used for responding to the request of the client and requesting a downstream voice gateway address from the second server;

the second server is configured to execute the foregoing audio media distribution method;

the downstream voice gateway is configured to receive the transcoded audio data distributed by the second server, and send the transcoded audio data to the audio device;

the audio device is used for playing the transcoded audio data.

Optionally, the second server includes a dispatching center module, a websocket service module, and a streaming media transcoding/distribution module;

the dispatching center module is used for acquiring a push address;

the websocket service module is used for establishing a connection with the client, receiving a control instruction sent by the client and setting push parameters according to the control instruction; and receiving audio data sent by the client;

the streaming media transcoding/distributing module is used for transcoding the audio data according to the audio format in the push parameters; and correspondingly distributing the audio data in various formats obtained by transcoding to the push address so as to push the transcoded audio data to each downstream voice gateway address corresponding to the push address.

Another aspect of the present application also provides an electronic device, including a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the audio media distribution method.

Another aspect of the present application also provides a computer-readable storage medium storing a program that is executed by a processor to implement a method of audio media distribution as described above.

The application also discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of an electronic device, and executed by the processor, to cause the electronic device to perform the foregoing method of audio media distribution.

In the embodiments of the present application, a push address is first obtained; a connection is then established with the client, a control instruction sent by the client is received, and push parameters are set according to the control instruction; audio data sent by the client is received; the audio data is transcoded according to the audio format in the push parameters; finally, the transcoded audio data in its various formats is distributed to the push address, so that the transcoded audio data is pushed to each downstream voice gateway address corresponding to the push address and the gateway corresponding to each downstream voice gateway address plays the audio data. The application can serve as middleware for audio data transmission and transcoding: it can connect to different audio networks and protocols and convert between different voice formats and encodings, making media communication between different networks possible. Because the application takes into account the interoperability and compatibility of audio data across different networks and audio devices, it enables audio intercommunication between them, supports high-quality audio communication and media experience, and can be widely applied in fields such as communication, entertainment and broadcasting.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic structural diagram of an audio media distribution system according to an embodiment of the present application;

Fig. 2 is an exemplary block diagram of an audio media distribution system provided in an embodiment of the present application;

Fig. 3 is a flowchart of an audio media distribution method according to an embodiment of the present application;

Fig. 4 is a schematic flow chart of matching a downstream voice gateway address according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of obtaining a web streaming address according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of a web streaming media push flow provided in an embodiment of the present application;

Fig. 7 is a schematic flow chart of audio data transcoding and distribution according to an embodiment of the present application;

Fig. 8 is a diagram comparing the present application with the prior art provided in the examples of the present application;

Fig. 9 is a block diagram of an audio media distribution device according to an embodiment of the present application;

Fig. 10 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart.

The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.

In order to facilitate understanding of the embodiments of the present application, keywords that may be related to the embodiments of the present application are explained:

PCM (Pulse Code Modulation): pulse code modulation is one of the coding schemes used in digital communications. In essence, an analog signal such as voice or an image is sampled at regular intervals to discretize it, each sampled value is rounded to an integer quantization level, and the amplitude of each sampled pulse is then represented by a group of binary codes.
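The sampling and quantization steps described above can be illustrated with a minimal Python sketch. The function name, the 8 kHz sampling rate, the 1 kHz test tone and the 16-bit sample width are assumptions chosen only for illustration; the application does not prescribe them.

```python
import math
import struct


def pcm_encode_sine(freq_hz: float, sample_rate: int, n_samples: int) -> bytes:
    """Sample a sine tone and quantize it to signed 16-bit little-endian PCM."""
    full_scale = 2 ** 15 - 1  # largest positive 16-bit level (32767)
    frames = []
    for n in range(n_samples):
        t = n / sample_rate                                  # sample at regular intervals
        amplitude = math.sin(2 * math.pi * freq_hz * t)      # "analog" value in [-1, 1]
        level = round(amplitude * full_scale)                # round to an integer level
        frames.append(struct.pack("<h", level))              # binary code for the sample
    return b"".join(frames)


# Eight samples of a 1 kHz tone at 8 kHz (one full period).
pcm = pcm_encode_sine(freq_hz=1000.0, sample_rate=8000, n_samples=8)
```

Each sample occupies two bytes, so one second of mono audio at 8 kHz would occupy 16,000 bytes before any compression.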

RTSP (Real Time Streaming Protocol): the real-time streaming protocol is an application-layer protocol in the TCP/IP protocol architecture that defines how one-to-many applications can efficiently transmit multimedia data over an IP network.

RTMP (Real-Time Messaging Protocol): the real-time messaging protocol is currently the main protocol for live broadcasting. It is a proprietary application-layer protocol designed by Adobe to provide audio and video data transmission between the Flash player and a server.

WebSocket is a protocol that performs full duplex communication over a single TCP connection. In the WebSocket API, the browser and the server only need to complete one handshake, and can directly create persistent connection between the two and perform bidirectional data transmission.
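As an illustration of the single handshake mentioned above, the following Python sketch computes the `Sec-WebSocket-Accept` value that a server returns for the client's `Sec-WebSocket-Key`, as defined in RFC 6455. The function name is an assumption; the application itself does not describe the handshake at this level of detail.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455, section 1.3.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

Once the server has answered with this value, the same TCP connection stays open and carries data in both directions.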

Downstream voice gateway: generally refers to an access gateway for voice-enabled devices, for example a national standard GB/T 28181 access gateway, whose audio transmission uses the SIP protocol.

Push streaming: the process of pushing live content (which may be audio data) to a server (which may be the server of a downstream voice gateway), that is, transmitting the content packaged in the acquisition phase to the server, or equivalently sending the on-site signal to the network. Push streaming places high demands on the network: if the network is unstable, the live broadcast quality is poor and stuttering may occur. To push a stream, the audio and video data must be encapsulated with a transport protocol so that it becomes stream data. Commonly used streaming protocols include RTSP, RTMP and HLS. The delay of RTMP transmission is typically 1-3 seconds, and because live broadcasting from mobile phones has very high real-time requirements, RTMP has become the most commonly used streaming protocol in that scenario. Finally, the audio and video stream data is pushed to the network through a QoS algorithm and distributed through a CDN.

Pull streaming: the process of pulling live content that already exists on a server by using a designated address. That is, the server holds streaming audio and video files, and reading those files over a network protocol (such as RTMP, RTSP or HTTP) is called pull streaming. For example, when audio and video files are stored on a server and a user obtains them through a web page over HTTP (or RTMP/RTSP), this is a pull-streaming process. The process has three elements: (1) the server, i.e. the device that stores the audio and video files; (2) the transmission protocol, i.e. how the audio and video are transmitted; and (3) the reading terminal, i.e. the terminal through which the audio and video are played.

The present application aims to broadcast audio streams quickly and accurately to a plurality of intelligent broadcasting terminals, so as to meet the needs of emergency broadcasting, public safety notification and group notification. It can achieve fast audio transmission, multi-channel audio processing and reliable downstream voice gateway management, providing a reliable and efficient solution for real-time audio announcements.

One object of the present application is to achieve fast audio transmission so as to ensure real-time performance and timeliness. With the present application, audio data can be transmitted quickly to the downstream voice gateway, so that the downstream voice gateway can receive and process the audio information in real time.

Another object of the present application is to achieve multi-channel audio distribution and audio transcoding: one audio stream can be distributed simultaneously as multiple audio streams over different protocols, with the audio transcoded according to the audio encodings supported by each downstream voice gateway, so as to meet the needs of complex broadcasting scenarios.

First, an audio media distribution system provided in the present application is described. Referring to fig. 1, the system may include: a client, a first server, a second server, a plurality of downstream voice gateways and a plurality of audio devices;

the second server is used for executing an audio media distribution method provided by the application; this method will be described in detail below;

the audio device is used for playing the transcoded audio data.

Further, the second server comprises a dispatching center module, a websocket service module and a streaming media transcoding/distribution module;

the dispatching center module is used for acquiring a push address;

The second server obtains the corresponding downstream voice gateway addresses from the plurality of downstream voice gateways; the client obtains, through the second server, the push address corresponding to the plurality of downstream voice gateway addresses, and establishes a connection with the second server; the second server is responsible for audio transcoding and for establishing connections with, and distributing audio to, the plurality of downstream voice gateways.

An example flow is as follows, including S1 to S10:

S1: the client requests the first server to create a real-time broadcasting task;

S2: the first server requests the second server to acquire the downstream voice gateway addresses;

S3: the second server first requests the downstream voice gateway addresses from the downstream voice gateways according to the broadcasting task, and stores the downstream voice gateway addresses corresponding to the broadcasting task;

S4: the second server then matches its optimal push address according to the scheduling policy, generates session and authentication information, and returns the push address to the first server;

S5: the first server returns the result to the client;

S6: the client establishes a connection with the second server; the second server verifies the authentication information and stores the session information after the verification passes;

S7: after the connection is established, the client sends instructions to set the push parameters and start the push stream;

S8: after receiving the instructions, the second server looks up the downstream voice gateway addresses through the session information and establishes connections with the downstream voice gateways;

S9: after starting the push stream, the client sends audio data;

S10: after receiving the audio data, the second server transcodes it and forwards the transcoded audio data to the downstream voice gateways.

For ease of understanding, this embodiment provides an alternative specific example for illustration.

Referring to fig. 2, the present embodiment provides an exemplary block diagram of an audio media distribution system.

Specifically, the client calls the recording device to record. The recording device collects PCM audio frames in real time and encodes them into G711A audio frames, and the audio stream is transmitted in real time to the real-time audio media gateway over the websocket protocol. The streaming media transcoding/distribution service of the real-time audio media gateway decodes the G711A audio frames into PCM audio frames and then re-encodes them (for example, into MP3) according to the audio format of each downstream voice gateway. The encoded audio frames are transmitted over the transmission protocol corresponding to the downstream voice gateway, the downstream voice gateway pushes them to the audio device, and finally the audio device broadcasts the audio. For establishing the audio transmission channel between the real-time audio media gateway and the downstream voice gateway during distribution, this embodiment adopts the following approach: when the client pushes the stream, it sends a start-push instruction to trigger the push, after which the audio data sent by the client is transmitted to the downstream voice gateway.
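The decoding step from G711A frames back to PCM can be sketched as follows. This is a minimal Python illustration of standard G.711 A-law expansion, following the widely used reference algorithm; the function names are assumptions, and the application does not state how the gateway implements decoding.

```python
import struct


def alaw_to_linear(a_val: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear PCM sample."""
    a_val ^= 0x55                     # undo the even-bit inversion applied by the encoder
    t = (a_val & 0x0F) << 4           # 4-bit mantissa
    seg = (a_val & 0x70) >> 4         # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1
    return t if (a_val & 0x80) else -t   # top bit set means a positive sample


def decode_g711a(frame: bytes) -> bytes:
    """Decode a G711A frame into 16-bit little-endian PCM."""
    return b"".join(struct.pack("<h", alaw_to_linear(b)) for b in frame)


pcm_frame = decode_g711a(bytes([0xD5, 0x55, 0xAA, 0x2A]))
```

Each one-byte A-law sample expands to a two-byte PCM sample, so the decoded frame is twice the size of the received frame before it is re-encoded for a downstream voice gateway.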

As an alternative implementation, regarding the transmission protocol of the downstream voice gateway, take a national standard GB/T 28181 access gateway as an example: the audio transmission follows the audio transmission specification of the GB/T 28181 protocol, with the SIP protocol used to control the audio stream and the RTP protocol used to transmit the audio frames.

In order to realize fast audio transmission, multi-channel audio distribution and audio transcoding, the embodiments of the present application also provide an audio media distribution method, which distributes one audio stream simultaneously as multiple audio streams over different protocols and transcodes the audio according to the audio encodings supported by each downstream voice gateway. The audio media distribution method can be applied to a user terminal, a server, or an implementation environment formed by a user terminal and a server. The method may also be implemented as software running in a user terminal or a server, such as an application program with audio transmission and transcoding functions. The user terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

An audio media distribution method provided by an embodiment of the present application may include the following components:

Signal transmission component: responsible for transmitting voice signals from one network to another. It may support different transport protocols and networks, such as IP networks.

Audio transport protocol component: the audio transmission protocol defines the manner and rules by which audio signals are transmitted in the network, such as RTSP and RTMP, and is used for audio data transmission between different networks.

Audio codec component: natural sound has an extremely complex waveform and is usually digitized using pulse code modulation, i.e. PCM coding. Audio encoding compresses PCM audio data to reduce the bandwidth needed for network transmission, and audio decoding decompresses audio data back to PCM data for playback by a device. Common audio compression formats include MP3 and AAC.

Signaling processing component: responsible for handling signaling information in voice communications, such as connection establishment, connection maintenance, starting the push stream and starting the pull stream. It may support different signaling protocols.

Audio signal processing component: responsible for processing the audio signal, such as audio enhancement, noise suppression and echo cancellation, which can improve the quality and clarity of voice communication.

Audio format conversion component: audio format conversion converts between different audio formats to achieve compatibility between different systems and devices, for example converting audio in WAV format to audio in MP3 format.

Next, a detailed description will be given of an audio media distribution method provided in an embodiment of the present application, and referring to fig. 3, the method may include steps S300 to S340, specifically as follows:

S300: obtaining the push address.

Specifically, the downstream voice gateway addresses may also be referred to as web streaming addresses, and the audio devices corresponding to the downstream voice gateway addresses are devices that need to play audio data.

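Further, S300 may include:

responding to a broadcasting task request of a first server, and matching according to a preset scheduling strategy to obtain the push address; the broadcasting task request is obtained by the first server according to the request of the client.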

Still further, referring to fig. 4, the step of matching to obtain the push address according to the preset scheduling policy may include S401 to S403:

S401: acquiring the IP address of the client;

S402: searching the streaming media service with the shortest transmission link and the utilization rate lower than a set threshold according to the IP address, the current TCP connection number, the service bandwidth, the CPU utilization rate and the memory utilization rate;

S403: obtaining the address of the streaming media service as the push address.
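Steps S401 to S403 can be sketched in Python as follows. This is only an illustrative reading of the scheduling policy: the `StreamingService` fields, the use of the largest of the four utilization figures as "the utilization rate", the `link_length` callback that estimates the transmission link from the client IP address, and the 0.8 threshold are all assumptions not fixed by the application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class StreamingService:
    address: str
    tcp_connections: int
    max_tcp_connections: int
    bandwidth_used: float   # fraction of service bandwidth in use, 0.0 to 1.0
    cpu_used: float         # CPU utilization, 0.0 to 1.0
    memory_used: float      # memory utilization, 0.0 to 1.0

    def utilization(self) -> float:
        """Assumed definition: the most heavily loaded of the four indicators."""
        return max(
            self.tcp_connections / self.max_tcp_connections,
            self.bandwidth_used,
            self.cpu_used,
            self.memory_used,
        )


def match_push_address(
    client_ip: str,
    services: Iterable[StreamingService],
    link_length: Callable[[str, str], float],
    threshold: float = 0.8,
) -> Optional[str]:
    """Pick the under-threshold service with the shortest link to the client."""
    candidates = [s for s in services if s.utilization() < threshold]
    if not candidates:
        return None  # every streaming media service is above the threshold
    best = min(candidates, key=lambda s: link_length(client_ip, s.address))
    return best.address
```

In this sketch an overloaded service is excluded even when it is the closest one, which reflects the two conditions of S402 being applied together.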

Specifically, the method of this embodiment can be applied to the second server acting as a real-time voice gateway. The second server may adopt a clustered, distributed architecture with service registration capability, in which the dispatching center supports deploying streaming media service instances and streaming media transcoding/distribution services across multiple nodes. The streaming media services and transcoding/distribution services deployed at each node register their service information with the dispatching center, send heartbeats at regular intervals to keep their service state alive, and periodically report their service state information.
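The registration and heartbeat behaviour described above can be illustrated with a small in-memory registry. This is a minimal Python sketch under stated assumptions: the class name, the time-to-live value and the injectable clock are hypothetical, and a real dispatching center would typically persist or replicate this state.

```python
import time
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """In-memory registry: services register, then keep alive via heartbeats."""

    def __init__(self, ttl_seconds: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._services: Dict[str, dict] = {}

    def register(self, service_id: str, address: str) -> None:
        self._services[service_id] = {
            "address": address, "last_seen": self._clock(), "status": {},
        }

    def heartbeat(self, service_id: str, status: Optional[dict] = None) -> None:
        """Refresh liveness and optionally store the reported service state."""
        entry = self._services[service_id]
        entry["last_seen"] = self._clock()
        if status is not None:
            entry["status"] = dict(status)

    def status(self, service_id: str) -> dict:
        return dict(self._services[service_id]["status"])

    def alive(self) -> List[str]:
        """Return the ids of services whose last heartbeat is within the TTL."""
        now = self._clock()
        return sorted(
            sid for sid, entry in self._services.items()
            if now - entry["last_seen"] <= self._ttl
        )
```

A service that stops sending heartbeats simply drops out of `alive()` once the time-to-live has elapsed, so the scheduler never selects it.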

Web real-time streaming media scheduling: based on the service state information reported by the web streaming media services, such as the number of TCP connections, service bandwidth, CPU utilization and memory utilization, the dispatching center computes a weight value from these indicators, and the scheduling decision algorithm uses the weight value to find the best-matching web streaming media service, i.e. the required target gateway.
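One possible form of the weight calculation is sketched below in Python. The application does not disclose the formula, so the equal coefficients, the normalization of every indicator to a utilization between 0 and 1, and the rule "highest remaining capacity wins" are all assumptions made for illustration.

```python
from typing import Mapping

# Hypothetical coefficients; each indicator is a utilization in [0.0, 1.0].
DEFAULT_WEIGHTS = {"tcp": 0.25, "bandwidth": 0.25, "cpu": 0.25, "memory": 0.25}


def service_weight(status: Mapping[str, float],
                   weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of free capacity: higher means a better candidate."""
    return sum(w * (1.0 - status[name]) for name, w in weights.items())


def best_matched_service(reports: Mapping[str, Mapping[str, float]]) -> str:
    """Return the address of the service with the highest weight value."""
    if not reports:
        raise ValueError("no web streaming media service has reported its state")
    return max(reports, key=lambda address: service_weight(reports[address]))
```

Changing the coefficients lets an operator favour, for example, bandwidth headroom over CPU headroom without altering the decision logic.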

Specifically, referring to fig. 5, taking the system shown in fig. 2 as an example, the step of obtaining the web streaming address may include:

The web client (client for short) requests the web streaming media address and token authentication information from the first server.

The first server creates a real-time broadcasting task and calls a third-party platform to obtain the downstream voice gateway addresses. The dispatching center matches the optimal web streaming media service according to the scheduling strategy, generates a sessionId and a token, and at the same time issues a transcoding and distribution task. After the first server finishes processing, it returns the web streaming media address, the sessionId and the token to the client.

The client receives the response and parses it to obtain the web streaming media address, the sessionId and the token authentication information.

S310: establishing a connection with the client, receiving a control instruction sent by the client, and setting push parameters according to the control instruction.

In particular, the push parameters may include audio coding format, sampling rate, channel number, bit rate, etc.
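A control instruction carrying these parameters might be parsed as in the following Python sketch. The JSON layout, the `set_params` command name and the field names are hypothetical; the application lists the parameters but does not define a message format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PushParams:
    codec: str        # audio coding format, e.g. "G711A"
    sample_rate: int  # samples per second
    channels: int     # number of audio channels
    bit_rate: int     # bits per second


def parse_control_instruction(message: str) -> PushParams:
    """Parse a hypothetical JSON 'set_params' instruction into PushParams."""
    body = json.loads(message)
    if body.get("cmd") != "set_params":
        raise ValueError("not a set_params instruction")
    params = body["params"]
    return PushParams(
        codec=str(params["codec"]).upper(),
        sample_rate=int(params["sample_rate"]),
        channels=int(params["channels"]),
        bit_rate=int(params["bit_rate"]),
    )


example = parse_control_instruction(
    '{"cmd": "set_params", "params": {"codec": "g711a", '
    '"sample_rate": 8000, "channels": 1, "bit_rate": 64000}}'
)
```

Keeping the parameters in an immutable structure makes it straightforward to hand the same settings to every transcoding task created for the session.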

Further, before the connection is established with the client in S310, the embodiment may further include:

generating session information and authentication information according to a plurality of the downstream voice gateway addresses; and returning the push address, the session information and the authentication information to the first server, so that the first server returns the push address, the session information and the authentication information to the client.

The step of establishing a connection with the client may further comprise:

responding to a connection handshake request of the client, receiving and verifying the push address, the session information and the authentication information which are transmitted by the client; and after the verification passes, establishing a long-lived TCP connection with the client and maintaining the session information.
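The generation and verification of the session and authentication information can be sketched as follows. The application does not say how the token is derived, so this Python example assumes an HMAC-SHA256 signature over the session identifier and the downstream voice gateway addresses; the function names and the shared secret are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Tuple


def _sign(session_id: str, gateway_addresses: Iterable[str], secret: bytes) -> str:
    """Sign the session id together with the sorted gateway addresses."""
    payload = session_id + "|" + ",".join(sorted(gateway_addresses))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_session(gateway_addresses: Iterable[str], secret: bytes) -> Tuple[str, str]:
    """Create a random session id and a token bound to the gateway addresses."""
    addresses = list(gateway_addresses)
    session_id = secrets.token_hex(8)
    return session_id, _sign(session_id, addresses, secret)


def verify_session(session_id: str, token: str,
                   gateway_addresses: Iterable[str], secret: bytes) -> bool:
    """Check the token presented during the connection handshake."""
    expected = _sign(session_id, gateway_addresses, secret)
    return hmac.compare_digest(expected, token)
```

Using a constant-time comparison for the token avoids leaking information about the expected value through response timing.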

Specifically, referring to fig. 6, taking the system shown in fig. 2 as an example, the web streaming media push may include the following steps:

Establishing the websocket connection: the web client sends a websocket connection handshake request, passing the sessionId and token as parameters. The web streaming media server checks the sessionId and token; once the check passes, the handshake is completed and the long-lived TCP connection and session information are maintained.

Streaming media control: the web client sends control instructions to set the push parameters, including audio coding format, sampling rate, channel number, bit rate and the like. After the push parameters are set, it sends a start-push instruction to notify the web streaming media server to start the push stream.

After receiving the start-push instruction, the web streaming media server generates a push event and reports the event to the dispatching center.

The web client starts recording, receives the system recording callback, processes the audio data in the callback, and sends the audio data over the websocket.

After receiving the audio data, the web streaming media server looks up the pull-stream websocket client by sessionId and sends the audio stream data to that pull-stream websocket client.

S320: receiving the audio data sent by the client.

Specifically, the client may obtain audio data through various recording devices, or may directly obtain local audio data, or may obtain audio data sent by other terminals, so that the client sends the audio data to the second server, and the second server receives the audio data for subsequent transcoding and distribution.

Further, S320 may include:

receiving a start-push instruction sent by the client, and then requesting to establish a connection with the gateway corresponding to each downstream voice gateway address and starting the pull stream; and, after receiving the start-push instruction, receiving the audio data sent by the client.

Further, after the receiving the start push command sent by the client, the embodiment may further include:

Specifically, referring to fig. 7, taking the system shown in fig. 2 as an example, the flow of audio data transcoding and distribution may include:

after receiving the push event, the scheduling center inquires the real-time broadcasting task information, generates a transcoding/distributing task according to the task information, and transmits the transcoding/distributing task to the streaming media distributing service.

Input-output streaming media initialization. The streaming media distribution service establishes connection with the web streaming media service, sends a command for starting streaming media to inform the web streaming media service, and completes streaming initialization; and the streaming media distribution service establishes an audio transmission protocol channel with the downstream voice gateway to finish streaming media distribution initialization.

The streaming media distribution service receives the audio data, traverses the downstream voice gateway list, asynchronously processes the audio transcoding, and pushes the transcoded audio data to the downstream voice gateway.

S330: transcoding the audio data according to the audio format in the push stream parameters.

Specifically, the web client sets the transmission audio format when pushing the stream and transmits the audio data, and the web streaming media server (i.e. the second server) converts the audio data into the audio format accepted by the downstream voice gateway. Supported audio transcoding formats include AAC, MP3, PCM, G711, etc.
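As one concrete instance of such a format conversion, the following minimal sketch converts 16-bit little-endian PCM samples into G.711 mu-law bytes. The text does not state which G.711 variant or which transcoding implementation is used, so the choice of mu-law and the function names `linear_to_ulaw` and `pcm16le_to_ulaw` are assumptions made for illustration; the bias and clip constants are the values conventionally used in mu-law encoders.

```python
import struct

BIAS = 0x84   # conventional mu-law bias added before segment search
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Transcode a buffer of 16-bit little-endian PCM into mu-law bytes."""
    if len(pcm) % 2:
        raise ValueError("PCM buffer must contain whole 16-bit samples")
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    return bytes(linear_to_ulaw(s) for s in samples)
```

Each 16-bit input sample becomes one output byte, so the transcoded stream is half the size of the PCM input at the same sampling rate.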

S340: correspondingly distributing the audio data in various formats obtained by transcoding to the push address, so as to push the transcoded audio data to each downstream voice gateway address corresponding to the push address.

Specifically, downstream voice gateways may use enterprise-standard or industry-standard protocols, such as RTMP, RTSP, etc. The audio stream is distributed according to the protocol of each downstream voice gateway, so that the audio transmitted by the client is distributed to multiple downstream voice gateways.

In order to facilitate a clearer understanding of the present application, the present application will be described below in terms of one complete alternative example.

First, the functions that can be realized by the present embodiment are described, including the following a) to d):

a) Streaming media scheduling decision: streaming media scheduling decisions include web real-time streaming media scheduling decisions and streaming media real-time transcoding/distribution scheduling decisions. The web real-time streaming media scheduling decision analyzes the TCP connection count, service bandwidth, CPU utilization, memory utilization and the like of each server; based on the calculated indices of each service and the source IP of the client, the scheduling decision algorithm finds the best-matching streaming media service, namely an idle service with the shortest transmission link.

b) Web real-time media stream transmission: a real-time audio stream transmission protocol is implemented on top of websocket. Control instruction data is sent as text messages, and audio data is sent as binary messages with binaryType set to arraybuffer. The control instructions include setting streaming media parameters, starting the push stream, stopping the push stream and the like. Supported audio coding formats include AAC, MP3, PCM, G711, etc.; the sampling rate ranges from 8 kHz to 44 kHz, and dual channels are supported. The websocket-based real-time audio stream transmission protocol is well supported by web browsers and has low latency, so it can better support platforms such as the web end and the mobile phone end in implementing real-time broadcasting applications of intelligent cloud broadcasting.
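The split between text control instructions and binary audio data might be handled on the receiving side roughly as follows. This is a sketch, not the protocol defined by the application: the JSON encoding, the field names (`action`, `format`, `sampleRate`, `channels`) and the function name `classify_frame` are all hypothetical, and the upper sampling-rate bound is taken as 44100 Hz as an assumption.

```python
import json
from typing import Tuple, Union

# Format names follow the list in the text, using AAC as the codec name.
SUPPORTED_FORMATS = {"AAC", "MP3", "PCM", "G711"}
MIN_RATE_HZ, MAX_RATE_HZ = 8000, 44100  # upper bound is an assumption


def classify_frame(message: Union[str, bytes]) -> Tuple[str, object]:
    """Classify one websocket message as audio data or a control instruction.

    Binary messages are treated as audio; text messages are parsed as JSON
    control instructions (the JSON layout is hypothetical).
    """
    if isinstance(message, (bytes, bytearray)):
        return ("audio", bytes(message))

    command = json.loads(message)
    action = command.get("action")
    if action == "set_params":
        fmt = command["format"]
        rate = command["sampleRate"]
        channels = command["channels"]
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported audio format: {fmt}")
        if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
            raise ValueError(f"sampling rate out of range: {rate}")
        if channels not in (1, 2):
            raise ValueError(f"unsupported channel count: {channels}")
        return ("set_params", {"format": fmt, "sampleRate": rate, "channels": channels})
    if action in ("start_push", "stop_push"):
        return (action, None)
    raise ValueError(f"unknown control instruction: {action}")
```

Keeping control instructions in text messages lets the server tell them apart from audio by message type alone, without adding a header to every audio chunk.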

c) Streaming media real-time transcoding capability: audio data can be coded in many formats. Different equipment manufacturers use different audio coding formats when implementing intelligent cloud broadcasting equipment, so audio streams are often incompatible across manufacturers' equipment. To provide a real-time voice broadcasting service, the intelligent cloud broadcasting service therefore needs a real-time transcoding capability: the client inputs a standard real-time audio stream, and the server converts the audio data in real time into the audio format supported by the manufacturer and outputs it to the downstream voice gateway.

The main processing procedure of the audio transcoding capability is as follows: receiving the parameters of the input and output standard audio streams, the parameters including a streaming media address, a streaming media protocol, an audio format, a sampling rate, a number of channels and a bit rate; and, according to the encapsulation characteristics of the various audio formats, converting data in the input audio format into data in another audio format through a transcoding algorithm.

d) Streaming media real-time multi-path distribution capability: this capability means that the client inputs one audio stream and the server distributes it to a plurality of downstream voice gateways, so that real-time audio can be broadcast simultaneously across equipment from different manufacturers. The real-time multi-path distribution process is as follows: the server receives a standard audio stream, adopts multiplexing and multithreaded concurrency, buffers the audio data using operating system buffering, and writes the audio stream data into the multiple output streams in a loop using asynchronous threads.

The specific implementation of asynchronous-thread multi-path distribution is as follows: first, an audio frame buffer is created for each downstream distribution stream, and the encoded audio frames are written into the buffer; a request is then submitted to the thread pool manager, which assigns a thread to send the audio frames. Each downstream voice gateway is abstracted as an audio stream pipeline, and sending an audio frame is abstracted as writing data into that pipeline. The thread pool manager assigns threads for pushing the audio streams and ensures that each audio stream has at most one thread request submitted at a time, which prevents the thread pool from being blocked. When a thread is triggered, it first takes out the audio stream pipeline, then reads the audio frames from the corresponding buffer frame by frame in a loop and writes them into the pipeline until the buffer is empty, after which the thread ends and is reclaimed.
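The buffer-per-stream scheme with at most one outstanding thread request per audio stream can be sketched as follows. The class names `AudioStreamPipe` and `Distributor` are hypothetical, the write callback stands in for the real gateway connection, and per-gateway transcoding is omitted so that only the distribution logic is shown.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Deque, Iterable


class AudioStreamPipe:
    """One downstream voice gateway, abstracted as a writable pipeline."""

    def __init__(self, name: str, write: Callable[[bytes], None]) -> None:
        self.name = name
        self._write = write                  # stands in for the gateway connection
        self._buffer: Deque[bytes] = deque()  # per-stream audio frame buffer
        self._lock = threading.Lock()
        self._scheduled = False              # True while a drain task is pending/running

    def enqueue(self, frame: bytes) -> bool:
        """Buffer a frame; return True only if a drain task must be submitted."""
        with self._lock:
            self._buffer.append(frame)
            if self._scheduled:
                return False
            self._scheduled = True
            return True

    def drain(self) -> None:
        """Write buffered frames one by one until the buffer is empty."""
        while True:
            with self._lock:
                if not self._buffer:
                    self._scheduled = False  # the thread ends and is reclaimed
                    return
                frame = self._buffer.popleft()
            self._write(frame)


class Distributor:
    """Fans one input audio stream out to several pipelines via a thread pool."""

    def __init__(self, pipes: Iterable[AudioStreamPipe], max_workers: int = 4) -> None:
        self._pipes = list(pipes)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def distribute(self, frame: bytes) -> None:
        for pipe in self._pipes:
            if pipe.enqueue(frame):          # at most one task per stream
                self._pool.submit(pipe.drain)

    def close(self) -> None:
        self._pool.shutdown(wait=True)       # wait for pending drains to finish
```

Because a pipeline never has more than one drain task at a time, frames reach each gateway in the order they were received, and a slow gateway occupies at most one pool thread instead of queueing a task per frame.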

Then, a specific implementation flow of this embodiment is described, including the following a) to d):

a) Service registration and scheduling:

The real-time voice gateway adopts a distributed cluster architecture. The dispatching center provides service registration and acts as the registration center, and supports deploying streaming media service instances and streaming media transcoding/distribution services across multiple nodes. The streaming media services and transcoding/distribution services deployed at each node register their service information with the dispatching center, send heartbeats periodically to keep the service state alive, and report service state information periodically.
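Registration with periodic heartbeats can be modelled with a simple time-to-live table. This is a sketch under assumptions: the class name `ServiceRegistry`, the 15-second default timeout and the injectable clock are illustrative choices, and the text does not specify how the dispatching center decides that a service is no longer alive.

```python
import time
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """Hypothetical registration table kept by the dispatching center."""

    def __init__(self, ttl_seconds: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds            # assumed heartbeat timeout
        self._clock = clock                # injectable for testing
        self._last_seen: Dict[str, float] = {}
        self._status: Dict[str, dict] = {}

    def register(self, address: str, status: Optional[dict] = None) -> None:
        """Register a streaming or transcoding/distribution service instance."""
        self._last_seen[address] = self._clock()
        self._status[address] = dict(status or {})

    def heartbeat(self, address: str, status: Optional[dict] = None) -> bool:
        """Refresh liveness; optionally update the reported state information."""
        if address not in self._last_seen:
            return False                   # unknown services must register first
        self._last_seen[address] = self._clock()
        if status is not None:
            self._status[address] = dict(status)
        return True

    def alive(self) -> List[str]:
        """Return addresses whose last heartbeat is within the timeout."""
        now = self._clock()
        return sorted(a for a, seen in self._last_seen.items()
                      if now - seen <= self._ttl)
```

Only the services returned by `alive()` would then be considered when the dispatching center makes a scheduling decision.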

Web real-time streaming media scheduling: based on the service state information reported by the web streaming media services, such as TCP connection count, service bandwidth, CPU utilization and memory utilization, the scheduling center calculates a weight value from the indices, and the scheduling decision algorithm finds the best-matching web streaming media service according to the weight value.
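One way to turn the reported indices into a single weight value is a normalized weighted sum, with the least loaded service chosen as the match. The text does not give the formula, so the weights, the capacity constants and the names `ServiceStatus`, `load_score` and `pick_service` below are assumptions for illustration; matching on the client's source IP is left out.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical capacities and weights; the text does not specify them.
MAX_CONNECTIONS = 10000
MAX_BANDWIDTH_MBPS = 1000.0
WEIGHTS = {"connections": 0.3, "bandwidth": 0.3, "cpu": 0.2, "memory": 0.2}


@dataclass(frozen=True)
class ServiceStatus:
    """State information reported by one web streaming media service."""
    address: str
    tcp_connections: int
    bandwidth_mbps: float
    cpu_usage: float      # fraction in [0, 1]
    memory_usage: float   # fraction in [0, 1]


def load_score(status: ServiceStatus) -> float:
    """Combine the indices into one weight value in [0, 1]; lower is idler."""
    return (
        WEIGHTS["connections"] * min(status.tcp_connections / MAX_CONNECTIONS, 1.0)
        + WEIGHTS["bandwidth"] * min(status.bandwidth_mbps / MAX_BANDWIDTH_MBPS, 1.0)
        + WEIGHTS["cpu"] * min(status.cpu_usage, 1.0)
        + WEIGHTS["memory"] * min(status.memory_usage, 1.0)
    )


def pick_service(candidates: Iterable[ServiceStatus]) -> ServiceStatus:
    """Return the candidate with the lowest load score."""
    return min(candidates, key=load_score)
```

The address of the selected service would then be returned to the client as the push address.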

b) Web push stream:

Session management: the web real-time streaming media service verifies the login token information and manages the websocket session information and the real-time broadcasting task information associated with each session.

Control instructions: the web real-time streaming media service provides control instructions for setting streaming media parameters, starting the push stream, stopping the push stream and the like; the web client sets the transmission audio coding, sampling rate and channels, and sends instructions to start and stop the push stream.

Transmitting audio: the web client transmits the audio data by taking the binaryType as an arraybuffer, and the web streaming media server transmits the audio data to the downstream voice gateway according to the session information.

c) Stream media transcoding:

Audio transcoding: the web client sets the transmission audio format when pushing the stream and transmits the audio data, and the web streaming media server converts the audio data into the audio format accepted by the downstream voice gateway. Supported audio transcoding formats include AAC, MP3, PCM, G711, etc.

d) Streaming media distribution:

Audio distribution: downstream voice gateways may use enterprise-standard or industry-standard protocols, such as RTMP, RTSP, etc. The audio stream is distributed according to the protocol of each downstream voice gateway, so that the audio transmitted by the client is distributed to multiple downstream voice gateways.

In order to more clearly illustrate the embodiments of the present application, the application process of the present application will be described in more specific examples.

The scenario described in this embodiment is: the community property manager broadcasts community notification information through the intelligent cloud broadcasting system shown in fig. 2, broadcasting sound is transmitted and amplified through intelligent sound column equipment and loudspeaker equipment installed in the community, and residents in the community can listen to broadcast notification in real time.

An example scenario is as follows:

S101: community property manager A receives a community emergency notice and prepares to broadcast the notice to the community owners through the intelligent cloud broadcasting system.

S102: property manager A logs in the intelligent community platform, enters the intelligent cloud broadcasting application, enters the real-time broadcasting page, checks the intelligent sound column equipment list and creates a real-time broadcasting task.

S103: the real-time voice gateway system receives the request for creating the real-time broadcasting task, requests the downstream voice gateway to create the task, and returns the streaming media service address information.

S104: the web client acquires the web streaming media service address information and establishes websocket connection with the server.

S105: property manager A clicks to start intercom mode and the web client starts recording; the browser pops up a recording authorization window, and after the manager clicks to confirm, the web client formally begins recording.

S106: the web client sends a start-push instruction; after the web real-time streaming media service receives the instruction, the push stream state is changed to Ready and the service is ready to receive the audio stream. The web real-time streaming media service notifies the dispatching center service of the start-push event.

S107: after receiving the instruction of opening the push stream, the dispatching center service matches the information of the real-time broadcasting task and issues the transcoding distribution task.

S108: after receiving the task, the transcoding distribution service requests the web real-time streaming media service to establish a connection and starts the pull stream, in preparation for receiving the audio stream input. It also requests the downstream voice gateway to establish a connection, in preparation for distributing the audio stream.

S109: after the property manager A opens the intercom mode, the community emergency notification content is announced.

S110: the computer (i.e. the client) collects microphone audio in real time, and transmits audio data to the web streaming media service through the Internet network. The audio stream is transmitted in binary data.

S111: the web streaming media service receives the audio data, searches for a broadcasting task according to the websocket session id, searches for a transcoding forwarding service websocket client, and forwards the audio data to the client.

S112: the transcoding and distributing service receives the audio data through websocket and transcodes the audio data according to the audio format supported by the downstream voice gateway.

S113: after the audio is transcoded, the transcoded audio data is sent to a downstream voice gateway by the transcoded distribution service.

S114: after receiving the audio data, the downstream voice gateway sends the audio data to the intelligent sound column equipment.

S115: the equipment plays the received audio in real time and amplifies it through the loudspeaker. Residents in the community can thus hear, in real time, the community emergency notification announced by property manager A. This concludes the example.

In summary, according to the embodiments, the present application has the audio distribution capability of distributing one audio stream in multiple ways to multiple downstream voice gateways; the audio transcoding capability of converting the audio format is provided; web push protocol standards are defined; the model and manufacturer of the intelligent sound column equipment are not limited; multiple audio formats are supported for client push.

The comparison effect of the present application with the prior art can be seen with reference to fig. 8.

Referring to fig. 9, an embodiment of the present application provides an audio media distribution apparatus, including:

the first unit is used for acquiring a push address;

a third unit, configured to receive audio data sent by the client;

The specific implementation of the audio media distribution device is basically the same as the specific embodiment of the audio media distribution method, and will not be described herein.

The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the audio media distribution method when executing the computer program. Specifically, the electronic device may be a user terminal or a server.

In this embodiment, taking a computer device as an example, the computer device is a user terminal, the specific steps are as follows:

as shown in fig. 10, the computer device 1000 may include RF (Radio Frequency) circuitry 1010, memory 1020 including one or more computer-readable storage media, an input unit 1030, a display unit 1040, a sensor 1050, audio circuitry 1060, a short-range wireless transmission module 1070, a processor 1080 including one or more processing cores, and a power supply 1090. It will be appreciated by those skilled in the art that the device structure shown in fig. 10 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

The RF circuit 1010 may be used for receiving and transmitting signals during a message or a call; in particular, after downlink information of a base station is received, it is handed to one or more processors 1080 for processing; in addition, uplink data is transmitted to the base station. Typically, RF circuitry 1010 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry 1010 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), and the like.

Memory 1020 may be used to store software programs and modules. Processor 1080 executes various functional applications and data processing by executing software programs and modules stored in memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebooks, etc.) created according to the use of the device 1000, etc. In addition, memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state memory device. Accordingly, memory 1020 may also include a memory controller to provide processor 1080 and input unit 1030 with access to memory 1020. While fig. 10 shows RF circuit 1010, it is to be understood that it is not an essential component of device 1000 and may be omitted entirely as desired within the scope of not changing the essence of the invention.

The input unit 1030 may be used for receiving input numeric or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 1030 may include a touch-sensitive surface 1031 and other input devices 1032. The touch-sensitive surface 1031, also referred to as a touch display screen or touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch-sensitive surface 1031 or thereabout using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection device according to a pre-set program. Alternatively, the touch sensitive surface 1031 may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 1080 and can receive commands from the processor 1080 and execute them. In addition, the touch sensitive surface 1031 may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. In addition to the touch-sensitive surface 1031, the input unit 1030 may include other input devices 1032. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, etc.

The display unit 1040 may be used to display information input by a user or information provided to a user, as well as the various graphical user interfaces of the device 1000, which may be composed of graphics, text, icons, video and any combination thereof. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 1031 may be overlaid on the display panel 1041; upon detecting a touch operation on or near it, the touch-sensitive surface 1031 passes the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 based on the type of touch event. Although in fig. 10 the touch-sensitive surface 1031 and the display panel 1041 are implemented as two separate components for input and output functions, in some embodiments the touch-sensitive surface 1031 may be integrated with the display panel 1041 to implement the input and output functions.

The computer device 1000 may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1041 and/or the backlight when the device 1000 moves to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the device 1000 are not described in detail herein.

Audio circuitry 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between a user and the device 1000. The audio circuit 1060 may convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output; on the other hand, the microphone 1062 converts collected sound signals into electrical signals, which are received by the audio circuit 1060 and converted into audio data; the audio data is then output to the processor 1080 for processing and transmitted to another control device via the RF circuit 1010, or output to the memory 1020 for further processing. Audio circuitry 1060 may also include an earbud jack to provide communication between peripheral headphones and the device 1000.

The short-range wireless transmission module 1070 may be a WIFI (wireless fidelity) module, a bluetooth module, an infrared module, or the like. The device 1000 may exchange information with a wireless transmission module provided on another device via the short-range wireless transmission module 1070.

Processor 1080 is the control center of device 1000. It connects the various parts of the overall control device using various interfaces and lines, and performs the various functions of device 1000 and processes data by running or executing software programs and/or modules stored in memory 1020 and invoking data stored in memory 1020, thereby monitoring the control device as a whole. Optionally, processor 1080 may include one or more processing cores; optionally, processor 1080 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, etc., with a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into processor 1080.

The device 1000 also includes a power source 1090 (e.g., a battery) for powering the various components, which can be logically connected to the processor 1080 by a power management system, such as to perform charge, discharge, and power management functions. The power source 1090 may also include one or more of any of a direct current or alternating current power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

Although not shown, the device 1000 may further include a camera, a bluetooth module, etc., which will not be described herein. The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the audio media distribution method when being executed by a processor.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The present application also discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of an electronic device, and executed by the processor, to cause the electronic device to perform the method shown in fig. 3.

In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, while the present application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Thus, those of ordinary skill in the art will be able to implement the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined by the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.

While the preferred embodiment of the present application has been described in detail, the present application is not limited to that embodiment. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and such equivalent modifications or substitutions are included in the scope defined by the appended claims.

Claims

1. A method of audio media distribution, comprising:

obtaining a push address;

receiving audio data sent by the client;

2. The audio media distribution method according to claim 1, wherein the obtaining the push address includes:

3. The audio media distribution method according to claim 2, wherein prior to said establishing a connection with a client, the method further comprises:

the establishing connection with the client comprises the following steps:

4. The audio media distribution method according to claim 2, wherein the matching according to a preset scheduling policy to obtain a push address includes:

acquiring the IP address of the client;

and obtaining the address of the streaming media service as the push address.

5. The audio media distribution method according to claim 1, wherein the receiving audio data sent by the client comprises:

6. The audio media distribution method according to claim 5, wherein after said receiving the start push command sent by the client, the method further comprises:

7. An audio media distribution apparatus, comprising:

a first unit, configured to obtain a push address;

a third unit, configured to receive audio data sent by the client;

8. An audio media distribution system, comprising: a client, a first server, a second server, a plurality of downstream voice gateways and a plurality of audio devices;

the second server is configured to perform an audio media distribution method according to any one of claims 1 to 6;

the audio device is used for playing the transcoded audio data.

9. The audio media distribution system according to claim 8, wherein the second server comprises a dispatch center module, a websocket service module, and a streaming media transcoding/distribution module;

the dispatch center module is used for obtaining a push address;

10. An electronic device comprising a processor and a memory;

the memory is used for storing a program;

the processor executes the program to implement the audio media distribution method according to any one of claims 1 to 6.