WO2009089717A1 - Audio processing method, system and control server - Google Patents

Audio processing method, system and control server

Info

Publication number
WO2009089717A1
WO2009089717A1 (PCT Application No. PCT/CN2008/073694; CN 2008073694 W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio data
terminal
control server
terminals
Prior art date
Application number
PCT/CN2008/073694
Other languages
English (en)
French (fr)
Inventor
Yingbin Li
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to JP2010540017A (JP5320406B2)
Priority to EP08870951.4A (EP2216941B1)
Priority to KR1020107014148A (KR101205386B1)
Publication of WO2009089717A1
Priority to US12/824,892 (US8531994B2)
Priority to US13/669,151 (US8649300B2)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10: Architectures or entities
    • H04L65/1046: Call controllers; Call servers
    • H04L65/40: Support for services or applications
    • H04L65/403: Arrangements for multi-party communication, e.g. for conferences
    • H04L65/1066: Session management
    • H04L65/1069: Session establishment or de-establishment

Definitions

  • The present invention relates to the field of voice communication technologies, and in particular to an audio processing method, system, and control server.
  • In the prior art, the device that implements core audio exchange and controls multiple conference terminals is mainly the MCU (Multipoint Control Unit). The MCU provides at least the MC (Multipoint Control) function and the MP (Multipoint Process) function and is capable of multi-channel mixing. For example, in a conference call in which the telephone terminals of at least three conference sites communicate with each other through the MCU at the same time, the MCU needs to mix the voices sent by the terminals into one channel and then send it to the telephone terminal of each site, so that the terminal users, although not in one place, feel as if they were communicating in the same room.
  • In the prior art, the audio processing flow when multiple terminals perform audio communication is as shown in FIG. 1:
  • Step 101: Assign an audio codec port on the MCU to the terminal of each site to be accessed.
  • Step 102: After the call is initiated, each terminal separately sends its encoded audio data to the MCU.
  • Step 103: After decoding the audio data sent by each terminal, the MCU selects the audio data of the sites with the highest volume.
  • Step 104: The MCU mixes the selected audio data into one channel of audio data.
  • Step 105: The mixed audio data is encoded and then sent to each site terminal.
  • Step 106: Each site terminal decodes the received audio data.
  • From the moment the audio data is transmitted from each site terminal to the MCU until each site receives the mixed audio data from the MCU, one audio encode/decode pass is performed each time the data passes through an MCU, and each codec pass increases the terminal-to-terminal audio distortion.
  • In an MCU-based multipoint conference, the site terminal performs one codec pass and the MCU performs another when mixing, resulting in two distortions; when two MCUs are cascaded, the site terminal still performs one codec pass while the two MCUs perform two more when mixing, resulting in three distortions. Each additional MCU adds one more distortion.
  • Each codec pass also increases the terminal-to-terminal speech delay, for the same reason and by the same derivation as the audio distortion above.
  • In addition, the MCU needs to allocate an audio codec port for each terminal separately; especially when there are many sites, the MCU must provide a large number of audio codec ports, which increases the cost of multipoint communication.
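The distortion count described above follows a simple relation: the terminals contribute one end-to-end codec pass, and every MCU in the cascade adds one more when it mixes. A minimal sketch of that relation (the function name is ours, for illustration only):

```python
def codec_passes(num_cascaded_mcus):
    """Prior-art end-to-end codec passes (and hence added distortions):
    one pass at the site terminals (encode at the sender, decode at the
    receiver), plus one mixing pass per MCU in the cascade."""
    return 1 + num_cascaded_mcus

# One MCU: two distortions; two cascaded MCUs: three.
print(codec_passes(1))  # 2
print(codec_passes(2))  # 3
```

The same count applies to the accumulated speech delay, since each codec pass adds processing latency as well as distortion.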
  • An audio processing method includes:
  • The control server receives the encoded audio data transmitted by each terminal connected to the control server;
  • The control server obtains the audio capability of each terminal by performing capability negotiation with each terminal;
  • The control server forwards the audio data extracted from the received audio data to the respective terminals in accordance with their audio capabilities.
  • An audio processing system includes at least one control server and a plurality of terminals, wherein:
  • The control server is configured to receive the encoded audio data sent by each terminal connected to the control server, obtain the audio capability of each terminal by performing capability negotiation with each terminal, and forward the audio data extracted from the received audio data to the respective terminals according to their audio capabilities;
  • The terminal is configured to access the control server, decode the received audio data, and automatically mix and play it.
  • A control server comprising:
  • An acquiring unit, configured to receive the encoded audio data sent by each terminal connected to the control server, and to obtain the audio capability of each terminal by performing capability negotiation with each terminal;
  • A forwarding unit, configured to forward the audio data extracted from the received audio data to the respective terminals according to their audio capabilities.
  • In the embodiments of the present invention, after a terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation and forwards the encoded audio data to each terminal according to that audio capability.
  • Because the audio data does not need to undergo an encode/decode operation each time it passes through a control server, and the control server only packs and groups the audio data, the number of codec passes at the control server is reduced. This reduces the transmission delay of the audio data, enhances real-time interaction between the terminals, and reduces the control server's consumption of audio codec resources, thereby reducing cost. By reducing the number of codec passes at the control server itself, multi-channel mixing is achieved while maintaining good compatibility with control servers based on existing standard protocols, so the solution can be widely used in communication fields such as videoconferencing and conference telephony.
  • FIG. 1 is a flowchart of audio processing when multiple terminals perform audio communication in the prior art;
  • FIG. 2 is a flowchart of a first embodiment of an audio processing method according to the present invention;
  • FIG. 3 is a schematic structural diagram of a second embodiment of an audio processing method according to the present invention;
  • FIG. 4 is a flowchart of the second embodiment of an audio processing method according to the present invention;
  • FIG. 5 is a schematic structural diagram of a third embodiment of an audio processing method according to the present invention;
  • FIG. 6 is a flowchart of the third embodiment of an audio processing method according to the present invention;
  • FIG. 7 is a schematic structural diagram of a fourth embodiment of an audio processing method according to the present invention;
  • FIG. 8 is a flowchart of the fourth embodiment of an audio processing method according to the present invention;
  • FIG. 9 is a schematic structural diagram of a fifth embodiment of an audio processing method according to the present invention;
  • FIG. 10 is a flowchart of the fifth embodiment of an audio processing method according to the present invention;
  • FIG. 11 is a schematic structural diagram of a sixth embodiment of an audio processing method according to the present invention;
  • FIG. 12 is a flowchart of the sixth embodiment of an audio processing method according to the present invention;
  • FIG. 13 is a block diagram of an embodiment of an audio processing system of the present invention;
  • FIG. 14 is a block diagram of an embodiment of a control server of the present invention.
  • The embodiments of the present invention provide an audio processing method, a system, and a control server. After a terminal accesses the control server, the control server acquires the audio capability of the terminal through capability negotiation and forwards the encoded audio data to each terminal according to that audio capability.
  • The flow of the first embodiment of the audio processing method of the present invention is as shown in FIG. 2:
  • Step 201: After the terminal accesses the control server, the control server acquires the audio capability of the terminal through capability negotiation.
  • The audio capability of the terminal indicates that the terminal supports a multi-channel separated audio codec protocol, that the terminal supports multiple audio logical channels, or that the terminal supports neither a multi-channel separated audio codec protocol nor multiple audio logical channels.
  • Step 202: The control server forwards the encoded audio data to each terminal according to the audio capability.
  • The control server forwards the encoded audio data to each terminal according to the audio capability in any one of the following ways: when the terminal supports the multi-channel separated audio codec protocol, the control server selects several channels of the audio data, packs them, and forwards them in a single audio logical channel; when the terminal supports multiple audio logical channels, the control server selects several channels of the audio data and forwards them in multiple audio logical channels; when the terminal supports neither, the control server mixes and encodes the audio data and sends it to the terminal.
  • When there is a single control server, the control server forwards the encoded audio data according to the audio capability to each terminal accessing it; when a plurality of control servers are cascaded, the control servers transmit the encoded audio data in cascade according to the audio capability, and the receiving-end control server forwards the audio data to each terminal accessing it.
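The three forwarding branches of step 202 can be sketched as a simple dispatch on the negotiated capability. This is an illustrative sketch only: the capability names, the byte-string stand-ins for encoded audio, and the `mix_and_encode` placeholder are our assumptions, not anything defined by the patent.

```python
def mix_and_encode(streams):
    """Placeholder for the fallback decode/mix/re-encode path used when
    a terminal supports neither capability (hypothetical stand-in)."""
    return b"MIXED:" + b"|".join(streams)

def forward(audio_capability, selected_streams):
    """Dispatch on the negotiated audio capability, as in step 202.
    `selected_streams` holds already-encoded payloads, one per selected
    site; capability names here are illustrative labels."""
    if audio_capability == "multi_channel_separated_codec":
        # One logical channel: pack all selected streams into a single
        # packet, with no transcoding by the control server.
        return [b"".join(selected_streams)]
    if audio_capability == "multi_audio_logical_channels":
        # Multiple logical channels: forward each stream as-is.
        return list(selected_streams)
    # Neither capability: the control server must mix and re-encode.
    return [mix_and_encode(selected_streams)]
```

In the first two branches the control server never touches a codec, which is where the delay and resource savings described above come from.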
  • A schematic structural diagram of the second embodiment of the audio processing method of the present invention is shown in FIG. 3.
  • In this embodiment the control server is an MCU, and four terminals implement multipoint audio processing by connecting to the MCU. Each terminal has only one audio transmission channel to the MCU (shown by the solid arrows in the figure) and one audio receiving channel (shown by the dashed arrows in the figure); that is, there is a single audio logical channel between the MCU and each terminal.
  • FIG. 4 shows the flow of audio data processing between terminals using a multi-channel separated audio codec protocol and an MCU:
  • Step 401: After initiating the call, the terminal accesses the MCU and sends its encoded audio data to the MCU.
  • When the terminal initiates a call, it usually determines through capability negotiation with the MCU which multi-channel separated audio codec protocol to use between the terminal and the MCU; this is usually an international-standard audio codec protocol such as AAC (Advanced Audio Coding), but can also be a proprietary protocol.
  • Step 402: The MCU creates a decoder for the multi-channel separated audio codec protocol.
  • Here, channel separation means that the MCU does not need to decode the received encoded audio data of each terminal; directly from the IP packet containing the encoded audio data, it can tell which channel the audio data comes from and which audio coding protocol that channel uses.
  • Step 403: The MCU selects the terminals that need to be mixed according to the volume of the decoded audio data.
  • Step 404: The MCU extracts the audio data from the independent channels of the terminals that need to be mixed.
  • With the multi-channel separated audio codec protocol, the MCU does not need to follow the process of uniformly decoding the received audio data of every terminal, selecting the required audio data for mixing, and re-encoding it; instead, it can directly extract the audio data of a single channel from each received audio data packet, and use the volume of that audio data to determine whether the terminal to which the packet belongs is a terminal that needs to be mixed.
  • Step 405: The MCU packs the selected audio data and sends it to each terminal through an audio logical channel.
  • For example, suppose the terminals performing multipoint communication with the MCU are terminal 1, terminal 2, terminal 3, and terminal 4, and that the three channels of audio data selected by the MCU according to the volume policy are the encoded audio data sent by terminal 1, terminal 2, and terminal 3. The audio data of these three terminals is packed, each as an independent channel, into one audio logical channel; that is, the audio data in the logical channel contains the data of three independent channels, which is then forwarded to each terminal. Terminal 1 receives the audio data packet composed of the encoded audio data of terminal 2 and terminal 3; terminal 2 receives the packet composed of the encoded audio data of terminal 1 and terminal 3; terminal 3 receives the packet composed of the encoded audio data of terminal 1 and terminal 2; and terminal 4 receives the packet composed of the encoded audio data of terminal 1, terminal 2, and terminal 3.
  • Step 406: The terminal decodes the received packed audio data and automatically mixes and plays it.
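The selection and packing in steps 403-405 can be sketched as follows. The `EncodedFrame` structure and the representation of a packet as a list of frames are our illustrative assumptions; the point is that the MCU sorts by volume, drops each receiver's own stream, and bundles the rest without ever decoding the payloads.

```python
from dataclasses import dataclass

@dataclass
class EncodedFrame:
    terminal_id: str  # identifies the independent channel (hypothetical field)
    volume: int       # volume indication used for selection
    payload: bytes    # encoded audio; the MCU never decodes it

def pack_for_terminal(frames, receiver_id, num_mix=3):
    """Sketch of steps 403-405: pick the num_mix loudest frames, drop
    the receiver's own stream so it does not hear itself, and bundle
    the rest as independent channels of one logical-channel packet
    (modelled here as a list of frames)."""
    loudest = sorted(frames, key=lambda f: f.volume, reverse=True)[:num_mix]
    return [f for f in loudest if f.terminal_id != receiver_id]
```

With terminals 1-4 where terminals 1-3 are the loudest, terminal 1's packet carries the channels of terminals 2 and 3, and terminal 4's packet carries all three, matching the example above.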
  • It should be noted that when not all terminals interoperating with the MCU support the multi-channel separated audio codec protocol, the MCU needs to create mixing and encoding resources for the terminals that do not support the protocol, and to support automatic audio protocol adaptation: the audio data sent by terminals supporting the multi-channel separated audio codec protocol is automatically decoded, mixed, and encoded before being sent to a terminal that does not support the protocol, so as to maintain compatibility with terminals that do not support it.
  • A schematic structural diagram of the third embodiment of the audio processing method of the present invention is shown in FIG. 5.
  • In this embodiment the control server is an MCU: terminal A1, terminal A2, terminal A3, and terminal A4 are connected to MCU-A, and terminal B1, terminal B2, terminal B3, and terminal B4 are connected to MCU-B. The terminals implement multipoint audio processing by connecting to the MCUs. Each terminal has only one audio transmission channel to its MCU (shown by the one-way solid arrows in the figure) and one audio receiving channel (shown by the dashed arrows in the figure); that is, there is a single audio logical channel between each MCU and each terminal, and a call is established between the MCUs (shown by the two-way solid arrow in the figure).
  • The flow of the third embodiment of the audio processing method of the present invention is as shown in FIG. 6. This embodiment describes the process of audio data processing between terminals using a multi-channel separated audio codec protocol and two cascaded MCUs:
  • Step 601: After initiating the call, the terminal accesses MCU-A and sends its encoded audio data to MCU-A.
  • Step 602: MCU-A creates a decoder for the multi-channel separated audio codec protocol.
  • Step 603: MCU-A selects the terminals that need to be mixed according to the volume of the decoded audio data.
  • Step 605: MCU-A packs the selected audio data and sends it to the cascaded MCU-B.
  • Step 606: MCU-B creates a decoder and selects, according to volume, the audio data that will replace audio data in the packet from MCU-A.
  • The cascaded MCU-A and MCU-B process the audio data sent by the terminals connected to them exactly as in the second embodiment of the present invention, but a channel is added between the cascaded MCU-A and MCU-B (multiple channels are added correspondingly when more MCUs are cascaded). Therefore, when the cascaded MCU-A sends the packed audio data to MCU-B, MCU-B compares the volume of the received audio data with the volume of the audio data sent by the terminals connected to MCU-B, and according to the comparison result replaces the lower-volume audio data in the packet sent by MCU-A with higher-volume audio data from terminals connected to MCU-B.
  • For example, suppose the audio data packet selected by MCU-A by volume from the terminals connected to it contains the audio data of terminal A1, terminal A2, and terminal A3. When MCU-B compares this packet, if the volume of the audio data of terminal B1 connected to MCU-B is greater than the volume of the audio data of terminal A1 in the packet, the audio data of terminal B1 replaces the audio data of terminal A1 in the packet.
  • Step 607: MCU-B repacks the replaced audio data and sends it to each terminal connected to it through an audio logical channel.
  • Step 608: The terminal decodes the received packed audio data and automatically mixes and plays it.
  • When all terminals support the multi-channel separated audio codec protocol, an audio encoder is needed only at the terminals of the transmitting-end MCU and an audio decoder only at the terminals of the receiving-end MCU. Therefore, no matter how many MCUs are cascaded, encoding is performed only at the terminals of the transmitting-end MCU and decoding only at the terminals of the receiving-end MCU, and the entire audio processing performs only one audio encode/decode operation: the terminals of the transmitting-end MCU send encoded audio data, the transmitting-end MCU packs it, and the audio data packet is transmitted in cascade among the MCUs. The receiving-end MCU does not need to decode; based on the multi-channel separated audio codec protocol, it directly extracts the single-channel audio data from the packet, replaces it where appropriate with the audio data sent by higher-volume terminals of the receiving-end MCU, and then sends the result to its terminals, which decode the replaced audio data packet.
  • When not all terminals support the protocol, the terminals of the transmitting-end MCU still need no extra resources, but the receiving-end MCU creates an audio encoder and decoder for its terminals, and the received cascaded audio data packets must be decoded, have their contents replaced, and be re-encoded so that those terminals remain compatible. Even so, no matter how many MCUs are cascaded, the audio data packets need no encoding or decoding while being transmitted among the MCUs other than the receiving-end MCU, so the audio processing of the entire cascaded transmission needs only two encode/decode operations: the terminals of the transmitting-end MCU send encoded audio data and the transmitting-end MCU packs it; the packet is transmitted in cascade among the MCUs; when it reaches the receiving-end MCU, since the multi-channel separated audio codec protocol is not supported there, the receiving-end MCU decodes the packet, replaces the lower-volume audio data in it with the higher-volume audio data sent by its own terminals, re-encodes the replaced audio data, and sends it to its terminals, which receive and decode it.
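The volume-based replacement rule of step 606 can be sketched as follows. The `(terminal_id, volume, payload)` tuple shape is our assumption for illustration; the essential behaviour is that louder local streams displace quieter remote ones while the packet size, and hence the number of mixed sites, stays fixed, and nothing is decoded.

```python
def replace_by_volume(packet, local_frames):
    """Sketch of the step-606 replacement at the cascaded MCU. Both
    arguments are lists of (terminal_id, volume, payload) tuples of
    still-encoded audio. Each local stream louder than the quietest
    stream remaining in the packet takes that stream's place."""
    packet = list(packet)  # do not mutate the caller's packet
    for local in sorted(local_frames, key=lambda f: f[1], reverse=True):
        quietest = min(range(len(packet)), key=lambda i: packet[i][1])
        if local[1] > packet[quietest][1]:
            packet[quietest] = local
        else:
            break  # remaining local streams are quieter still
    return packet
```

For the example above, a packet carrying A1, A2, and A3 with terminal B1 louder than A1 comes out carrying B1, A2, and A3.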
  • A schematic structural diagram of the fourth embodiment of the audio processing method of the present invention is shown in FIG. 7.
  • In this embodiment the control server is an MCU, and four terminals implement multipoint audio processing by connecting to the MCU. Each terminal has an audio transmission channel to the MCU (shown by the solid arrows in the figure) and audio receiving channels (shown by the dotted arrows in the figure), with three audio logical channels between each terminal and the MCU. This embodiment is based on an international standard protocol supporting audio communication, such as the H.323 protocol, which supports multiple logical channels and supports multiple logical channels carrying the same type of media.
  • FIG. 8 shows the process of audio data processing between terminals having multiple audio logical channels and an MCU:
  • Step 801: After initiating the call, the terminal accesses the MCU and sends its encoded audio data to the MCU.
  • When a terminal initiates a call, support for multiple audio logical channels between the terminal and the MCU is generally determined through capability negotiation with the MCU. Since the capability negotiation standard protocol has a non-standard capability protocol field, that field can describe the ability to support multiple audio logical channels. For example, if a 4-byte content "0x0a0a" is defined in the extended capability field of the capability negotiation standard protocol, then during capability negotiation, when the MCU finds that the terminal has filled in "0x0a0a" in the non-standard field, this indicates that the terminal supports multiple audio logical channels, and once the call succeeds, audio processing can proceed in the multi-audio-channel manner.
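The capability check above can be sketched in a few lines. The marker value "0x0a0a" comes from the text; how it is encoded and laid out inside the non-standard capability field is our assumption for illustration only.

```python
# The text describes a 4-byte content "0x0a0a"; the exact byte layout
# inside the non-standard capability field is an assumption here.
MULTI_AUDIO_CHANNEL_MARKER = bytes.fromhex("0a0a")

def supports_multi_audio_channels(nonstandard_field: bytes) -> bool:
    """During capability negotiation, the MCU checks whether the
    terminal filled the agreed marker into the non-standard
    capability field of the negotiation protocol."""
    return nonstandard_field == MULTI_AUDIO_CHANNEL_MARKER
```

The fifth embodiment below uses the same mechanism with a different marker ("0x0a0b") to signal support for multi-way call cascading.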
  • Step 802: The MCU creates decoders for the multiple audio logical channels.
  • Step 803: The MCU selects the terminals that need to be mixed according to the volume of the decoded audio data.
  • Step 804: The audio data of the terminals that need to be mixed is sent directly to each terminal through the corresponding three audio logical channels.
  • After receiving the encoded audio data sent by terminal 1, terminal 2, terminal 3, and terminal 4, suppose the three channels of audio data selected by the MCU according to its audio policy are the audio data of terminal 1, terminal 2, and terminal 3. The MCU can then send the selected audio data directly to each terminal in the audio logical channels: terminal 1 receives the audio data of terminal 2 and terminal 3 from the audio channel of terminal 2 and the audio channel of terminal 3, respectively; terminal 2 receives the audio data of terminal 1 and terminal 3 from the audio channel of terminal 1 and the audio channel of terminal 3, respectively; terminal 3 receives the audio data of terminal 1 and terminal 2 from the audio channel of terminal 1 and the audio channel of terminal 2, respectively; and terminal 4 receives the audio data of terminal 1, terminal 2, and terminal 3 from the audio channel of terminal 1, the audio channel of terminal 2, and the audio channel of terminal 3, respectively.
  • Step 805: The terminal decodes the received audio data and automatically mixes and plays it.
  • The terminal in this embodiment supports opening multiple audio receiving channels, decoding multiple channels of audio data simultaneously, and mixing the decoded multi-channel audio data for output to the speaker. For example, terminal 1 decodes the two channels of audio data received from the audio channel of terminal 2 and the audio channel of terminal 3, then mixes them and outputs the result to the speaker.
  • It should be noted that when not all terminals interoperating with the MCU support multiple audio logical channels, the MCU needs to create mixing and encoding resources for the terminals that do not support multiple logical channels, and to support automatic audio protocol adaptation: the audio data sent by terminals with multiple audio logical channels is automatically decoded, mixed, and encoded before being sent to terminals that do not support multiple audio logical channels, so as to maintain compatibility with them.
  • A schematic structural diagram of the fifth embodiment of the audio processing method of the present invention is shown in FIG. 9.
  • In this embodiment the control server is an MCU: terminal A1, terminal A2, terminal A3, and terminal A4 are connected to MCU-A, and terminal B1, terminal B2, terminal B3, and terminal B4 are connected to MCU-B. The terminals connect to the MCUs to implement multipoint audio processing. Each terminal has an audio transmission channel to its MCU (shown by the one-way solid arrows in the figure) and three audio receiving channels (shown by the dashed arrows in the figure), i.e. four logical channels between each terminal and its MCU, and a call is established between the MCUs (shown by the two-way solid arrow in the figure).
  • The flow of the fifth embodiment of the audio processing method of the present invention is as shown in FIG. 10, which shows the audio data processing between terminals having multiple audio logical channels and two cascaded MCUs:
  • Step 1001: After initiating the call, the terminal accesses MCU-A and sends its encoded audio data to MCU-A.
  • When a call is initiated, support for multi-way call cascading between the terminal and the cascaded MCUs is generally determined through capability negotiation with the MCU. Since the capability negotiation standard protocol has a non-standard capability protocol field, that field can describe the ability to support multi-way call cascading; cascaded calls between the MCUs use the same process. For example, suppose a 4-byte content "0x0a0b" is defined in the extended capability field of the capability negotiation standard protocol: during capability negotiation, when the MCU finds that the terminal has marked "0x0a0b" in the non-standard capability field, this indicates support for multi-way call cascading, and audio processing can then be performed in the multi-way cascaded-call manner.
  • Step 1002: MCU-A creates decoders for the multiple logical channels.
  • Step 1003: MCU-A selects the terminals that need to be mixed according to the volume of the decoded audio data.
  • Step 1004: MCU-A forwards the audio data of the several audio logical channels of the terminals that need to be mixed directly to MCU-B.
  • Step 1005: MCU-B creates a decoder and selects, according to volume, the audio data that will replace audio data from MCU-A.
  • Step 1006: MCU-B sends the replaced audio data directly to each terminal through three audio logical channels.
  • Step 1007: The terminal decodes the received audio data and automatically mixes and plays it.
  • When all terminals support multiple audio logical channels, encoding is needed only at the terminals of the transmitting-end MCU, and the audio data transmitted over the multiple audio channels is separately decoded and mixed at the terminals of the receiving-end MCU. Therefore, no matter how many MCUs are cascaded, the entire audio processing performs only one audio encode/decode operation: the terminals of the transmitting-end MCU send encoded audio data, the transmitting-end MCU transmits the audio data in cascade through multiple audio logical channels among the MCUs, and the receiving-end MCU does not need to decode; based on the multi-audio-logical-channel capability, it replaces the audio data of some of the logical channels with the audio data sent over the audio logical channels of its own higher-volume terminals, then sends the result to its terminals, which separately decode the replaced multi-channel audio data transmitted over the multiple audio logical channels.
  • When not all terminals support multiple audio logical channels, the terminals of the transmitting-end MCU need no extra resources, but the receiving-end MCU creates an audio encoder and decoder for its terminals, and the received cascaded audio data must be decoded, have its contents replaced, and be re-encoded so that those terminals remain compatible. The audio data needs no encoding or decoding while being transmitted among the MCUs other than the receiving-end MCU, so the audio processing of the entire cascaded transmission needs only two encode/decode operations: the transmitting-end MCU transmits the audio data in cascade through multiple audio logical channels among the MCUs; when it reaches the receiving-end MCU, since multiple audio logical channels are not supported there, the receiving-end MCU decodes the audio data of the multiple audio logical channels, replaces the lower-volume audio data with the higher-volume audio data sent by its own terminals, re-encodes the replaced multi-channel audio data, and sends it to its terminals, which receive and decode it.
  • FIG. 11 shows a schematic structural diagram of the sixth embodiment of the audio processing method of the present invention;
  • the control server is an MCU;
  • terminal 1 and terminal 2 are connected to MCU-A;
  • terminal 3 and terminal 4 are connected to MCU-B. The terminals implement multipoint audio processing by connecting to the MCUs, and multiple cascaded calls are implemented between MCU-A and MCU-B; that is, the cascaded MCU-A and MCU-B dynamically establish multiple calls according to the number of terminals to be mixed;
  • each call has only one audio channel, and the protocols of the audio channels may differ.
  • FIG. 12 shows the procedure of the sixth embodiment of the audio processing method of the present invention.
  • Step 1201: After initiating the call, the terminal accesses MCU-A and sends its encoded audio data to MCU-A.
  • Step 1202: MCU-A creates a decoder for each accessed terminal.
  • Step 1203: MCU-A selects the terminals to be mixed according to the volume of the decoded audio data.
  • Step 1204: MCU-A forwards the audio data of the terminals to be mixed from the corresponding audio protocol ports of MCU-A to the ports on MCU-B that support those audio protocols.
  • Step 1205: After creating decoders, MCU-B decodes the audio data sent from each port of MCU-A.
  • Step 1206: MCU-B selects, according to volume, the audio data to be mixed from the received channels of audio data sent by MCU-A and the channels of audio data sent by MCU-B's own terminals.
  • Step 1207: MCU-B mixes the selected channels of audio data and sends the result to each terminal.
  • Step 1208: The terminal decodes the received audio data, automatically mixes it, and plays it.
  • a pair of MCU cascade ports is usually used to implement an audio call between cascaded MCUs, but in the sixth embodiment of the present invention, multiple calls supporting different audio protocols are implemented between two cascaded MCUs through multiple pairs of ports, thereby achieving multi-channel mixing of multiple channels of audio data;
  • terminal 1 and terminal 2 are terminals supporting different audio protocols, terminal 3 is a terminal supporting multiple audio logical channels, and three cascaded calls corresponding to the three terminals are established between the cascaded MCU-A and MCU-B;
  • terminal 1 and terminal 2 encode their respective audio data and send it to MCU-A; MCU-A sends the audio data of terminal 1 and terminal 2 to MCU-B through cascaded call 1 and cascaded call 2 respectively; MCU-B packs the two channels of audio data and sends the packet to terminal 3, which decodes the audio data packet.
  • when the terminals each support a different audio protocol, the MCU of the transmitting end creates an audio encoder for its terminals, the MCU of the receiving end decodes, mixes, and re-encodes the received cascaded channels of audio data before sending them to its terminals for decoding, and it suffices for the MCU of the receiving end to create an audio decoder for its terminals. Therefore, no matter how many MCUs are cascaded, the audio data packets need no encoding or decoding while being transferred between MCUs other than the transmitting-end MCU and the receiving-end MCU, and the audio processing of the entire cascaded transmission needs only two codec operations. For example, in FIG. 11,
  • terminal 1, terminal 2, and terminal 3 are terminals supporting different audio protocols, and three cascaded calls corresponding to the three terminals are established between the cascaded MCU-A and MCU-B; terminal 1 and terminal 2 encode their respective audio data and send it to MCU-A;
  • MCU-A transmits the audio data of terminal 1 and terminal 2 to MCU-B through cascaded call 1 and cascaded call 2 respectively; MCU-B
  • decodes the two received channels of audio data, mixes and re-encodes them into audio data matching the audio protocol of terminal 3, and sends the encoded audio data to terminal 3; after receiving the audio data, terminal 3
  • decodes it according to the audio protocol it supports.
  • an appropriate MCU cascading scheme can be selected automatically according to the capabilities acquired during capability negotiation with the terminals. For example, for cascaded conferences: when all terminals support the multi-channel separated audio codec protocol, a cascaded conference using that protocol is scheduled automatically; when all terminals support multiple audio logical channels, a cascaded conference using multiple audio logical channels is scheduled automatically; when some terminals support the multi-channel separated audio codec protocol and the others are ordinary terminals, a multi-call cascaded conference containing the terminals of the multi-channel separated audio codec protocol
  • and the terminals of the other audio protocols is scheduled automatically; when some terminals support multiple audio logical channels and the others are ordinary terminals, a cascaded conference site covering all the audio protocols is scheduled automatically;
  • for a single-MCU conference: when all terminals support the multi-channel separated audio codec protocol, a single-MCU conference using that protocol is scheduled automatically; when all terminals support multiple audio logical channels, a single-MCU conference using multiple audio logical channels is scheduled automatically.
  • the present invention also provides an embodiment of an audio processing system.
  • FIG. 13 shows a block diagram of an embodiment of the audio processing system of the present invention:
  • the system includes at least one control server 1310 and multiple terminals 1320;
  • the control server 1310 is configured to obtain the audio capabilities of the terminals through capability negotiation and forward the encoded audio data to each terminal according to the audio capabilities; the terminal 1320 is configured to access the control server, decode the received audio data, and automatically mix and play it.
  • the present invention also provides a control server.
  • the control server of the present invention includes an obtaining unit 1410 and a forwarding unit 1420;
  • the obtaining unit 1410 is configured to obtain the audio capabilities of the terminals through capability negotiation, and the forwarding unit 1420 is configured to forward the encoded audio data to each terminal according to the audio capabilities.
  • when multiple channels of audio data are packed and forwarded within one audio logical channel, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; an extracting unit, configured to extract the audio data in the independent channels of those terminals; and a sending unit, configured to pack the extracted audio data and send it to each terminal through one audio logical channel.
  • when the control server is the sending-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; an extracting unit, configured to extract the audio data in the independent channels of those terminals; and a transmitting unit, configured to pack the extracted audio data and transmit it in cascade to the receiving-end control server through one audio logical channel.
  • when the control server is the receiving-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the receiving-end audio data that replaces the audio data sent by the sending-end control server; and a sending unit, configured to repack the replaced audio data and send it to each terminal through one audio logical channel.
  • when multiple channels of audio data are forwarded within multiple audio logical channels, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a sending unit, configured to send the audio data of those terminals directly to each terminal through the multiple audio logical channels.
  • when the control server is the sending-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a transmitting unit, configured to transmit the audio data of those terminals in cascade to the receiving-end control server through multiple audio logical channels.
  • when the control server is the receiving-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the receiving-end audio data that replaces the audio data sent by the sending-end control server; and a sending unit, configured to send the replaced audio data directly to each terminal through the multiple audio logical channels.
  • when the control server is the sending-end control server in a multi-call cascade, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a transmitting unit, configured to transmit the audio data of those terminals in cascade, each from the port of the audio protocol corresponding to the terminal, to the corresponding port of the receiving-end control server.
  • when the control server is the receiving-end control server in a multi-call cascade, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the several channels of audio data to be mixed from the audio data sent by the sending-end control server and the audio data of the receiving end; and a sending unit, configured to mix the selected channels of audio data and send the result to each terminal.
  • when a terminal receiving audio data supports neither the multi-channel separated audio codec protocol nor multiple audio logical channels, the control server may further include: a creating unit, configured to create resources for mixing and encoding for the terminal;
  • in that case the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to a preset policy; and a transmitting unit, configured to decode, mix, and encode the audio data using those resources
  • and send it to the terminal.
  • in the above embodiments the control server selects the terminals for mixing according to volume;
  • in practical applications, however, the terminals for mixing may also be selected according to other preset policies, which may include:
  • selecting the terminals for mixing according to their call identifiers (for example, terminals with a special identifier are the terminals to be selected), or selecting the terminals for mixing according to their calling order (for example, the first several terminals to call are the terminals to be selected), and so on.
  • the audio data in the embodiments of the present invention does not need to undergo an audio codec operation at every control server it passes through, which greatly reduces the number of codec operations on the control server; in particular, with only one control server,
  • the audio delay between terminals consists only of the network transmission, the encoding at the sending terminal, and the decoding at the receiving terminal, and because the control server only extracts and repacks the audio data, its delay is negligible;
  • the real-time interaction between terminals is thus enhanced, the control server's occupation of audio codec resources is reduced, and the cost is lowered;
  • multi-channel mixing is achieved while reducing the control server's own codec operations, good compatibility with existing standard-protocol control servers is maintained, and the solution can be widely applied in communication fields such as conference TV and conference telephony.
  • all or part of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, includes the following steps: after the terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation; the control server forwards the encoded audio data to each terminal according to the audio capability.
  • the storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disc.

Description

An Audio Processing Method, System, and Control Server
This application claims priority to Chinese Patent Application No. 200710305684.6, filed with the Chinese Patent Office on December 28, 2007 and entitled "An audio processing method, system, and control server", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of voice communication technologies, and in particular, to an audio processing method, system, and control server.
Background
Current conference TV products and some conference phone products mainly perform audio processing according to the ITU-T H.323 protocol or the ITU-T H.320 protocol. The core device that performs audio switching and controls multiple conference terminals is the MCU (Multipoint Control Unit). The MCU has at least an MC (Multipoint Control) function and an MP (Multipoint Processing) function and can perform multi-channel mixing. For example, in a telephone conference in which the telephone terminals of at least three sites communicate through the MCU at the same time, the MCU needs to mix the sound sent by the terminals into one channel and send it to the telephone terminal of each site, so that the terminal users of the sites, although not in the same space, can communicate as if they were in one conference room.
Taking conference audio processing as an example, the prior-art audio processing procedure when multiple terminals carry out audio communication is shown in FIG. 1:
Step 101: The MCU allocates an audio codec port to the terminal of each accessed site. Step 102: After the call is initiated, each terminal sends its encoded audio data to the MCU. Step 103: The MCU decodes the audio data sent by each terminal and selects the audio data of the sites with larger volume.
Step 104: The selected audio data is mixed into one channel of audio data.
Step 105: The mixed channel of audio data is encoded and then sent to each site terminal. Step 106: Each site terminal decodes the received audio data.
As can be seen from the above description of the prior art, from the moment each site terminal sends audio data to the MCU until each site receives the mixed channel of audio data sent by the MCU, one audio encoding/decoding operation is performed each time the data passes through the MCU.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems. Each codec operation increases terminal-to-terminal audio distortion: in a multipoint conference based on one MCU, the site terminal performs one codec operation and the MCU performs another when mixing, resulting in two distortions; in a multipoint conference based on two cascaded MCUs, the site terminal performs one codec operation and the two MCUs perform two when mixing, resulting in three distortions; and so on, every additional MCU adding one more distortion. Each codec operation also increases the terminal-to-terminal voice delay, for the same reason and by the same reasoning as the distortion above. In addition, for the site terminals that join a voice conference at the same time, the MCU must allocate an audio codec port to each terminal; especially when there are many sites, the MCU must provide a large number of audio codec ports, which increases the cost of the multipoint conference.
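As a concrete illustration of the per-MCU codec cost described in this background, the following sketch models the conventional MCU mixing path of FIG. 1: decode every terminal's stream, keep the loudest sites, sum them into one mixed signal, and re-encode. The "codec" here is an identity stand-in used purely to count operations; a real MCU would use G.711, AAC, or similar, and all names in the sketch are illustrative.

```python
# Hypothetical model of the prior-art MCU mixing path (Fig. 1).
# "Encoding"/"decoding" are identity stand-ins used only to count
# how many codec passes the audio goes through inside the MCU.

codec_ops = {"count": 0}

def decode(frame):           # stand-in for a real audio decoder
    codec_ops["count"] += 1
    return list(frame)

def encode(samples):         # stand-in for a real audio encoder
    codec_ops["count"] += 1
    return list(samples)

def mcu_mix(encoded_frames, n_select=3):
    """Decode all streams, keep the n loudest, sum and clip, re-encode."""
    decoded = [decode(f) for f in encoded_frames]
    # "volume" approximated by the mean absolute sample value
    loudest = sorted(decoded, key=lambda s: sum(abs(x) for x in s) / len(s),
                     reverse=True)[:n_select]
    # sum corresponding samples and clip to 16-bit range
    mixed = [max(-32768, min(32767, sum(col))) for col in zip(*loudest)]
    return encode(mixed)

streams = [[1000, -1000], [200, 300], [5000, 4000], [10, -10]]
out = mcu_mix(streams)
print(out)                 # mixed frame of the 3 loudest streams
print(codec_ops["count"])  # 5 codec passes: 4 decodes + 1 encode
```

Even for four terminals, every frame costs the MCU one decode per terminal plus one encode per mix, which is exactly the per-MCU cost the embodiments below avoid.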
Summary
The objective of the embodiments of the present invention is to provide an audio processing method, system, and control server. To achieve this objective, the embodiments of the present invention provide the following technical solutions. An audio processing method includes:
receiving, by a control server, encoded audio data sent by terminals that access the control server;
obtaining, by the control server, the audio capabilities of the terminals through capability negotiation with the terminals; and
forwarding, by the control server, audio data extracted from the received audio data to the terminals according to the audio capabilities.
An audio processing system includes at least one control server and multiple terminals, where:
the control server is configured to receive encoded audio data sent by terminals that access the control server, obtain the audio capabilities of the terminals through capability negotiation with the terminals, and forward audio data extracted from the received audio data to the terminals according to the audio capabilities; and
the terminals are configured to access the control server, decode the received audio data, and automatically mix and play it.
A control server includes:
an obtaining unit, configured to receive encoded audio data sent by terminals that access the control server and obtain the audio capabilities of the terminals through capability negotiation with the terminals; and a forwarding unit, configured to forward audio data extracted from the received audio data to the terminals according to the audio capabilities.
As can be seen from the technical solutions provided by the embodiments of the present invention, after a terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation and forwards the encoded audio data to each terminal according to that capability. The audio data in the embodiments of the present invention does not need to undergo an audio codec operation at every control server it passes through; because the control server only extracts and repacks the audio data for forwarding, the number of codec operations on the control server and the transmission delay of the audio data are reduced, the real-time interaction between terminals is enhanced, the control server's occupation of audio codec resources is reduced, and the cost is lowered. Multi-channel mixing is achieved while reducing the control server's own codec operations, good compatibility with existing standard-protocol control servers is maintained, and the solution can be widely applied in communication fields such as conference TV and conference telephony.
Brief Description of the Drawings
FIG. 1 shows the prior-art audio processing procedure when multiple terminals carry out audio communication;
FIG. 2 is a flowchart of the first embodiment of the audio processing method of the present invention;
FIG. 3 is a schematic structural diagram of the second embodiment of the audio processing method of the present invention;
FIG. 4 is a flowchart of the second embodiment of the audio processing method of the present invention;
FIG. 5 is a schematic structural diagram of the third embodiment of the audio processing method of the present invention;
FIG. 6 is a flowchart of the third embodiment of the audio processing method of the present invention;
FIG. 7 is a schematic structural diagram of the fourth embodiment of the audio processing method of the present invention;
FIG. 8 is a flowchart of the fourth embodiment of the audio processing method of the present invention;
FIG. 9 is a schematic structural diagram of the fifth embodiment of the audio processing method of the present invention;
FIG. 10 is a flowchart of the fifth embodiment of the audio processing method of the present invention;
FIG. 11 is a schematic structural diagram of the sixth embodiment of the audio processing method of the present invention;
FIG. 12 is a flowchart of the sixth embodiment of the audio processing method of the present invention;
FIG. 13 is a block diagram of an embodiment of the audio processing system of the present invention;
FIG. 14 is a block diagram of an embodiment of the control server of the present invention.
Detailed Description of the Embodiments
The embodiments of the present invention provide an audio processing method, system, and control server. After a terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation and forwards the encoded audio data to each terminal according to that audio capability.
To help those skilled in the art better understand the technical solutions provided by the embodiments of the present invention, the technical solutions are described in further detail below with reference to the accompanying drawings and specific implementations.
The procedure of the first embodiment of the audio processing method of the present invention is shown in FIG. 2:
Step 201: After a terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation.
The audio capability of a terminal is one of the following: the terminal supports a multi-channel separated audio codec protocol, the terminal supports multiple audio logical channels, or the terminal supports neither the multi-channel separated audio codec protocol nor multiple audio logical channels.
Step 202: The MCU forwards the encoded audio data to each terminal according to the audio capability.
The control server forwards the encoded audio data to each terminal in any of the following ways according to the audio capability: when the terminals support the multi-channel separated audio codec protocol, the control server selects multiple channels of the audio data, packs them, and forwards them within one audio logical channel; when the terminals support multiple audio logical channels, the control server selects multiple channels of the audio data and forwards them within multiple audio logical channels; when the terminals support neither, the control server mixes and encodes the audio data before sending it to each terminal.
When there is only one control server, the control server forwards the encoded audio data to the terminals that access it according to the audio capability. When multiple control servers are cascaded, the multiple control servers transmit the encoded audio data sent by the sending-end control server in cascade according to the audio capability, and the receiving-end control server forwards the audio data to the terminals that access it.
FIG. 3 is a schematic structural diagram of the second embodiment of the audio processing method of the present invention. In FIG. 3, the control server is an MCU, and four terminals implement multipoint audio processing by connecting to the MCU. Each terminal has only one audio sending channel to the MCU (solid arrows in the figure) and one audio receiving channel (dashed arrows in the figure); that is, there is one audio logical channel between the MCU and each terminal. With reference to the structure shown in FIG. 3, the procedure of the second embodiment of the audio processing method of the present invention is shown in FIG. 4. This embodiment shows the audio data processing between terminals adopting a multi-channel separated audio codec protocol and a single MCU. Step 401: After initiating a call, a terminal accesses the MCU and sends its encoded audio data to the MCU.
When initiating a call, the terminal usually determines with the MCU through capability negotiation that the terminal and the MCU support a multi-channel separated audio codec protocol. The protocol is usually an international standard audio codec protocol such as the AAC (Advanced Audio Coding) protocol, but may also be a private protocol.
Step 402: The MCU creates a decoder for the multi-channel separated audio codec protocol.
In the multi-channel separated audio codec protocol adopted in the embodiment of the present invention, channel separation means that the MCU does not need to decode the encoded audio data received from each terminal; instead, directly from the IP packet carrying the encoded audio data, the MCU can tell which channel each piece of audio data comes from and the audio coding protocol of that channel.
Step 403: The MCU selects the terminals to be mixed according to the volume of the decoded audio data.
Step 404: The MCU extracts the audio data from the independent channels of the terminals to be mixed.
In the embodiment of the present invention, the MCU does not need to perform the process of uniformly decoding the audio data received from the terminals, selecting the required channels of audio data for mixing, and then re-encoding; instead, it directly extracts a single channel's audio data packet from each received stream of multi-channel separated audio codec protocol data, and the terminals whose audio data packets are extracted are the terminals selected for mixing by audio volume.
Step 405: The MCU packs the selected channels of audio data and sends them to each terminal through one audio logical channel.
The extracted, undecoded audio data packets are directly repacked together. For example, suppose the terminals in multipoint communication with the MCU are terminal 1, terminal 2, terminal 3, and terminal 4, and the three channels of audio data selected by the MCU according to the volume policy are the encoded audio data sent by terminal 1, terminal 2, and terminal 3. The audio data of these three terminals is packed, each as an independent channel, into one audio logical channel; that is, the audio data in that logical channel contains the data of three independent channels. It is then forwarded to each terminal: terminal 1 receives an audio data packet composed of the encoded audio data of terminal 2 and terminal 3, terminal 2 receives a packet composed of the encoded audio data of terminal 1 and terminal 3, terminal 3 receives a packet composed of the encoded audio data of terminal 1 and terminal 2, and terminal 4 receives a packet composed of the encoded audio data of terminal 1, terminal 2, and terminal 3.
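A minimal sketch of this forwarding step, in which the MCU never transcodes: it ranks terminals by a volume figure associated with each encoded packet, keeps the chosen encoded payloads untouched, and repacks them per receiver into one logical channel, excluding the receiver's own channel. The tuple layout and the volume-hint field are assumptions for illustration, not part of any protocol.

```python
# Hypothetical packet: (terminal_id, volume_hint, encoded_payload).
# The MCU forwards the encoded payloads as-is -- no decode/encode.

def select_and_pack(packets, n_mix=3):
    """Pick the n loudest channels; build one packed frame per terminal."""
    chosen = sorted(packets, key=lambda p: p[1], reverse=True)[:n_mix]
    out = {}
    for term_id, _, _ in packets:
        # each receiver gets the selected channels except its own
        out[term_id] = [(tid, payload) for tid, vol, payload in chosen
                        if tid != term_id]
    return out

pkts = [("T1", 80, b"\x01"), ("T2", 60, b"\x02"),
        ("T3", 70, b"\x03"), ("T4", 10, b"\x04")]
packed = select_and_pack(pkts)
print(packed["T4"])  # T4 hears all three selected channels: T1, T3, T2
print(packed["T1"])  # T1 hears only T3 and T2 (its own channel is excluded)
```

This mirrors the terminal 1-4 example above: terminal 4, never selected, receives all three chosen channels, while each selected terminal receives the other two.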
Step 406: The terminal decodes the received packed audio data, automatically mixes it, and plays it. In the second embodiment of the method of the present invention, when not all the terminals interworking with the MCU support the multi-channel separated audio codec protocol, the MCU needs to create resources for mixing and encoding for the terminals that do not support the protocol, and supports automatic audio protocol adaptation; that is, it automatically decodes the audio data sent by terminals supporting the multi-channel separated audio codec protocol, mixes and encodes it, and sends it to the terminals that do not support the protocol, so as to remain compatible with them.
FIG. 5 is a schematic structural diagram of the third embodiment of the audio processing method of the present invention. In FIG. 5, the control server is an MCU; terminal A1, terminal A2, terminal A3, and terminal A4 are connected to MCU-A, and terminal B1, terminal B2, terminal B3, and terminal B4 are connected to MCU-B. The terminals implement multipoint audio processing by connecting to the MCUs. Each terminal has only one audio sending channel to its MCU (one-way solid arrows in the figure) and one audio receiving channel (dashed arrows in the figure); that is, there is one audio logical channel between each MCU and each terminal, and one call is implemented between the MCUs (two-way solid arrow in the figure). With reference to the structure shown in FIG. 5, the procedure of the third embodiment of the audio processing method of the present invention is shown in FIG. 6. This embodiment shows the audio data processing between terminals adopting a multi-channel separated audio codec protocol and two cascaded MCUs:
Step 601: After initiating a call, a terminal accesses MCU-A and sends its encoded audio data to MCU-A.
Step 602: MCU-A creates a decoder for the multi-channel separated audio codec protocol.
Step 603: MCU-A selects the terminals to be mixed according to the volume of the decoded audio data. Step 604: MCU-A extracts the audio data from the independent channels of the terminals to be mixed.
Step 605: MCU-A packs the selected channels of audio data and sends them to the cascaded MCU-B.
Step 606: After creating a decoder, MCU-B selects, according to volume, the audio data with which to replace the audio data of MCU-A's channels.
The cascaded MCU-A and MCU-B process the audio data sent by their respectively connected terminals in the same way as in the second embodiment of the present invention, but one more channel is added between the cascaded MCU-A and MCU-B; in particular, when more than two MCUs are cascaded, correspondingly more channels are added. Therefore, when the cascaded MCU-A sends packed audio data to MCU-B, MCU-B compares the volume of the received audio data with the volume of the audio data sent by the terminals connected to MCU-B and, according to the result, replaces the relatively lower-volume audio data in the packet sent by MCU-A with the higher-volume audio data of the terminals connected to MCU-B.
With reference to FIG. 5, suppose that the audio data packet produced by MCU-A's volume selection over terminals A1 to A4 contains the audio data of terminal A1, terminal A2, and terminal A3. After receiving the packet, MCU-B performs the comparison; if the volume of the audio data of terminal B1 connected to MCU-B is greater than the volume of terminal A1's audio data in the packet, the audio data of terminal A1 in the packet is correspondingly replaced with the audio data of terminal B1.
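The replacement rule used at the receiving-end MCU can be sketched as follows: compare the local terminals' streams against the streams in the packet received from the sending-end MCU, and swap in any local stream that is louder than the quietest packed one. The tuple layout and volume hints are invented for illustration; a real MCU would derive volume from the decoded or metadata-tagged streams.

```python
def replace_quieter(packed, local):
    """packed/local: lists of (terminal_id, volume, payload).
    Replace the quietest packed entries with louder local streams."""
    result = sorted(packed, key=lambda p: p[1])          # quietest first
    for cand in sorted(local, key=lambda p: p[1], reverse=True):
        if result and cand[1] > result[0][1]:
            result[0] = cand                             # swap in louder local stream
            result.sort(key=lambda p: p[1])              # keep quietest at index 0
        else:
            break
    return sorted(result, key=lambda p: p[1], reverse=True)

from_mcu_a = [("A1", 90, b"a1"), ("A2", 50, b"a2"), ("A3", 40, b"a3")]
local_b    = [("B1", 70, b"b1"), ("B2", 30, b"b2")]
merged = replace_quieter(from_mcu_a, local_b)
print(merged)  # B1 (70) replaces A3 (40); B2 (30) is not loud enough
```

The result still carries exactly three channels, so the repacked frame sent onward to MCU-B's terminals keeps the same structure as the one received from MCU-A.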
Step 607: MCU-B repacks the replaced audio data and sends it to each of its connected terminals through one audio logical channel.
Step 608: The terminal decodes the received packed audio data, automatically mixes it, and plays it. In the third embodiment of the present invention, when all the terminals support the multi-channel separated audio codec protocol, it suffices for the sending-end MCU to create an audio encoder for the sending-end terminals and for the receiving-end MCU to create an audio decoder for the receiving-end terminals. Therefore, no matter how many MCUs are cascaded, encoding is needed only at the terminals of the sending-end MCU and decoding only at the terminals of the receiving-end MCU, and the entire audio processing involves only one audio encoding and one decoding operation: the terminals of the sending-end MCU send encoded audio data; the sending-end MCU packs the audio data; the audio data packet is transmitted in cascade among multiple MCUs; when it reaches the receiving-end MCU, that MCU does not need to decode but, directly according to the multi-channel separated audio codec protocol, extracts the audio data of single channels from the packet, performs the corresponding replacement with the audio data sent by its own louder terminals, and sends the result to its terminals, which decode the replaced audio data packet.
When not all the terminals support the multi-channel separated audio codec protocol, the sending-end MCU does not need to create an audio encoder for the sending-end terminals; the receiving-end MCU creates an audio encoder and a decoder for the receiving-end terminals and must decode the received cascaded audio data packets and re-encode them after replacement, so that the terminals remain compatible. Therefore, no matter how many MCUs are cascaded, the audio data packets need no encoding or decoding while being transferred among MCUs other than the receiving-end MCU. The audio processing of the entire cascaded transmission thus needs only two codec operations: the terminals of the sending-end MCU send encoded audio data; the sending-end MCU packs the encoded audio data; the packet is transmitted in cascade among multiple MCUs; when it reaches the receiving-end MCU, because the multi-channel separated audio codec protocol is not supported there, the receiving-end MCU decodes the packet, replaces the lower-volume audio data in it with the higher-volume audio data sent by its own terminals, re-encodes the replaced audio data, and sends it to its terminals, which decode the received audio data packets.
FIG. 7 is a schematic structural diagram of the fourth embodiment of the audio processing method of the present invention. In FIG. 7, the control server is an MCU, and four terminals implement multipoint audio processing by connecting to the MCU. Each terminal has three audio sending channels to the MCU (solid arrows in the figure) and one audio receiving channel (dashed arrows in the figure); that is, there are three audio logical channels between the terminal and the MCU. This embodiment is based on international standard protocols supporting audio communication, such as the standard H.323 protocol, which supports opening multiple logical channels, including multiple logical channels bearing the same kind of media. With reference to the structure shown in FIG. 7, the procedure of the fourth embodiment of the audio processing method of the present invention is shown in FIG. 8. This embodiment shows the audio data processing between terminals having multiple audio logical channels and a single MCU:
Step 801: After initiating a call, a terminal accesses the MCU and sends its encoded audio data to the MCU.
When initiating a call, the terminal usually determines with the MCU through capability negotiation that multiple audio logical channels are supported between the terminal and the MCU. Because the standard capability negotiation protocol carries a nonstandard capability protocol field, the capability of supporting multiple audio logical channels is described through that field. For example, suppose four bytes of content "0x0a0a" are defined in the extended capability field of the standard capability negotiation protocol; then during capability negotiation, if the MCU finds that the terminal has filled in "0x0a0a" in the nonstandard field, this indicates the capability of supporting multiple audio logical channels, and once the call succeeds, audio processing can be performed over multiple audio channels.
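The nonstandard-capability check described for this negotiation can be sketched as below. The "0x0a0a" marker follows the example in the text, while the byte layout of the nonstandard field is hypothetical; a real implementation would parse the H.245 nonStandard capability structure.

```python
# Example marker from the text; the surrounding field layout is assumed.
MULTI_AUDIO_CHANNEL_MARK = bytes([0x0a, 0x0a])

def supports_multi_audio_channels(nonstandard_field: bytes) -> bool:
    """True if the terminal's nonstandard capability field carries the marker."""
    return MULTI_AUDIO_CHANNEL_MARK in nonstandard_field

print(supports_multi_audio_channels(b"\x00\x0a\x0a\x00"))  # True
print(supports_multi_audio_channels(b"\x00\x00"))          # False
```

Once the check succeeds during negotiation, the MCU can switch the call's audio handling to the multi-logical-channel path; otherwise it falls back to the mixing-and-encoding path described for ordinary terminals.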
Step 802: The MCU creates decoders for the multiple audio logical channels.
Step 803: The MCU selects the terminals to be mixed according to the volume of the decoded audio data.
Step 804: The audio data of the terminals to be mixed is sent directly to each terminal through the three corresponding audio logical channels.
For example, after the MCU receives the encoded audio data sent by terminal 1, terminal 2, terminal 3, and terminal 4, suppose the three channels of audio data selected by the MCU according to the audio policy are those of terminal 1, terminal 2, and terminal 3. The MCU can then send the audio data in all the selected audio logical channels directly to each terminal: terminal 1 receives the audio data of terminal 2 and terminal 3 from terminal 2's and terminal 3's audio channels respectively; terminal 2 receives the audio data of terminal 1 and terminal 3 from terminal 1's and terminal 3's audio channels respectively; terminal 3 receives the audio data of terminal 1 and terminal 2 from terminal 1's and terminal 2's audio channels respectively; and terminal 4 receives the audio data of terminal 1, terminal 2, and terminal 3 from their respective audio channels.
Step 805: The terminal decodes the received audio data, automatically mixes it, and plays it.
The terminals in this embodiment correspondingly support opening multiple audio receiving channels, decoding multiple channels of audio data simultaneously, and mixing the decoded channels before outputting them to the loudspeaker. Taking the audio data received by terminal 1 above as an example, terminal 1 decodes the two channels of audio data received from terminal 2's and terminal 3's audio channels separately and then mixes them for output to the loudspeaker.
In the fourth embodiment of the present invention, when not all the terminals interworking with the MCU support multiple audio logical channels, the MCU needs to create resources for mixing and encoding for the terminals that do not support multiple logical channels, and supports automatic audio protocol adaptation; that is, it automatically decodes the audio data sent by terminals supporting multiple audio logical channels, mixes and encodes it, and sends it to the terminals that do not support multiple audio logical channels, so as to remain compatible with them.
FIG. 9 is a schematic structural diagram of the fifth embodiment of the audio processing method of the present invention. In FIG. 9, the control server is an MCU; terminal A1, terminal A2, terminal A3, and terminal A4 are connected to MCU-A, and terminal B1, terminal B2, terminal B3, and terminal B4 are connected to MCU-B. The terminals implement multipoint audio processing by connecting to the MCUs. Each terminal has three audio sending channels to its MCU (one-way solid arrows in the figure) and one audio receiving channel (dashed arrows in the figure); the figure shows four logical channels between each terminal and its MCU, and one call is implemented between the MCUs (two-way solid arrow in the figure). With reference to the structure shown in FIG. 9, the procedure of the fifth embodiment of the audio processing method of the present invention is shown in FIG. 10. This embodiment shows the audio data processing between terminals having multiple audio logical channels and two cascaded MCUs:
Step 1001: After initiating a call, a terminal accesses MCU-A and sends its encoded audio data to MCU-A.
When a call is initiated, it is usually determined with the MCU through capability negotiation that multi-call cascading is supported between the terminal and the cascaded MCUs. Because the standard capability negotiation protocol carries a nonstandard capability protocol field, the capability of supporting multi-call cascading is described through that field, and the cascaded calls between MCUs use the same procedure. For example, suppose four bytes of content "0x0a0b" are defined in the extended capability field of the standard capability negotiation protocol; then during capability negotiation, if the MCU finds that the terminal has marked "0x0a0b" in the nonstandard capability field, this indicates the capability of supporting multi-call cascading, and once the call succeeds, audio processing can be performed in the multi-call cascaded manner.
Step 1002: MCU-A creates decoders for the multiple logical channels.
Step 1003: MCU-A selects the terminals to be mixed according to the volume of the decoded audio data. Step 1004: The data of the several audio logical channels of the terminals to be mixed is forwarded directly to MCU-B.
Step 1005: After creating a decoder, MCU-B selects, according to volume, the audio data with which to replace the audio data of MCU-A.
Step 1006: MCU-B sends the several replaced channels of audio data directly to each terminal through three audio logical channels.
Step 1007: The terminal decodes the received audio data, automatically mixes it, and plays it.
In the fifth embodiment of the method of the present invention, when all the terminals support multiple audio logical channels, it suffices for the sending-end MCU to create an audio encoder for the sending-end terminals and for the receiving-end MCU to create an audio decoder for the receiving-end terminals. Therefore, no matter how many MCUs are cascaded, encoding is needed only at the terminals of the sending-end MCU, and the terminals of the receiving-end MCU separately decode and then mix the audio data transmitted over the multiple audio channels; the entire audio processing involves only one audio encoding and one decoding operation. That is, the terminals of the sending-end MCU send encoded audio data, and the sending-end MCU transmits the audio data in cascade among multiple MCUs through multiple audio logical channels; when the data reaches the receiving-end MCU, that MCU does not need to decode but, directly according to the multi-audio-logical-channel capability, performs the corresponding replacement on the audio data of the multiple logical channels with the audio data of the audio logical channels sent by its own louder terminals, and then sends the result to its terminals, which separately decode the replaced channels of audio data transmitted over the multiple audio logical channels.
When not all the terminals support multiple audio logical channels, the sending-end MCU does not need to create an audio encoder for the sending-end terminals; the receiving-end MCU creates an audio encoder and a decoder for the receiving-end terminals and must decode the received cascaded audio data packets and re-encode them after replacement, so that the terminals remain compatible.
Therefore, no matter how many MCUs are cascaded, the audio data packets need no encoding or decoding while being transferred among MCUs other than the receiving-end MCU. The audio processing of the entire cascaded transmission thus needs only two codec operations: the sending-end MCU transmits the audio data in cascade among multiple MCUs through multiple audio logical channels; when the data reaches the receiving-end MCU, because multiple audio logical channels are not supported there, the receiving-end MCU decodes the audio data of the multiple audio logical channels, replaces the lower-volume audio data of the multiple audio channels with the higher-volume audio data sent by its own terminals, re-encodes the replaced channels of audio data, and sends them to its terminals, which decode the received audio data packets.
FIG. 11 is a schematic structural diagram of the sixth embodiment of the audio processing method of the present invention. In FIG. 11, the control server is an MCU; terminal 1 and terminal 2 are connected to MCU-A, and terminal 3 and terminal 4 are connected to MCU-B. The terminals implement multipoint audio processing by connecting to the MCUs, and multiple cascaded calls are implemented between MCU-A and MCU-B; that is, the cascaded MCU-A and MCU-B dynamically establish multiple calls according to the number of terminals to be mixed, each call has only one audio channel, and the protocols of the audio channels may differ. As shown in FIG. 11, three cascaded calls are established between MCU-A and MCU-B (two-way solid arrows in the figure), and one call is established between each terminal and its MCU. With reference to the structure shown in FIG. 11, the procedure of the sixth embodiment of the audio processing method of the present invention is shown in FIG. 12. This embodiment shows audio data processing between MCUs through multi-call cascading:
Step 1201: After initiating a call, a terminal accesses MCU-A and sends its encoded audio data to MCU-A.
Step 1202: MCU-A creates a decoder for each accessed terminal.
Step 1203: MCU-A selects the terminals to be mixed according to the volume of the decoded audio data. Step 1204: MCU-A forwards the audio data of the terminals to be mixed from the corresponding audio protocol ports of MCU-A to the ports on MCU-B that support those audio protocols.
Step 1205: After creating decoders, MCU-B decodes the audio data sent from each port of MCU-A.
Step 1206: MCU-B selects, according to volume, the audio data to be mixed from the received channels of audio data sent by MCU-A and the channels of audio data sent by MCU-B's own terminals.
Step 1207: MCU-B mixes the selected channels of audio data and sends the result to each terminal. Step 1208: The terminal decodes the received audio data, automatically mixes it, and plays it. A pair of MCU cascade ports is usually used to implement an audio call between cascaded MCUs, but in the sixth embodiment of the present invention, multiple calls supporting different audio protocols are implemented between two cascaded MCUs through multiple pairs of ports, thereby achieving multi-channel mixing of multiple channels of audio data.
When some terminals support the multi-channel separated audio codec protocol or multiple audio logical channels, the audio data of different audio protocols sent by the terminals of the cascaded MCUs can be sent directly to those terminals, so that no matter how many cascaded MCUs the data passes through, only one audio encoding and one audio decoding are needed. For example, in FIG. 11, terminal 1 and terminal 2 are terminals supporting different audio protocols, while terminal 3 is a terminal supporting multiple audio logical channels, and three cascaded calls corresponding to the three terminals are established between the cascaded MCU-A and MCU-B. Terminal 1 and terminal 2 encode their respective audio data and send it to MCU-A; MCU-A sends the audio data of terminal 1 and the audio data of terminal 2 to MCU-B through cascaded call 1 and cascaded call 2 respectively; MCU-B packs the two channels of audio data and sends the packet to terminal 3, which simply decodes the audio data packet.
When the terminals support several different audio protocols, the sending-end MCU creates an audio encoder for the sending-end terminals; the receiving-end MCU then decodes the received cascaded channels of audio data, mixes and encodes them, and sends them to the receiving-end terminals for decoding, and it suffices for the receiving-end MCU to create an audio decoder for the receiving-end terminals. Therefore, no matter how many MCUs are cascaded, the audio data packets need no encoding or decoding while being transferred among MCUs other than the sending-end MCU and the receiving-end MCU, and the audio processing of the entire cascaded transmission needs only two codec operations. For example, in FIG. 11, terminal 1, terminal 2, and terminal 3 are terminals supporting different audio protocols, and three cascaded calls corresponding to the three terminals are established between the cascaded MCU-A and MCU-B. Terminal 1 and terminal 2 encode their respective audio data and send it to MCU-A; MCU-A sends the audio data of terminal 1 and terminal 2 to MCU-B through cascaded call 1 and cascaded call 2 respectively; MCU-B decodes the two received channels of audio data, mixes and re-encodes them into audio data matching the audio protocol of terminal 3, and sends the encoded audio data to terminal 3; after receiving the audio data, terminal 3 decodes it according to the audio protocol it supports.
In connection with the method embodiments of the present invention, when the service operation platform schedules the MCUs, an appropriate MCU cascading scheme can be selected automatically according to the capabilities acquired during capability negotiation with the terminals. For example, for cascaded conferences: when all terminals support the multi-channel separated audio codec protocol, a cascaded conference using the multi-channel separated audio codec protocol is scheduled automatically; when all terminals support multiple audio logical channels, a cascaded conference using multiple audio logical channels is scheduled automatically; when some terminals support the multi-channel separated audio codec protocol and the others are ordinary terminals, a multi-call cascaded conference containing the terminals of the multi-channel separated audio codec protocol and the terminals of the other audio protocols is scheduled automatically; when some terminals support multiple audio logical channels and the others are ordinary terminals, a cascaded conference site covering all the audio protocols is scheduled automatically. For single-MCU conferences: when all terminals support the multi-channel separated audio codec protocol, a single-MCU conference using the multi-channel separated audio codec protocol is scheduled automatically; when all terminals support multiple audio logical channels, a single-MCU conference using multiple audio logical channels is scheduled automatically.
Corresponding to the embodiments of the audio processing method of the present invention, the present invention also provides embodiments of an audio processing system.
FIG. 13 is a block diagram of an embodiment of the audio processing system of the present invention:
The system includes at least one control server 1310 and multiple terminals 1320.
The control server 1310 is configured to obtain the audio capabilities of the terminals through capability negotiation and forward the encoded audio data to each terminal according to the audio capabilities; the terminal 1320 is configured to access the control server, decode the received audio data, and automatically mix and play it.
Corresponding to the embodiments of the audio processing method and system of the present invention, the present invention also provides a control server.
The control server of the present invention includes an obtaining unit 1410 and a forwarding unit 1420, where the obtaining unit 1410 is configured to obtain the audio capabilities of the terminals through capability negotiation, and the forwarding unit 1420 is configured to forward the encoded audio data to each terminal according to the audio capabilities.
Further, when multiple channels of audio data are selected, packed, and forwarded within one audio logical channel (that is, the audio capability obtained by the obtaining unit 1410 is support for the multi-channel separated audio codec protocol), the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; an extracting unit, configured to extract the audio data in the independent channels of those terminals; and a sending unit, configured to pack the extracted audio data and send it to each terminal through one audio logical channel.
When multiple channels of audio data are selected, packed, and forwarded within one audio logical channel (that is, the audio capability obtained by the obtaining unit 1410 is support for the multi-channel separated audio codec protocol) and the control server is the sending-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; an extracting unit, configured to extract the audio data in the independent channels of those terminals; and a transmitting unit, configured to pack the extracted audio data and transmit it in cascade to the receiving-end control server through one audio logical channel.
When multiple channels of audio data are selected, packed, and forwarded within one audio logical channel and the control server is the receiving-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the receiving-end audio data with which to replace the audio data sent by the sending-end control server; and a sending unit, configured to repack the replaced audio data and send it to each terminal through one audio logical channel.
When multiple channels of audio data are forwarded within multiple audio logical channels (that is, the audio capability obtained by the obtaining unit 1410 is support for multiple audio logical channels), the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a sending unit, configured to send the audio data of those terminals directly to each terminal through the multiple audio logical channels.
When multiple channels of audio data are forwarded within multiple audio logical channels (that is, the audio capability obtained by the obtaining unit 1410 is support for multiple audio logical channels) and the control server is the sending-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a transmitting unit, configured to transmit the audio data of those terminals in cascade to the receiving-end control server through multiple audio logical channels.
When multiple channels of audio data are forwarded within multiple audio logical channels and the control server is the receiving-end control server among multiple cascaded control servers, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the receiving-end audio data with which to replace the audio data sent by the sending-end control server; and a sending unit, configured to send the replaced audio data directly to each terminal through the multiple audio logical channels.
When the control server is the sending-end control server among multiple control servers cascaded by multiple calls, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to the volume of the audio data; and a transmitting unit, configured to transmit the audio data of those terminals in cascade, each from the port of the audio protocol corresponding to the terminal, to the corresponding port of the receiving-end control server. When the control server is the receiving-end control server among multiple control servers cascaded by multiple calls, the forwarding unit 1420 includes (not shown in FIG. 14): a selecting unit, configured to select, according to volume, the several channels of audio data to be mixed from the received audio data sent by the sending-end control server and the audio data of the receiving end; and a sending unit, configured to mix the several channels of audio data and send the result to each terminal.
When a terminal receiving audio data supports neither the multi-channel separated audio codec protocol nor multiple audio logical channels, the control server may further include: a creating unit, configured to create resources for mixing and encoding for the terminal. The forwarding unit 1420 then includes (not shown in FIG. 14): a selecting unit, configured to select the audio data of the several terminals to be mixed according to a preset policy; and a transmitting unit, configured to decode, mix, and encode the audio data using those resources and send it to the terminal.
It should be noted that in the above embodiments the control server selects the terminals for mixing according to volume; in practical applications, however, the terminals for mixing may also be selected according to other preset policies, which may include: selecting the terminals for mixing according to their call identifiers (for example, terminals with a special identifier are the terminals to be selected), or selecting the terminals for mixing according to their calling order (for example, the first several terminals to call are the terminals to be selected), and so on.
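The interchangeable selection policies named in the embodiments (volume, call identifier, call order) can be sketched as key functions over a common terminal record; the record layout below is invented purely for illustration.

```python
def select_for_mixing(terminals, policy="volume", n=3):
    """terminals: list of dicts with 'id', 'volume', 'flagged', 'call_order'.
    Returns the ids of the n terminals chosen under the given preset policy."""
    keys = {
        "volume":     lambda t: -t["volume"],      # loudest terminals first
        "identifier": lambda t: not t["flagged"],  # specially flagged terminals first
        "call_order": lambda t: t["call_order"],   # earliest callers first
    }
    return [t["id"] for t in sorted(terminals, key=keys[policy])[:n]]

terms = [
    {"id": "T1", "volume": 40, "flagged": False, "call_order": 1},
    {"id": "T2", "volume": 90, "flagged": True,  "call_order": 3},
    {"id": "T3", "volume": 70, "flagged": False, "call_order": 2},
    {"id": "T4", "volume": 10, "flagged": True,  "call_order": 4},
]
print(select_for_mixing(terms, "volume"))      # ['T2', 'T3', 'T1']
print(select_for_mixing(terms, "call_order"))  # ['T1', 'T3', 'T2']
```

Because the policy only changes the sort key, the rest of the forwarding pipeline (extraction, packing, replacement) is unaffected by which policy is preset.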
As can be seen from the description of the embodiments of the present invention, the audio data in the embodiments does not need to undergo an audio codec operation at every control server it passes through, which greatly reduces the number of codec operations on the control server. In particular, with only one control server, the audio delay between terminals consists only of the network transmission, the encoding at the sending terminal, and the decoding at the receiving terminal; because the control server only extracts and repacks the audio data, its delay is negligible. The real-time interaction between terminals is thus enhanced, the control server's occupation of audio codec resources is reduced, and the cost is lowered. Multi-channel mixing is achieved while reducing the control server's own codec operations, good compatibility with existing standard-protocol control servers is maintained, and the solution can be widely applied in communication fields such as conference TV and conference telephony.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, includes the following steps: after a terminal accesses the control server, the control server obtains the audio capability of the terminal through capability negotiation, and the control server forwards the encoded audio data to each terminal according to the audio capability. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disc.
Although the present invention has been described through embodiments, those of ordinary skill in the art will appreciate that the present invention admits many variations and changes without departing from its spirit, and it is intended that the appended claims cover such variations and changes without departing from the spirit of the present invention.

Claims

Claims
1. An audio processing method, comprising:
receiving, by a control server, encoded audio data sent by terminals that access the control server;
obtaining, by the control server, audio capabilities of the terminals through capability negotiation with the terminals; and
forwarding, by the control server, audio data extracted from the audio data to the terminals according to the audio capabilities.
2. The method according to claim 1, wherein the control server forwards the extracted audio data to the terminals in either of the following ways according to the audio capabilities:
when the terminals all support a multi-channel separated audio codec protocol, selecting, by the control server, multiple channels of audio data from the audio data, packing them, and transmitting them to the terminals through one audio logical channel; or
when the terminals support multiple audio logical channels, selecting, by the control server, multiple channels of audio data from the audio data and transmitting them to the terminals through multiple audio logical channels.
3. The method according to claim 2, wherein: when there is only one control server, the control server forwards the audio data extracted from the audio data to the terminals that access the control server according to the audio capabilities; or
when multiple control servers are cascaded, the multiple control servers transmit in cascade, according to the audio capabilities, the audio data that a sending-end control server extracts from the encoded audio data sent by the terminals accessing the sending-end control server to a receiving-end control server, and the receiving-end control server forwards the extracted audio data to the terminals that access the receiving-end control server.
4. The method according to claim 3, wherein when there is only one control server and the terminals support the multi-channel separated audio codec protocol, the selecting, by the control server, multiple channels of audio data from the audio data for packing and transmitting them to the terminals through one audio logical channel comprises:
selecting, by the control server according to a preset policy, the audio data of several terminals to be mixed;
extracting the audio data in the independent channels of the several terminals; and
packing the extracted audio data and sending it to the terminals through one audio logical channel.
5. The method according to claim 3, wherein when multiple control servers are cascaded and the terminals support the multi-channel separated audio codec protocol, the selecting, by the control server, multiple channels of audio data from the audio data for packing and transmitting them to the terminals through one audio logical channel comprises:
selecting, by the sending-end control server according to a preset policy, the audio data of several terminals to be mixed;
extracting, by the sending-end control server, the audio data in the independent channels of the several terminals; packing the extracted audio data and transmitting it in cascade to the receiving-end control server; selecting, by the receiving-end control server according to a preset policy, receiving-end audio data with which to replace the audio data sent by the sending-end control server; and
repacking, by the receiving-end control server, the replaced audio data and sending it to the terminals through one audio logical channel.
6. The method according to claim 4 or 5, wherein the packing of the audio data comprises:
extracting the audio data in the different channels and combining the extracted audio data into one audio data packet; or
packing the audio data of the different channels directly in a separated manner.
7. The method according to claim 3, wherein when there is only one control server and the terminals support multiple audio logical channels, the selecting, by the control server, multiple channels of audio data from the audio data and transmitting them to the terminals through multiple audio logical channels comprises:
selecting, by the control server according to a preset policy, the audio data of several terminals to be mixed; and
sending the audio data of the several terminals directly to the terminals through the multiple audio logical channels.
8. The method according to claim 3, wherein when multiple control servers are cascaded and the terminals support multiple audio logical channels, the selecting, by the control server, multiple channels of audio data from the audio data and transmitting them to the terminals through multiple audio logical channels comprises:
selecting, by the control server according to a preset policy, the audio data of several terminals to be mixed; transmitting the audio data of the several terminals in cascade to the receiving-end control server;
selecting, by the receiving-end control server according to a preset policy, receiving-end audio data with which to replace the audio data sent by the sending-end control server; and
sending the replaced audio data directly to the terminals through the multiple audio logical channels.
9. The method according to claim 3, wherein multiple calls exist between the multiple cascaded control servers, and the forwarding, by the control server, of the audio data extracted from the audio data to the terminals according to the audio capabilities comprises:
selecting, by the sending-end control server according to a preset policy, the audio data of several terminals to be mixed;
transmitting the audio data of the several terminals in cascade, each from the port of the audio protocol corresponding to the terminal, to the corresponding port of the receiving-end control server;
selecting, by the receiving-end control server according to a preset policy, several channels of audio data to be mixed from the received audio data and the audio data of the receiving end; and
mixing, by the receiving-end control server, the several channels of audio data and sending the result to the terminals.
10. The method according to any one of claims 4, 5, 7, 8, and 9, wherein the preset policy comprises: the volume of the audio data, the call identifiers of the terminals, or the calling order of the terminals.
11. The method according to claim 3, wherein when a terminal supports neither the multi-channel separated audio codec protocol nor multiple audio logical channels, the method further comprises: creating, by the control server, resources for mixing and encoding for the terminal; and
the forwarding, by the control server, of the audio data extracted from the audio data to the terminals according to the audio capabilities comprises:
selecting, by the control server according to a preset policy, the audio data of several terminals to be mixed; and
decoding, mixing, and encoding the audio data using the resources and then sending it to the terminal.
12. An audio processing system, comprising at least one control server and multiple terminals, wherein:
the control server is configured to receive encoded audio data sent by terminals that access the control server, obtain audio capabilities of the terminals through capability negotiation with the terminals, and forward audio data extracted from the audio data to the terminals according to the audio capabilities; and
the terminals are configured to access the control server, decode the received audio data, and automatically mix and play it.
13. A control server, comprising:
an obtaining unit, configured to receive encoded audio data sent by terminals that access the control server and obtain audio capabilities of the terminals through capability negotiation with the terminals; and a forwarding unit, configured to forward audio data extracted from the audio data to the terminals according to the audio capabilities.
14. The control server according to claim 13, wherein the audio capability obtained by the obtaining unit is support for a multi-channel separated audio codec protocol, and the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed;
an extracting unit, configured to extract the audio data in the independent channels of the several terminals; and
a sending unit, configured to pack the extracted audio data and send it through one audio logical channel to the terminals or to a cascade port.
15. The control server according to claim 13, wherein the audio capability obtained by the obtaining unit is support for the multi-channel separated audio codec protocol and the control server is the sending-end control server among multiple cascaded control servers, and the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed;
an extracting unit, configured to extract the audio data in the independent channels of the several terminals; and
a transmitting unit, configured to pack the extracted audio data and transmit it in cascade to the receiving-end control server through one audio logical channel.
16. The control server according to claim 13, wherein the audio capability obtained by the obtaining unit is support for multiple audio logical channels, and the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed; and a sending unit, configured to send the audio data of the several terminals directly to the terminals through the multiple audio logical channels.
17. The control server according to claim 13, wherein the audio capability obtained by the obtaining unit is support for multiple audio logical channels and the control server is the sending-end control server among multiple cascaded control servers, and the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed; and
a transmitting unit, configured to transmit the audio data of the several terminals in cascade to the receiving-end control server through multiple audio logical channels.
18. The control server according to claim 13, wherein the control server is a sending-end control server in a multi-call cascade, and the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed; and
a transmitting unit, configured to transmit the audio data of the several terminals in cascade, each from the port of the audio protocol corresponding to the terminal, to the corresponding port of the receiving-end control server.
19. The control server according to claim 13, further comprising: a creating unit, configured to create resources for mixing and encoding for the terminal,
wherein the forwarding unit comprises:
a selecting unit, configured to select, according to a preset policy, the audio data of several terminals to be mixed; and
a transmitting unit, configured to decode, mix, and encode the audio data using the resources and send it to the terminal.
PCT/CN2008/073694 2007-12-28 2008-12-24 Procédé, système et serveur de contrôle de traitement audio WO2009089717A1 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2010540017A JP5320406B2 (ja) 2007-12-28 2008-12-24 オーディオ処理の方法、システム、及び制御サーバ
EP08870951.4A EP2216941B1 (en) 2007-12-28 2008-12-24 Audio processing method, system and control server
KR1020107014148A KR101205386B1 (ko) 2007-12-28 2008-12-24 오디오 처리 방법, 시스템 및 제어 서버
US12/824,892 US8531994B2 (en) 2007-12-28 2010-06-28 Audio processing method, system, and control server
US13/669,151 US8649300B2 (en) 2007-12-28 2012-11-05 Audio processing method, system, and control server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710305684.6 2007-12-28
CN200710305684.6A CN101471804B (zh) 2007-12-28 2007-12-28 一种音频处理方法、系统和控制服务器

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/824,892 Continuation US8531994B2 (en) 2007-12-28 2010-06-28 Audio processing method, system, and control server

Publications (1)

Publication Number Publication Date
WO2009089717A1 true WO2009089717A1 (fr) 2009-07-23

Family

ID=40828946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/073694 WO2009089717A1 (fr) 2007-12-28 2008-12-24 Procédé, système et serveur de contrôle de traitement audio

Country Status (6)

Country Link
US (2) US8531994B2 (zh)
EP (1) EP2216941B1 (zh)
JP (1) JP5320406B2 (zh)
KR (1) KR101205386B1 (zh)
CN (1) CN101471804B (zh)
WO (1) WO2009089717A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102196106A (zh) * 2010-03-11 2011-09-21 华为软件技术有限公司 实现主被叫通话的方法和相关设备

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404543B (zh) * 2010-09-13 2014-01-01 华为终端有限公司 级联会议中级联会场的处理方法、装置及级联会议系统
CN101977308B (zh) * 2010-12-02 2012-05-30 上海海视电子有限公司 视频和音频双向传输和交换联网的可视指挥系统
CN102655584B (zh) * 2011-03-04 2017-11-24 中兴通讯股份有限公司 一种远程呈现技术中媒体数据发送和播放的方法及系统
CN102868880B (zh) * 2011-07-08 2017-09-05 中兴通讯股份有限公司 一种基于远程呈现的媒体传输方法及系统
US8880412B2 (en) * 2011-12-13 2014-11-04 Futurewei Technologies, Inc. Method to select active channels in audio mixing for multi-party teleconferencing
CN102611562B (zh) * 2012-02-06 2015-06-03 华为技术有限公司 一种建立多级联通道的方法及装置
CN102710922B (zh) 2012-06-11 2014-07-09 华为技术有限公司 一种多点控制服务器的级联建立方法,设备及系统
CN103067221B (zh) * 2012-12-24 2016-08-03 广东威创视讯科技股份有限公司 一种音频通话测试系统
CN103067848B (zh) * 2012-12-28 2015-08-05 小米科技有限责任公司 实现多声道播放声音的方法、设备及系统
CN104009991B (zh) * 2014-05-28 2017-09-01 广州华多网络科技有限公司 音频通信系统和方法
CN106331396B (zh) * 2015-06-15 2020-08-14 深圳市潮流网络技术有限公司 一种电话会议的多媒体处理方法及系统
CN105187760B (zh) * 2015-07-30 2018-04-20 武汉随锐亿山科技有限公司 一种多点控制单元集群系统及方法
GB201620317D0 (en) 2016-11-30 2017-01-11 Microsoft Technology Licensing Llc Audio signal processing
CN108270935A (zh) * 2016-12-30 2018-07-10 展讯通信(上海)有限公司 会议电话装置
CN109087656B (zh) * 2017-06-14 2020-11-17 广东亿迅科技有限公司 一种基于mcu的多媒体会议混音方法及装置
CN107484075A (zh) * 2017-08-31 2017-12-15 深圳市豪恩声学股份有限公司 混音装置及声音处理系统
CN110390940A (zh) * 2018-04-16 2019-10-29 北京松果电子有限公司 音频分发系统
CN110049277A (zh) * 2019-05-28 2019-07-23 南京南方电讯有限公司 一种会议接入实现方法
CN111049848B (zh) 2019-12-23 2021-11-23 腾讯科技(深圳)有限公司 通话方法、装置、系统、服务器及存储介质
CN113038060B (zh) * 2019-12-25 2022-11-18 中国电信股份有限公司 多路音频处理方法和系统
CN111200717A (zh) * 2019-12-31 2020-05-26 苏州科达科技股份有限公司 级联会议的混音方法、装置、电子设备及介质
CN112188144B (zh) * 2020-09-14 2023-03-24 浙江华创视讯科技有限公司 音频的发送方法及装置、存储介质和电子装置
CN114520687B (zh) * 2022-02-17 2023-11-03 深圳震有科技股份有限公司 应用于卫星系统的音频数据处理方法、装置及设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1610349A (zh) * 2003-10-17 2005-04-27 华为技术有限公司 实时消息传送方法
CN1885785A (zh) * 2006-07-04 2006-12-27 华为技术有限公司 Mcu级联系统和该系统的创建及通信方法
CN1937664A (zh) * 2006-09-30 2007-03-28 华为技术有限公司 一种实现多语言会议的系统及方法

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2970645B2 (ja) * 1998-03-11 1999-11-02 日本電信電話株式会社 多地点接続会議システム構成方法及び多地点接続会議システム及びサーバ装置及びクライアント装置及び多地点接続会議システム構成プログラムを格納した記憶媒体
US7006616B1 (en) * 1999-05-21 2006-02-28 Terayon Communication Systems, Inc. Teleconferencing bridge with EdgePoint mixing
US7009943B2 (en) * 2000-11-02 2006-03-07 Polycom, Inc. Conferencing network resource management for call connectivity
JP2003023499A (ja) * 2001-07-10 2003-01-24 Matsushita Electric Ind Co Ltd 会議サーバ装置および会議システム
FI114129B (fi) * 2001-09-28 2004-08-13 Nokia Corp Konferenssipuhelujärjestely
US6981022B2 (en) * 2001-11-02 2005-12-27 Lucent Technologies Inc. Using PSTN to convey participant IP addresses for multimedia conferencing
US20030236892A1 (en) * 2002-05-31 2003-12-25 Stephane Coulombe System for adaptation of SIP messages based on recipient's terminal capabilities and preferences
WO2004030328A1 (ja) 2002-09-27 2004-04-08 Ginganet Corporation テレビ電話通訳システムおよびテレビ電話通訳方法
CN1270533C (zh) 2002-12-23 2006-08-16 中兴通讯股份有限公司 会议电视多点控制设备中数据处理的方法
US7319745B1 (en) 2003-04-23 2008-01-15 Cisco Technology, Inc. Voice conference historical monitor
ES2277010T3 (es) * 2003-09-11 2007-07-01 Sony Ericsson Mobile Communications Ab Llamada de multiples partes de dispositivos portatiles con identificacion del posicionamiento de las partes.
US20060055771A1 (en) 2004-08-24 2006-03-16 Kies Jonathan K System and method for optimizing audio and video data transmission in a wireless system
EP1672866A1 (en) * 2004-12-15 2006-06-21 Siemens S.p.A. Method and system to the instant transfer of multimedia files between mobile radio users within the scope of combinational services
CN100505864C (zh) * 2005-02-06 2009-06-24 中兴通讯股份有限公司 一种多点视频会议系统及其媒体处理方法
US7612793B2 (en) 2005-09-07 2009-11-03 Polycom, Inc. Spatially correlated audio in multipoint videoconferencing
US7800642B2 (en) 2006-03-01 2010-09-21 Polycom, Inc. Method and system for providing continuous presence video in a cascading conference
CN100438613C (zh) 2006-04-05 2008-11-26 北京华纬讯电信技术有限公司 多媒体视频会议系统中音视频码流的传输方法
CN100512422C (zh) 2006-11-23 2009-07-08 北京航空航天大学 多mcu视频会议系统中的混音方法
US7782993B2 (en) 2007-01-04 2010-08-24 Nero Ag Apparatus for supplying an encoded data signal and method for encoding a data signal
US8300556B2 (en) 2007-04-27 2012-10-30 Cisco Technology, Inc. Optimizing bandwidth in a multipoint video conference

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2216941A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102196106A (zh) * 2010-03-11 2011-09-21 华为软件技术有限公司 实现主被叫通话的方法和相关设备
CN102196106B (zh) * 2010-03-11 2013-12-04 华为软件技术有限公司 实现主被叫通话的方法和相关设备

Also Published As

Publication number Publication date
US20130064387A1 (en) 2013-03-14
EP2216941A4 (en) 2010-12-01
EP2216941A1 (en) 2010-08-11
US8531994B2 (en) 2013-09-10
EP2216941B1 (en) 2014-11-05
KR101205386B1 (ko) 2012-11-27
JP5320406B2 (ja) 2013-10-23
US20100268541A1 (en) 2010-10-21
CN101471804A (zh) 2009-07-01
KR20100086072A (ko) 2010-07-29
CN101471804B (zh) 2011-08-10
JP2011508546A (ja) 2011-03-10
US8649300B2 (en) 2014-02-11

Similar Documents

Publication Publication Date Title
WO2009089717A1 (fr) Audio processing method, system, and control server
CN108076306B (zh) Conference implementation method, apparatus, device and system, and computer-readable storage medium
KR102430838B1 (ko) Conference audio management
US7463901B2 (en) Interoperability for wireless user devices with different speech processing formats
US7054820B2 (en) Control unit for multipoint multimedia/audio conference
US20090143029A1 (en) Press-Talk Server, Transcoder, and Communication System
US7079495B1 (en) System and method for enabling multicast telecommunications
CN113746808B (zh) Converged communication method for online conferences, gateway, electronic device, and storage medium
WO2017129129A1 (zh) Instant call method, apparatus, and system
US20120134301A1 (en) Wide area voice environment multi-channel communications system and method
EP2452482B1 (en) Media forwarding for a group communication session in a wireless communications system
JP2011508546A5 (zh)
JP2019530996A (ja) Method and apparatus for use of compact parallel codecs in multimedia communications
WO2012155660A1 (zh) Telepresence method, terminal, and system
WO2010083737A1 (zh) Speech signal processing method, and speech signal sending method and apparatus
WO2017030802A1 (en) Methods and apparatus for multimedia conferences using single source multi-unicast
CN110650260A (zh) System and method for audio interworking between internal and external networks for network terminals
WO2021073155A1 (zh) Video conference method, apparatus, device, and storage medium
WO2007076669A1 (fr) Method, device, and system for processing a data stream
WO2021017807A1 (zh) Call connection establishment method, first terminal, server, and storage medium
WO2008040186A1 (fr) Method, system, and gateway for negotiating the capability of a data signal detector
WO2019228534A1 (zh) Media transmission method and H323-SIP gateway
US7805152B2 (en) PTT architecture
CN101668092A (zh) Method and apparatus for a network multimedia terminal to implement supplementary service dial tones
CN109151559B (zh) Multi-party session method and integrated home gateway set-top box device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08870951

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2008870951

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20107014148

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2010540017

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE