WO2011057511A1 - Method, apparatus, and system for implementing audio mixing - Google Patents

Method, apparatus, and system for implementing audio mixing

Info

Publication number
WO2011057511A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
audio
channel
receiving site
site
Prior art date
Application number
PCT/CN2010/075891
Other languages
English (en)
French (fr)
Inventor
詹五洲
王东琦
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Priority to EP10829475.2A (EP2490426B1)
Publication of WO2011057511A1
Priority to US13/469,782 (US8773491B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a method, apparatus, and system for implementing mixing.
  • Background
  • Video conferencing can be understood as the videoconference service in the usual sense.
  • By means of multimedia communication, a conference is held using television equipment and a communication network, so that images, voice, and data can be exchanged between two or more locations at the same time.
  • Generally, a video conferencing system includes video terminal devices, a communication network, and a multipoint control unit (MCU).
  • MCU: multipoint control unit
  • A traditional conference terminal usually has only one or two audio channels, and generally gives no sense of spatial orientation or can only distinguish left from right.
  • A next-generation conference terminal generally adopts a multi-screen scheme in which the image is life-size. To give a strong sense of presence and immersion, it generally requires a strong sense of sound direction and space, which traditional two-channel audio cannot provide.
  • To convey a strong sense of direction and space, the prior art adopts two schemes: one encodes and transmits audio in a multi-channel manner; the other encodes and transmits audio using an audio protocol based on audio objects, so that the direction and spatial sense of the sound can be carried with only a small increase in bit rate.
  • Existing MCU mixing methods are typically channel-based mixing schemes.
  • The existing channel-based mixing method can be compatible only with traditional conference terminals in the same conference; it cannot be compatible with multi-channel-based next-generation terminals or audio-object-based next-generation terminals.
  • Embodiments of the present invention provide a method, apparatus, and system for implementing mixing that can improve compatibility with different conferencing terminals.
  • A method for implementing mixing, including:
  • receiving the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals;
  • selecting, from the received audio signals, audio signals for each receiving site;
  • processing the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites; and
  • sending the processed audio signals to each receiving site according to the type of the receiving site.
  • An apparatus for implementing mixing, specifically a multipoint control unit, including:
  • a receiving module, configured to receive the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals;
  • a selection module, configured to select, from the received audio signals, audio signals for each receiving site;
  • a processing module, configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
  • a sending module, configured to send the processed audio signals to each receiving site according to the type of the receiving site.
  • A system for implementing mixing, including a plurality of sending sites and receiving sites and a multipoint control unit, where:
  • the sending site is configured to send an audio signal to the multipoint control unit, where the audio signals include channel-based audio signals and audio-object-based audio signals;
  • the multipoint control unit is configured to receive the audio signals sent by the sending sites, select from the received audio signals the audio signals for each receiving site, process the selected audio signals according to the type of the receiving site, and send the processed audio signals to each receiving site according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
  • the receiving site is configured to receive the processed audio signal from the multipoint control unit.
  • In the method, apparatus, and system for implementing mixing provided by the embodiments of the present invention, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site.
  • Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • FIG. 1 is a flowchart of a method for implementing mixing according to Embodiment 1 of the present invention
  • FIG. 2 and FIG. 3 are flowcharts of a method for implementing mixing according to Embodiment 2 of the present invention.
  • FIG. 4 is a flowchart of converting the selected audio signals into audio signals whose number of channels matches that of a channel-based receiving site, according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic diagram of an audio presentation manner of a telepresence terminal according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for implementing mixing according to Embodiment 3 of the present invention.
  • FIG. 7 and FIG. 8 are schematic diagrams showing the structure of an apparatus for implementing mixing according to Embodiment 4 of the present invention.
  • FIG. 9 is a schematic structural diagram of a system for implementing mixing according to Embodiment 5 of the present invention.
  • This embodiment provides a method for implementing mixing.
  • As shown in FIG. 1, the method for implementing mixing includes: 101. Receive the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals;
  • 102. Select, from the received audio signals, audio signals for each receiving site;
  • 103. Process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
  • 104. Send the processed audio signals to each receiving site according to the type of the receiving site.
  • The same site can both send and receive audio signals; that is, a sending site and a receiving site can be the same site.
  • In the method for implementing mixing of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site.
  • Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • In this embodiment, the multipoint conference system has a plurality of sending sites and receiving sites. The sending sites include channel-based sending sites and audio-object-based sending sites, and the audio signals sent by the sending sites are mixed by the MCU. The same site can both send and receive audio signals; that is, a sending site and a receiving site can be the same site.
  • As shown in FIG. 2 and FIG. 3, the method for implementing mixing includes:
  • 201. The MCU receives the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals.
  • The type of an audio signal is determined by the type of its sending site: if the sending site is channel-based, the audio signal it sends is a channel-based audio signal; if the sending site is audio-object-based, the audio signal it sends is an audio-object-based audio signal. A channel-based sending site may be a mono or multi-channel sending site, and correspondingly a channel-based audio signal may be a mono or multi-channel audio signal.
  • An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, orientation information, different playback modes and their corresponding parameters, and other information about the audio object.
  • 202. The MCU selects, from the received audio signals, audio signals for each receiving site.
  • The MCU may select audio signals for each receiving site according to the energy of each audio signal. The selection process may include:
  • 202a. The MCU separately calculates the energy of each channel-based audio signal and/or the energy of each audio-object-based audio signal.
  • (1) Calculating the energy of a channel-based audio signal: when the signal is mono, the energy of the channel-based audio signal is the energy of that single channel;
  • when the signal is multi-channel, the energy of each channel is calculated separately, and then either the maximum channel energy or the average of the channel energies is taken as the energy of the channel-based audio signal.
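The channel-based energy rule can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not define how the energy of a single channel is measured, so the sum of squared samples over a frame is assumed here, and the function names are hypothetical.

```python
from typing import Sequence


def channel_energy(samples: Sequence[float]) -> float:
    """Energy of one channel, assumed here to be the sum of squared samples."""
    return sum(x * x for x in samples)


def channel_based_signal_energy(channels: Sequence[Sequence[float]],
                                use_max: bool = True) -> float:
    """Energy of a mono or multi-channel signal.

    Mono: the energy of the single channel.
    Multi-channel: the maximum or the average of the per-channel energies.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    energies = [channel_energy(ch) for ch in channels]
    return max(energies) if use_max else sum(energies) / len(energies)
```

For a mono signal both variants give the same value, so the `use_max` switch only matters for multi-channel input.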
  • (2) Calculating the energy of an audio-object-based audio signal: the auxiliary information of an audio object carries the maximum absolute energy and the energy ratio of that audio object, and the absolute energy of the audio object is calculated from the maximum absolute energy and the energy ratio.
  • For example, if the maximum absolute energy is Emax and the energy ratios of the audio objects S1, S2, and S3 are a1, a2, and a3, respectively, then the absolute energies of the three audio objects are Emax × a1, Emax × a2, and Emax × a3, respectively.
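The audio object energy calculation reduces to one multiplication per object. A small sketch, with a hypothetical function name and made-up numeric values:

```python
from typing import List, Sequence


def object_absolute_energies(e_max: float, ratios: Sequence[float]) -> List[float]:
    """Absolute energy of each audio object: Emax multiplied by its energy ratio."""
    return [e_max * a for a in ratios]


# Hypothetical values for Emax and for the ratios a1, a2, a3 of objects S1, S2, S3.
energies = object_absolute_energies(10.0, [1.0, 0.5, 0.25])
```

Because only Emax and the ratios are needed, the MCU can rank audio objects using the auxiliary information alone.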
  • According to the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals, the MCU selects for each receiving site several audio signals with larger energy; the audio signals the MCU selects for different receiving sites may be the same or different.
  • For example, a multipoint conference system has five sites, A, B, C, D, and E.
  • Each of the five sites can both send and receive audio signals.
  • The audio signals sent by sites A, B, C, D, and E are A1, B1, C1, D1, and E1, respectively.
  • The MCU selects three audio signals, B1, C1, and D1, according to the energy of the audio signal sent by each site. Because a site usually does not receive its own audio signal, the audio signals selected by the MCU for each site are as shown in Table 1.
  • When two sites hold a private conversation, the audio signals sent by those two sites can be selected only for the other party to the private conversation and not for any other site; in that case,
  • the audio signals selected by the MCU for each site are as shown in Table 2.
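The basic selection rule (take the signals with the largest energy, then drop each site's own signal) can be sketched as below. The energy values are invented so that B, C, and D are the three loudest sites, as in the five-site example; the function name is hypothetical, and the private-conversation case is not modelled.

```python
from typing import Dict, List


def select_signals(energies: Dict[str, float], count: int = 3) -> Dict[str, List[str]]:
    """For every receiving site, list the sending sites whose signals it gets.

    The `count` loudest signals are chosen once, and each site's own
    signal is then removed from its list.
    """
    loudest = sorted(energies, key=lambda s: energies[s], reverse=True)[:count]
    return {site: [s for s in loudest if s != site] for site in energies}


# Hypothetical energies for sites A to E.
energies = {"A": 0.1, "B": 0.9, "C": 0.8, "D": 0.7, "E": 0.2}
selection = select_signals(energies)
```

With these values, sites A and E each receive the signals of B, C, and D, while B, C, and D each receive the other two.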
  • The selected audio signals are processed according to the type of the receiving site, where a receiving site may be a channel-based receiving site or an audio-object-based receiving site.
  • Processing the selected audio signals according to the type of the receiving site includes:
  • Step L1: Determine the type of the selected audio signal. If it is a channel-based audio signal, perform step L2; if it is an audio-object-based audio signal, perform step L3.
  • Step L2: Convert the channel-based audio signal into a signal whose number of channels matches that of the receiving site.
  • A channel-based site may have a single channel or multiple channels. For multiple channels, microphones are generally placed at different spatial positions, and the signal collected by each microphone is encoded as one channel. Because the channels already contain spatial information, the spatial sound information of the original sending site can be reproduced when the receiving site plays the signal back through the same number of speakers.
  • For example, suppose the receiving site is two-channel and the channel-based audio signals selected by the MCU for it are a mono signal and a three-channel signal. The MCU copies the mono signal to both the left channel and the right channel of the receiving site, so that the left and right channels carry the same content. For the three-channel signal, the MCU copies the first channel to the left channel of the receiving site, copies the third channel to the right channel, and multiplies the second channel by a gain of 0.707 before adding it to both the left and right channels. In this way the mono signal and the three-channel signal are both converted into two-channel signals.
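The two conversions in this example can be sketched as follows. The function names are hypothetical, and signals are represented as plain lists of samples for illustration only.

```python
from typing import List, Sequence, Tuple


def mono_to_stereo(mono: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Copy the mono signal to both the left and the right channel."""
    return list(mono), list(mono)


def three_to_stereo(ch1: Sequence[float], ch2: Sequence[float], ch3: Sequence[float],
                    center_gain: float = 0.707) -> Tuple[List[float], List[float]]:
    """Channel 1 goes left, channel 3 goes right, and channel 2 is scaled
    by the gain and added to both sides."""
    left = [a + center_gain * b for a, b in zip(ch1, ch2)]
    right = [c + center_gain * b for c, b in zip(ch3, ch2)]
    return left, right
```

The gain of 0.707 is approximately 1/√2, so the middle channel contributes half of its power to each output channel.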
  • Step L3: Determine whether the auxiliary information of the audio object carries a playback mode whose number of channels matches that of the receiving site. If it does, perform step L4; if it does not, perform step L5.
  • The auxiliary information of an audio object can carry multiple playback modes and their corresponding parameters.
  • A playback mode specifies how many channels are used for playback, for example two channels or five channels. The parameters of each playback mode indicate the energy distribution of the audio object across the channels, and this distribution can vary over time.
  • Step L4: Convert the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information.
  • For example, if the receiving site is two-channel and the auxiliary information carries a two-channel playback mode, the MCU extracts the parameters corresponding to the two-channel playback mode and distributes the audio object signal to the two channels according to those parameters.
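Steps L3 and L4 can be sketched as a lookup followed by a per-channel scaling. The text does not give the format of the playback-mode parameters, so this sketch assumes they are per-channel energy shares that sum to 1 and that amplitude gains are their square roots; the function name and the data layout are hypothetical, and time-varying parameters are ignored.

```python
import math
from typing import Dict, List, Optional, Sequence


def render_with_playback_mode(samples: Sequence[float],
                              modes: Dict[int, Sequence[float]],
                              num_channels: int) -> Optional[List[List[float]]]:
    """Distribute an audio object over `num_channels` channels.

    `modes` maps a channel count to assumed per-channel energy shares.
    Returns None when no playback mode matches, which corresponds to
    falling back to step L5.
    """
    shares = modes.get(num_channels)
    if shares is None:
        return None
    gains = [math.sqrt(s) for s in shares]  # energy share -> amplitude gain
    return [[g * x for x in samples] for g in gains]
```

Returning `None` rather than raising keeps the L4/L5 decision with the caller.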
  • Step L5: Convert the audio object into a signal whose number of channels matches that of the receiving site according to the orientation information of the audio object, where the orientation information is carried in the auxiliary information of the audio object.
  • For example, if the auxiliary information carries only a two-channel playback mode and a five-channel playback mode and the receiving site has six channels, the MCU converts the audio object into a six-channel audio signal according to the orientation information of the audio object.
  • The conversion according to orientation information may be performed as follows: determine the speaker closest to the audio object from the orientation of the audio object and the position of the speaker corresponding to each channel of the receiving site; copy the audio-object-based audio signal to the channel of that closest speaker, and give the other channels no signal.
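The nearest-speaker conversion can be sketched as below. The text does not say how orientation or speaker positions are encoded; this sketch assumes both are azimuth angles in degrees, and the six-speaker layout in the usage lines is invented for illustration.

```python
from typing import List, Sequence


def angular_distance(a: float, b: float) -> float:
    """Smallest angle in degrees between two azimuths."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def render_to_nearest_speaker(samples: Sequence[float],
                              object_azimuth: float,
                              speaker_azimuths: Sequence[float]) -> List[List[float]]:
    """Copy the object signal to the channel of the closest speaker and
    leave every other channel silent."""
    nearest = min(range(len(speaker_azimuths)),
                  key=lambda i: angular_distance(object_azimuth, speaker_azimuths[i]))
    return [list(samples) if i == nearest else [0.0] * len(samples)
            for i in range(len(speaker_azimuths))]


# Hypothetical six-channel layout; the object sits at 100 degrees.
layout = [-30.0, 0.0, 30.0, 110.0, -110.0, 180.0]
channels = render_to_nearest_speaker([0.5, -0.5], 100.0, layout)
```

Here the speaker at 110 degrees (index 3) is closest, so only that channel carries the signal.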
  • Processing the selected audio signals according to the type of the receiving site includes:
  • 203c. Convert the selected audio signals into audio objects according to the presentation manner of the receiving site.
  • Converting the selected audio signals into audio objects according to the presentation manner of the receiving site may specifically include:
  • Set a presentation manner for each selected audio signal according to the presentation manner of the receiving site. For example, FIG. 5 shows the audio presentation manner of a telepresence terminal: the display consists of three screens, with two speakers at the bottom of each screen and one speaker on each side of the display, for a total of eight speakers.
  • The site that is speaking may not be a site currently shown on the screens; in that case, that site's sound can be assigned to the two speakers on the sides of the display.
  • For example, suppose three audio signals s1, s2, and s3 are selected. If the video of the site corresponding to s1 is being viewed at the receiving site, s1 can be set to be played by the six speakers at the bottom of the screens. If the video of the sites corresponding to s2 and s3 is not currently being viewed, s2 and s3 can be set to be played by the two speakers on the sides.
  • If the selected audio signal is a channel-based audio signal, convert it into an audio object according to the set presentation manner;
  • if the selected audio signal is an audio-object-based audio signal, modify the original parameters of the audio object according to the set presentation manner so that they meet the requirements of that presentation manner.
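The speaker assignment in the telepresence example can be sketched as a simple rule. The speaker labels, site names, and function name are hypothetical; the sketch only decides which speakers play each selected signal and does not build the resulting audio object parameters.

```python
from typing import Dict, List, Set

# Hypothetical labels for the eight speakers described for FIG. 5.
BOTTOM_SPEAKERS = ["bottom_1", "bottom_2", "bottom_3", "bottom_4", "bottom_5", "bottom_6"]
SIDE_SPEAKERS = ["side_left", "side_right"]


def assign_speakers(signal_sites: Dict[str, str],
                    displayed_sites: Set[str]) -> Dict[str, List[str]]:
    """Map each selected signal to the speakers that should play it.

    Signals from sites currently shown on the screens go to the bottom
    speakers; all other signals go to the two side speakers.
    """
    return {signal: list(BOTTOM_SPEAKERS if site in displayed_sites else SIDE_SPEAKERS)
            for signal, site in signal_sites.items()}


plan = assign_speakers({"s1": "site1", "s2": "site2", "s3": "site3"}, {"site1"})
```

With only `site1` displayed, s1 is routed to the six bottom speakers and s2 and s3 to the two side speakers, matching the example.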
  • In the method for implementing mixing of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site.
  • Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • the embodiment provides a device for implementing mixing.
  • the device for implementing mixing includes:
  • the receiving module 61 is configured to receive an audio signal sent by each sending site, where the audio signal includes a channel-based audio signal and an audio object-based audio signal;
  • The type of an audio signal is determined by the type of its sending site: if the sending site is channel-based, the audio signal it sends is a channel-based audio signal; if the sending site is audio-object-based, the audio signal it sends is an audio-object-based audio signal. A channel-based sending site may be a mono or multi-channel sending site, and correspondingly a channel-based audio signal may be a mono or multi-channel audio signal.
  • An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, orientation information, different playback modes and their corresponding parameters, and other information about the audio object.
  • the selection module 62 is configured to select, from the received audio signals, audio signals for each receiving site;
  • the processing module 63 is configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
  • the sending module 64 is configured to send the processed audio signals to each receiving site according to the type of the receiving site.
  • The same site can both send and receive audio signals; that is, a sending site and a receiving site can be the same site.
  • After receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site.
  • Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • the device for implementing the mixing includes:
  • the receiving module 61 is configured to receive an audio signal sent by each sending site, where the audio signal includes a channel-based audio signal and an audio object-based audio signal;
  • The type of an audio signal is determined by the type of its sending site: if the sending site is channel-based, the audio signal it sends is a channel-based audio signal; if the sending site is audio-object-based, the audio signal it sends is an audio-object-based audio signal. A channel-based sending site may be a mono or multi-channel sending site, and correspondingly a channel-based audio signal may be a mono or multi-channel audio signal.
  • An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, orientation information, different playback modes and their corresponding parameters, and other information about the audio object.
  • the selection module 62 is configured to select, from the received audio signals, audio signals for each receiving site;
  • the processing module 63 is configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
  • the sending module 64 is configured to send the processed audio signals to each receiving site according to the type of the receiving site.
  • The same site can both send and receive audio signals; that is, a sending site and a receiving site can be the same site.
  • the selection module 62 includes:
  • a calculating unit 621 configured to separately calculate energy of the channel-based audio signal and/or energy of the audio signal based on the audio object;
  • the selecting unit 622 is configured to select an audio signal according to the energy of the channel-based audio signal and/or the energy of the audio signal based on the audio object.
  • the selecting unit 622 may select, for each receiving site, several audio signals with larger energy according to the energy of the channel-based audio signals and the energy of the audio-object-based audio signals; the audio signals that the selecting unit 622 selects for different receiving sites may be the same or different.
  • the processing module 63 may include:
  • a first converting unit 631 configured to convert the selected audio signal into an audio signal that is consistent with the number of receiving venue channels
  • the mixing unit 632 is configured to mix the converted audio signal based on the channel of the receiving site.
  • first converting unit 631 may include:
  • a first determining subunit 6311 configured to determine a type of the selected audio signal
  • a first conversion subunit 6312, configured to convert the channel-based audio signal into a signal whose number of channels matches that of the receiving site when the selected audio signal is a channel-based audio signal;
  • a second determining subunit 6313, configured to determine, when the selected audio signal is an audio-object-based audio signal, whether the auxiliary information of the audio object carries a playback mode whose number of channels matches that of the receiving site;
  • a second conversion subunit 6314, configured to convert the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information, when the auxiliary information of the audio object carries a playback mode whose number of channels matches that of the receiving site;
  • a third conversion subunit 6315, configured to convert the audio object into a signal whose number of channels matches that of the receiving site according to the orientation information of the audio object, when the auxiliary information of the audio object does not carry a playback mode whose number of channels matches that of the receiving site, where the orientation information is carried in the auxiliary information of the audio object.
  • the processing module 63 may include:
  • a second converting unit 633 configured to convert the selected audio signal into an audio object according to a manner in which the receiving site is presented;
  • the merging unit 634 is configured to merge the converted audio objects into one audio object stream.
  • the second converting unit 633 may include:
  • a setting subunit 6331 configured to set a presentation manner of the selected audio signal according to a presentation manner of the receiving site
  • a fourth conversion subunit 6332 configured to convert the selected audio signal into an audio object according to the set presentation manner when the selected audio signal is a channel-based audio signal
  • the modifying subunit 6333 is configured to modify a parameter of the audio object according to the set presentation manner when the selected audio signal is an audio signal based on the audio object.
  • After receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the apparatus for implementing mixing of this embodiment selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the apparatus can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • the embodiment provides a system for implementing the mixing.
  • the system for implementing the mixing includes: a plurality of sending sites 91 and a receiving site 93, and a multipoint control unit 92, wherein
  • the transmitting site 91 is configured to send an audio signal to the multipoint control unit 92, where the audio signal includes a channel based audio signal and an audio object based audio signal;
  • the multipoint control unit 92 is configured to receive the audio signals sent by the sending sites 91, select from the received audio signals the audio signals for each receiving site 93, process the selected audio signals according to the type of the receiving site 93, and send the processed audio signals to each receiving site 93 according to the type of the receiving site, where the receiving sites 93 include channel-based receiving sites and audio-object-based receiving sites;
  • the receiving site 93 is configured to receive the processed audio signal from the multipoint control unit 92. The same site can both send and receive audio signals; that is, a sending site and a receiving site can be the same site.
  • In the system for implementing mixing of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site.
  • Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
  • The apparatus and system for implementing mixing provided by the embodiments of the present invention can implement the method embodiments provided above.
  • The method, apparatus, and system for implementing mixing provided by the embodiments of the present invention can be applied to, but are not limited to, making traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals compatible in the same multipoint conference.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Description

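Method, Apparatus, and System for Implementing Mixing
This application claims priority to Chinese Patent Application No. 200910207184.8, filed with the Chinese Patent Office on November 13, 2009 and entitled "Method, Apparatus, and System for Implementing Mixing", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications technologies, and in particular to a method, apparatus, and system for implementing mixing.
Background
With the development of communications technologies, video conferencing has come into wide use. Video conferencing can be understood as the videoconference service in the usual sense. By means of multimedia communication, a conference is held using television equipment and a communication network, so that images, voice, and data can be exchanged between two or more locations at the same time. Generally, a video conferencing system includes video terminal devices, a communication network, a multipoint control unit (Multipoint Control Unit, MCU), and other parts.
A traditional conference terminal usually has only one or two audio channels, and generally gives no sense of spatial orientation or can only distinguish left from right. A next-generation conference terminal generally adopts a multi-screen scheme in which the image is life-size. To give a strong sense of presence and immersion, it generally requires a strong sense of sound direction and space, which traditional two-channel audio cannot provide. To convey a strong sense of direction and space, the prior art adopts two schemes: one encodes and transmits audio in a multi-channel manner; the other encodes and transmits audio using an audio protocol based on audio objects, so that the direction and spatial sense of the sound can be carried with only a small increase in bit rate. Existing MCU mixing methods are typically channel-based mixing schemes.
In the course of implementing the present invention, the inventors found at least the following problem in the prior art: the existing channel-based mixing method can be compatible only with traditional conference terminals in the same conference; it cannot be compatible with multi-channel-based next-generation terminals or audio-object-based next-generation terminals.
Summary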
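Embodiments of the present invention provide a method, apparatus, and system for implementing mixing that can improve compatibility with different conference terminals.
The technical solutions adopted by the embodiments of the present invention are as follows:
A method for implementing mixing, including:
receiving the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals;
selecting, from the received audio signals, audio signals for each receiving site;
processing the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites; and
sending the processed audio signals to each receiving site according to the type of the receiving site.
An apparatus for implementing mixing, specifically a multipoint control unit, including:
a receiving module, configured to receive the audio signal sent by each sending site, where the audio signals include channel-based audio signals and audio-object-based audio signals;
a selection module, configured to select, from the received audio signals, audio signals for each receiving site;
a processing module, configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites; and
a sending module, configured to send the processed audio signals to each receiving site according to the type of the receiving site.
A system for implementing mixing, including a plurality of sending sites and receiving sites and a multipoint control unit, where:
the sending site is configured to send an audio signal to the multipoint control unit, where the audio signals include channel-based audio signals and audio-object-based audio signals;
the multipoint control unit is configured to receive the audio signals sent by the sending sites, select from the received audio signals the audio signals for each receiving site, process the selected audio signals according to the type of the receiving site, and send the processed audio signals to each receiving site according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites;
the receiving site is configured to receive the processed audio signal from the multipoint control unit.
In the method, apparatus, and system for implementing mixing provided by the embodiments of the present invention, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the multipoint control unit can mix channel-based audio signals and audio-object-based audio signals, so that traditional conference terminals, multi-channel-based next-generation terminals, and audio-object-based next-generation terminals are compatible in the same multipoint conference, which improves the quality of user experience.
Brief Description of the Drawings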
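To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the method for implementing audio mixing according to Embodiment 1 of the present invention;
FIG. 2 and FIG. 3 are flowcharts of the method for implementing audio mixing according to Embodiment 2 of the present invention;
FIG. 4 is a flowchart, according to Embodiment 2 of the present invention, of converting the selected audio signals into audio signals whose number of channels matches that of a channel-based receiving site;
FIG. 5 is a schematic diagram of the audio presentation mode of a telepresence terminal according to Embodiment 2 of the present invention;
FIG. 6 is a schematic structural diagram of the apparatus for implementing audio mixing according to Embodiment 3 of the present invention;
FIG. 7 and FIG. 8 are schematic structural diagrams of the apparatus for implementing audio mixing according to Embodiment 4 of the present invention;
FIG. 9 is a schematic structural diagram of the system for implementing audio mixing according to Embodiment 5 of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the advantages of the technical solutions of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.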
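Embodiment 1
This embodiment provides a method for implementing audio mixing. As shown in FIG. 1, the method includes:
101. Receive audio signals sent by sending sites, where the audio signals include channel-based audio signals and audio-object-based audio signals.
102. Select, from the received audio signals, audio signals for each receiving site.
103. Process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites.
104. Send the processed audio signals to each receiving site according to the type of the receiving site.
A single site can both send and receive audio signals; that is, a sending site and a receiving site may be the same site.
In the method of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the multipoint control unit can mix both channel-based and audio-object-based audio signals, so that conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals are all compatible within the same multipoint conference, which improves the user's quality of experience.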
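Embodiment 2
In this embodiment, a multipoint conference system has multiple sending sites and receiving sites. The sending sites include channel-based sending sites and audio-object-based sending sites, and the audio signals they send are mixed by the MCU. A single site can both send and receive audio signals; that is, a sending site and a receiving site may be the same site.
As shown in FIG. 2 and FIG. 3, the method for implementing audio mixing includes:
201. The MCU receives audio signals sent by the sending sites, where the audio signals include channel-based audio signals and audio-object-based audio signals.
The type of an audio signal is determined by the type of its sending site: a channel-based sending site sends channel-based audio signals, and an audio-object-based sending site sends audio-object-based audio signals. A channel-based sending site may be mono or multi-channel, and accordingly a channel-based audio signal may be a mono or multi-channel audio signal. An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, azimuth information, the different playback modes and their corresponding parameters, and other information about the audio object.
202. The MCU selects, from the received audio signals, audio signals for each receiving site.
The MCU may select audio signals for each receiving site according to the energy of each audio signal. The selection may include:
202a. The MCU separately calculates the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals.
(1) Calculating the energy of a channel-based audio signal
When the signal is mono, the energy of the channel-based audio signal is the energy of that single channel.
When the signal is multi-channel, the energy of each channel is calculated separately, and then either the largest channel energy or the average of the channel energies is taken as the energy of the channel-based audio signal.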
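The channel-energy rule above can be sketched in a few lines of Python. This is only an illustration: the text does not say how the energy of a single channel is measured, so the sum of squared samples over one frame is an assumption, and the names `frame_energy` and `channel_signal_energy` are hypothetical.

```python
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Energy of one channel frame (assumption: sum of squared samples)."""
    return sum(s * s for s in samples)


def channel_signal_energy(channels: Sequence[Sequence[float]], mode: str = "max") -> float:
    """Energy of a channel-based signal.

    Mono: the energy of the only channel.
    Multi-channel: the largest per-channel energy ("max") or the
    average of the per-channel energies ("mean").
    """
    if not channels:
        raise ValueError("at least one channel is required")
    energies = [frame_energy(ch) for ch in channels]
    if len(energies) == 1:
        return energies[0]
    if mode == "max":
        return max(energies)
    if mode == "mean":
        return sum(energies) / len(energies)
    raise ValueError(f"unknown mode: {mode}")
```

Either variant gives one scalar per sending site, which is what the selection step needs for comparison.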
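(2) Calculating the energy of an audio-object-based audio signal
The auxiliary information of an audio object carries the maximum absolute energy and the energy ratio of the audio object, and the absolute energy of the audio object is calculated from these two values. For example, if the maximum absolute energy is Emax and the energy ratios of audio objects S1, S2 and S3 are a1, a2 and a3 respectively, the absolute energies of the three audio objects are Emax × a1, Emax × a2 and Emax × a3.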
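As a small worked sketch of the Emax × a rule above (the function name and the numeric values are hypothetical, chosen only for illustration):

```python
from typing import List, Sequence


def object_absolute_energies(e_max: float, ratios: Sequence[float]) -> List[float]:
    """Absolute energy of each audio object: maximum absolute energy times its ratio."""
    return [e_max * a for a in ratios]


# Example with made-up values: Emax = 8.0 and ratios a1, a2, a3 = 1.0, 0.5, 0.25
energies = object_absolute_energies(8.0, [1.0, 0.5, 0.25])
```

Because the ratios are relative to a shared maximum, the MCU can compare objects by energy without decoding the object audio itself, assuming the auxiliary information is available separately.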
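202b. According to the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals, the MCU selects for each receiving site several audio signals with higher energy. The audio signals the MCU selects for different receiving sites may be the same or different.
For example, a multipoint conference system has five sites A, B, C, D and E, each of which can both receive and send audio signals. The audio signals sent by sites A, B, C, D and E are A1, B1, C1, D1 and E1 respectively. Based on the energy of the audio signals sent by the sites, the MCU selects the three audio signals B1, C1 and D1. Because a site usually does not receive the audio signal it sent itself, the audio signals the MCU selects for each site are as shown in Table 1:
[Figure imgf000009_0001: Table 1, available only as an image in the source]
Table 1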
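The selection in the five-site example can be sketched as follows. Table 1 itself is not reproduced in this text, so the sketch only follows the prose: pick the loudest signals overall, then drop each site's own signal from what it receives. The function name and the energy values are hypothetical.

```python
from typing import Dict, List


def select_for_sites(energies: Dict[str, float], count: int) -> Dict[str, List[str]]:
    """For every receiving site, list the sending sites whose signals it gets.

    The `count` loudest signals are chosen once; a site never receives
    the signal that it sent itself.
    """
    loudest = sorted(energies, key=lambda site: energies[site], reverse=True)[:count]
    return {site: [s for s in loudest if s != site] for site in energies}


# Hypothetical energies in which B, C and D are the three loudest sites
selection = select_for_sites({"A": 0.1, "B": 0.9, "C": 0.8, "D": 0.7, "E": 0.2}, 3)
```

With these assumed energies, sites A and E receive the signals of B, C and D, while B, C and D each receive the other two.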
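Further, to support a private chat between two sites in a multipoint conference, the audio signals sent by the two sites in the private chat can be selected only by the other party of the private chat and cannot be selected by any other site. In this case, the audio signals the MCU selects for each site are as shown in Table 2:
[Figure imgf000009_0002: Table 2, available only as an image in the source]
Table 2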
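A possible way to express the private-chat restriction is an eligibility filter applied before the energy ranking. This is a sketch under assumptions: Table 2 is not reproduced here, the text does not say whether the two private-chat sites still hear other sites (the sketch allows it), and the function name and energy values are hypothetical.

```python
from typing import Dict, List, Tuple


def select_with_private_chat(
    energies: Dict[str, float], count: int, private_pair: Tuple[str, str]
) -> Dict[str, List[str]]:
    """Per-receiver selection where private-chat signals reach only the chat partner."""
    first, second = private_pair
    partner = {first: second, second: first}
    result: Dict[str, List[str]] = {}
    for receiver in energies:
        eligible = [
            sender
            for sender in energies
            if sender != receiver
            and (sender not in partner or partner[sender] == receiver)
        ]
        # Rank the remaining candidates by energy, loudest first
        eligible.sort(key=lambda sender: energies[sender], reverse=True)
        result[receiver] = eligible[:count]
    return result


private_selection = select_with_private_chat(
    {"A": 0.1, "B": 0.9, "C": 0.8, "D": 0.7, "E": 0.2}, 3, ("B", "C")
)
```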
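203. Process the selected audio signals according to the type of the receiving site, where the receiving site may be a channel-based receiving site or an audio-object-based receiving site.
(1) When the receiving site is a channel-based receiving site, as shown in FIG. 2, processing the selected audio signals according to the type of the receiving site includes:
203a. Convert the selected audio signals into audio signals whose number of channels matches that of the receiving site.
As shown in FIG. 4, this conversion may include:
L1. Determine the type of the selected audio signal. If it is a channel-based audio signal, perform step L2; if it is an audio-object-based audio signal, perform step L3.
L2. Convert the channel-based audio signal into a signal whose number of channels matches that of the receiving site.
A channel-based site may have one or more channels. For multiple channels, microphones are generally placed at different spatial positions and the signal captured by each microphone is encoded in its own channel. Because the channels together already contain the spatial information, the sound-space information of the original sending site can be reproduced when the receiving site plays the signal through the same number of loudspeakers.
For example, suppose the receiving site is two-channel and the channel-based audio signals the MCU selects for it are a mono signal and a three-channel signal. The MCU copies the mono signal to both the left and right channels of the receiving site, so the left and right channels carry the same content. For the three-channel signal, the MCU copies the first channel to the left channel of the receiving site, copies the third channel to the right channel, multiplies the second channel by a gain of 0.707, and adds the result to both the left and right channels. In this way both the mono signal and the three-channel signal are converted into two-channel signals.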
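The two conversions in this example can be written out directly. The sketch below uses plain Python lists as sample buffers; the function names are hypothetical, while the 0.707 gain is the value given in the text.

```python
from typing import List, Sequence, Tuple


def mono_to_stereo(mono: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Copy the mono signal to both the left and the right channel."""
    return list(mono), list(mono)


def three_to_stereo(
    ch1: Sequence[float],
    ch2: Sequence[float],
    ch3: Sequence[float],
    center_gain: float = 0.707,
) -> Tuple[List[float], List[float]]:
    """Channel 1 goes left, channel 3 goes right, channel 2 is scaled and added to both."""
    left = [a + center_gain * c for a, c in zip(ch1, ch2)]
    right = [b + center_gain * c for b, c in zip(ch3, ch2)]
    return left, right
```

A gain of 0.707 is approximately 1/√2, so the second channel's power is split about equally between the left and right outputs.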
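L3. Determine whether the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site. If it does, perform step L4; if it does not, perform step L5.
The auxiliary information of the audio object carries multiple playback modes and their corresponding parameters. A playback mode indicates how many channels are used for playback, for example two channels or 5 channels. The parameters of each playback mode describe how the energy of the audio object is distributed among the channels, and this distribution may change over time.
L4. Convert the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information.
For example, if the receiving site is two-channel and the auxiliary information carries a two-channel playback mode, the MCU extracts the parameters of the two-channel playback mode and, according to these parameters, distributes the audio object signal to the left and right channels of the receiving site.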
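Step L4 could look roughly like the sketch below. The text says only that the parameters describe how the object's energy is distributed over the channels; representing them as per-channel energy shares and applying the square root of each share as an amplitude gain is an assumption, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def render_with_playback_parameters(
    samples: Sequence[float], energy_shares: Sequence[float]
) -> List[List[float]]:
    """Distribute one audio object over the channels of a playback mode.

    Each channel receives the object signal scaled by sqrt(share / total),
    so the channel energies follow the given shares (assumed parameter format).
    """
    total = sum(energy_shares)
    if total <= 0:
        raise ValueError("energy shares must sum to a positive value")
    gains = [math.sqrt(share / total) for share in energy_shares]
    return [[g * s for s in samples] for g in gains]
```

Since the distribution may change over time, a real renderer would presumably apply a fresh set of shares to each frame.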
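L5. Convert the audio object into a signal whose number of channels matches that of the receiving site according to the azimuth information of the audio object, where the azimuth information is carried in the auxiliary information of the audio object.
For example, if the auxiliary information contains only a two-channel playback mode and a 5-channel playback mode while the receiving site is 6-channel, the MCU converts the audio object into a 6-channel audio signal according to the azimuth information of the audio object.
Specifically, this conversion may be as follows: determine the loudspeaker closest to the audio object according to the azimuth of the audio object and the positions of the loudspeakers corresponding to the channels of the receiving site; copy the audio-object-based audio signal to the channel corresponding to that closest loudspeaker, and assign no signal to the other channels.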
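The nearest-loudspeaker rule of step L5 can be sketched as below. Describing both the object azimuth and the loudspeaker positions as angles in degrees is an assumption (the text does not fix a coordinate system), and the function names are hypothetical.

```python
from typing import List, Sequence


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def render_to_nearest_speaker(
    samples: Sequence[float], object_azimuth: float, speaker_azimuths: Sequence[float]
) -> List[List[float]]:
    """Copy the object signal to the channel of the closest loudspeaker.

    All other channels are filled with silence.
    """
    nearest = min(
        range(len(speaker_azimuths)),
        key=lambda i: angular_distance(object_azimuth, speaker_azimuths[i]),
    )
    return [
        list(samples) if i == nearest else [0.0] * len(samples)
        for i in range(len(speaker_azimuths))
    ]
```

Snapping to a single loudspeaker is the simple fallback described in the text; it keeps the rough direction of the source but does not pan between adjacent loudspeakers.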
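203b. Mix the converted audio signals on the channels of the receiving site.
(2) When the receiving site is an audio-object-based receiving site, as shown in FIG. 3, processing the selected audio signals according to the type of the receiving site includes:
203c. Convert the selected audio signals into audio objects according to the presentation mode of the receiving site.
This conversion may include:
S1. Set the presentation mode of the selected audio signals according to the presentation mode of the receiving site.
For example, FIG. 5 shows an audio presentation mode of a telepresence terminal. The display consists of three screens; there are two loudspeakers below each screen and one loudspeaker on each side of the screens, eight loudspeakers in total. To match picture and sound, when a person at some position on the screen is speaking, the sound is emitted by the loudspeaker nearest below that position. In a multipoint conference, the site that is currently speaking may not be the site shown on the screens; in that case its sound can be assigned to the two loudspeakers at the sides of the screens. For example, suppose three audio signals s1, s2 and s3 are selected. The video of the site corresponding to s1 is being viewed at the receiving site, so s1 can be set to play through the six loudspeakers below the screens. The video of the sites corresponding to s2 and s3 is not currently being viewed, so s2 and s3 can be assigned to play through the two loudspeakers at the sides of the screens, respectively.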
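The assignment in this example can be sketched as a lookup from each selected signal to a loudspeaker group. The loudspeaker labels are hypothetical, and the text leaves open whether s2 and s3 share both side loudspeakers or take one each; the sketch lets every non-viewed signal use both.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical labels for the eight loudspeakers of the FIG. 5 layout
BELOW_SCREEN_SPEAKERS = [f"below_{i}" for i in range(1, 7)]
SIDE_SPEAKERS = ["side_left", "side_right"]


def assign_presentation(selected: Sequence[str], viewed: Set[str]) -> Dict[str, List[str]]:
    """Map each selected signal to the loudspeakers that should play it.

    Signals whose video is currently viewed go to the below-screen
    loudspeakers; all others go to the side loudspeakers.
    """
    return {
        signal: list(BELOW_SCREEN_SPEAKERS if signal in viewed else SIDE_SPEAKERS)
        for signal in selected
    }


layout = assign_presentation(["s1", "s2", "s3"], {"s1"})
```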
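S2. When the selected audio signal is a channel-based audio signal, convert it into an audio object according to the presentation mode that was set.
S3. When the selected audio signal is an audio-object-based audio signal, modify the original parameters of the audio object according to the presentation mode that was set, so that they meet the requirements of that presentation mode.
203d. Merge the converted audio objects into one audio object stream.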
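The text does not spell out how step 203d merges objects. One plausible detail, shown here purely as an assumption, is that the merged stream needs a single maximum absolute energy and energy ratios recomputed relative to it, since each incoming stream carried its own maximum. The `AudioObject` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class AudioObject:
    name: str
    absolute_energy: float  # already recovered as Emax × ratio of its source stream


def merge_object_streams(
    streams: Sequence[Sequence[AudioObject]],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Merge several object lists into one stream description.

    Returns the new maximum absolute energy and, for every object,
    its energy ratio relative to that new maximum.
    """
    objects = [obj for stream in streams for obj in stream]
    if not objects:
        return 0.0, []
    e_max = max(obj.absolute_energy for obj in objects)
    ratios = [
        (obj.name, obj.absolute_energy / e_max if e_max > 0 else 0.0)
        for obj in objects
    ]
    return e_max, ratios
```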
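204. Send the processed audio signals to each receiving site according to the type of the receiving site.
In the method of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the multipoint control unit can mix both channel-based and audio-object-based audio signals, so that conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals are all compatible within the same multipoint conference, which improves the user's quality of experience.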
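For a channel-based receiving site, the mixing of step 203b takes place once every selected signal has been converted to the site's channel count. The text does not describe the mixing operation itself; the sketch below assumes a plain sample-wise sum per channel, without gain control or clipping protection, and the function name is hypothetical.

```python
from typing import List, Sequence


def mix_converted_signals(signals: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Mix signals that already share the same channel count and frame length.

    `signals[k][c][i]` is sample i of channel c of the k-th converted signal.
    The result has one summed buffer per channel.
    """
    if not signals:
        raise ValueError("nothing to mix")
    n_channels = len(signals[0])
    n_samples = len(signals[0][0])
    for sig in signals:
        if len(sig) != n_channels or any(len(ch) != n_samples for ch in sig):
            raise ValueError("all signals must have the same shape")
    return [
        [sum(sig[c][i] for sig in signals) for i in range(n_samples)]
        for c in range(n_channels)
    ]
```

A production mixer would normally also limit or normalise the sum to avoid overflow; that is outside what the text describes.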
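Embodiment 3
This embodiment provides an apparatus for implementing audio mixing. As shown in FIG. 6, the apparatus includes:
a receiving module 61, configured to receive audio signals sent by sending sites, where the audio signals include channel-based audio signals and audio-object-based audio signals;
The type of an audio signal is determined by the type of its sending site: a channel-based sending site sends channel-based audio signals, and an audio-object-based sending site sends audio-object-based audio signals. A channel-based sending site may be mono or multi-channel, and accordingly a channel-based audio signal may be a mono or multi-channel audio signal. An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, azimuth information, the different playback modes and their corresponding parameters, and other information about the audio object.
a selection module 62, configured to select, from the received audio signals, audio signals for each receiving site;
a processing module 63, configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites; and
a sending module 64, configured to send the processed audio signals to each receiving site according to the type of the receiving site.
A single site can both send and receive audio signals; that is, a sending site and a receiving site may be the same site.
In the apparatus of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the multipoint control unit can mix both channel-based and audio-object-based audio signals, so that conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals are all compatible within the same multipoint conference, which improves the user's quality of experience.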
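Embodiment 4
As shown in FIG. 7 and FIG. 8, the apparatus for implementing audio mixing includes:
a receiving module 61, configured to receive audio signals sent by sending sites, where the audio signals include channel-based audio signals and audio-object-based audio signals;
The type of an audio signal is determined by the type of its sending site: a channel-based sending site sends channel-based audio signals, and an audio-object-based sending site sends audio-object-based audio signals. A channel-based sending site may be mono or multi-channel, and accordingly a channel-based audio signal may be a mono or multi-channel audio signal. An audio object treats a sound source as an object. In addition to the audio signal, an audio object includes auxiliary information, which includes the maximum absolute energy, the energy ratio, spatial information, azimuth information, the different playback modes and their corresponding parameters, and other information about the audio object.
a selection module 62, configured to select, from the received audio signals, audio signals for each receiving site;
a processing module 63, configured to process the selected audio signals according to the type of the receiving site, where the receiving sites include channel-based receiving sites and audio-object-based receiving sites; and
a sending module 64, configured to send the processed audio signals to each receiving site according to the type of the receiving site.
A single site can both send and receive audio signals; that is, a sending site and a receiving site may be the same site.
The selection module 62 includes:
a calculation unit 621, configured to separately calculate the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals; and
a selection unit 622, configured to select audio signals according to the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals. The selection unit 622 may, according to the energy of the channel-based audio signals and the energy of the audio-object-based audio signals, select for each receiving site several audio signals with higher energy; the audio signals the selection unit 622 selects for different receiving sites may be the same or different.
When the receiving site is a channel-based receiving site, as shown in FIG. 7, the processing module 63 may include:
a first conversion unit 631, configured to convert the selected audio signals into audio signals whose number of channels matches that of the receiving site; and
a mixing unit 632, configured to mix the converted audio signals on the channels of the receiving site.
Further, the first conversion unit 631 may include:
a first determining subunit 6311, configured to determine the type of the selected audio signal;
a first conversion subunit 6312, configured to, when the selected audio signal is a channel-based audio signal, convert the channel-based audio signal into a signal whose number of channels matches that of the receiving site;
a second determining subunit 6313, configured to, when the selected audio signal is an audio-object-based audio signal, determine whether the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site;
a second conversion subunit 6314, configured to, when the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site, convert the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information; and
a third conversion subunit 6315, configured to, when the auxiliary information of the audio object does not carry a playback mode that matches the number of channels of the receiving site, convert the audio object into a signal whose number of channels matches that of the receiving site according to the azimuth information of the audio object, where the azimuth information is carried in the auxiliary information of the audio object.
When the receiving site is an audio-object-based receiving site, as shown in FIG. 8, the processing module 63 may include:
a second conversion unit 633, configured to convert the selected audio signals into audio objects according to the presentation mode of the receiving site; and
a merging unit 634, configured to merge the converted audio objects into one audio object stream.
Further, the second conversion unit 633 may include:
a setting subunit 6331, configured to set the presentation mode of the selected audio signals according to the presentation mode of the receiving site;
a fourth conversion subunit 6332, configured to, when the selected audio signal is a channel-based audio signal, convert the selected audio signal into an audio object according to the presentation mode that was set; and
a modification subunit 6333, configured to, when the selected audio signal is an audio-object-based audio signal, modify the parameters of the audio object according to the presentation mode that was set.
After receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the apparatus of this embodiment selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the apparatus can mix both channel-based and audio-object-based audio signals, so that conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals are all compatible within the same multipoint conference, which improves the user's quality of experience.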
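Embodiment 5
This embodiment provides a system for implementing audio mixing. As shown in FIG. 9, the system includes multiple sending sites 91 and receiving sites 93 and a multipoint control unit 92, where:
the sending site 91 is configured to send audio signals to the multipoint control unit 92, where the audio signals include channel-based audio signals and audio-object-based audio signals;
the multipoint control unit 92 is configured to receive the audio signals sent by the sending sites 91, select, from the received audio signals, audio signals for each receiving site 93, process the selected audio signals according to the type of the receiving site 93, and send the processed audio signals to each receiving site 93 according to the type of the receiving site, where the receiving sites 93 include channel-based receiving sites and audio-object-based receiving sites; and
the receiving site 93 is configured to receive the processed audio signals from the multipoint control unit 92.
A single site can both send and receive audio signals; that is, a sending site and a receiving site may be the same site.
In the system of this embodiment, after receiving the channel-based audio signals and audio-object-based audio signals sent by the sending sites, the multipoint control unit selects audio signals for each receiving site, processes the selected audio signals according to the type of the receiving site, and sends the processed audio signals to each receiving site. Compared with the prior art, the multipoint control unit can mix both channel-based and audio-object-based audio signals, so that conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals are all compatible within the same multipoint conference, which improves the user's quality of experience.
The apparatus and system provided in the embodiments of the present invention can implement the method embodiments provided above. The method, apparatus and system provided in the embodiments of the present invention are applicable to, but not limited to, making conventional conference terminals, multi-channel-based next-generation terminals and audio-object-based next-generation terminals compatible within the same multipoint conference.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The foregoing describes only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.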

Claims

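1. A method for implementing audio mixing, comprising:
receiving audio signals sent by sending sites, wherein the audio signals comprise channel-based audio signals and audio-object-based audio signals;
selecting, from the received audio signals, audio signals for each receiving site;
processing the selected audio signals according to the type of the receiving site, wherein the receiving sites comprise channel-based receiving sites and audio-object-based receiving sites; and
sending the processed audio signals to each receiving site according to the type of the receiving site.
2. The method for implementing audio mixing according to claim 1, wherein selecting, from the received audio signals, audio signals for each receiving site comprises:
separately calculating the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals; and
selecting audio signals according to the energy of the channel-based audio signals and/or the energy of the audio-object-based audio signals.
3. The method for implementing audio mixing according to claim 2, wherein calculating the energy of a channel-based audio signal comprises:
when the signal is mono, taking the energy of the single channel as the energy of the channel-based audio signal; and
when the signal is multi-channel, calculating the energy of each channel separately, and taking either the largest channel energy or the average of the channel energies as the energy of the channel-based audio signal.
4. The method for implementing audio mixing according to claim 2, wherein calculating the energy of an audio-object-based audio signal comprises:
calculating the absolute energy of the audio object according to the maximum absolute energy and the energy ratio of the audio object, wherein the maximum absolute energy and the energy ratio of the audio object are carried in the auxiliary information of the audio object.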
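5. The method for implementing audio mixing according to claim 1, wherein, when the receiving site is a channel-based receiving site, processing the selected audio signals according to the type of the receiving site comprises:
converting the selected audio signals into audio signals whose number of channels matches that of the receiving site; and
mixing the converted audio signals on the channels of the receiving site.
6. The method for implementing audio mixing according to claim 5, wherein converting the selected audio signals into audio signals whose number of channels matches that of the receiving site comprises:
determining the type of the selected audio signal;
if the selected audio signal is a channel-based audio signal, converting the channel-based audio signal into a signal whose number of channels matches that of the receiving site;
if the selected audio signal is an audio-object-based audio signal, determining whether the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site;
if the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site, converting the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information; and
if the auxiliary information of the audio object does not carry a playback mode that matches the number of channels of the receiving site, converting the audio object into a signal whose number of channels matches that of the receiving site according to the azimuth information of the audio object, wherein the azimuth information is carried in the auxiliary information of the audio object.
7. The method for implementing audio mixing according to claim 6, wherein converting the audio object into a signal whose number of channels matches that of the receiving site according to the azimuth information of the audio object comprises:
determining the loudspeaker closest to the audio object according to the azimuth of the audio object and the positions of the loudspeakers corresponding to the channels of the receiving site; and
copying the audio-object-based audio signal to the channel corresponding to the loudspeaker closest to the audio object.
8. The method for implementing audio mixing according to claim 1, wherein, when the receiving site is an audio-object-based receiving site, processing the selected audio signals according to the type of the receiving site comprises:
converting the selected audio signals into audio objects according to the presentation mode of the receiving site; and
merging the converted audio objects into one audio object stream.
9. The method for implementing audio mixing according to claim 8, wherein converting the selected audio signals into audio objects according to the presentation mode of the receiving site comprises:
setting the presentation mode of the selected audio signals according to the presentation mode of the receiving site;
when the selected audio signal is a channel-based audio signal, converting the selected audio signal into an audio object according to the presentation mode that was set; and
when the selected audio signal is an audio-object-based audio signal, modifying the parameters of the audio object according to the presentation mode that was set.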
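10. An apparatus for implementing audio mixing, specifically a multipoint control unit, wherein the apparatus comprises:
a receiving module, configured to receive audio signals sent by sending sites, wherein the audio signals comprise channel-based audio signals and audio-object-based audio signals;
a selection module, configured to select, from the received audio signals, audio signals for each receiving site;
a processing module, configured to process the selected audio signals according to the type of the receiving site, wherein the receiving sites comprise channel-based receiving sites and audio-object-based receiving sites; and
a sending module, configured to send the processed audio signals to each receiving site according to the type of the receiving site.
11. The apparatus for implementing audio mixing according to claim 10, wherein the selection module comprises:
a calculation unit, configured to separately calculate the energy of the channel-based audio signals and the energy of the audio-object-based audio signals; and
a selection unit, configured to select audio signals according to the energy of the channel-based audio signals and the energy of the audio-object-based audio signals.
12. The apparatus for implementing audio mixing according to claim 10, wherein the processing module comprises:
a first conversion unit, configured to convert the selected audio signals into audio signals whose number of channels matches that of the receiving site; and
a mixing unit, configured to mix the converted audio signals on the channels of the receiving site.
13. The apparatus for implementing audio mixing according to claim 12, wherein the first conversion unit comprises:
a first determining subunit, configured to determine the type of the selected audio signal;
a first conversion subunit, configured to, when the selected audio signal is a channel-based audio signal, convert the channel-based audio signal into a signal whose number of channels matches that of the receiving site;
a second determining subunit, configured to, when the selected audio signal is an audio-object-based audio signal, determine whether the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site;
a second conversion subunit, configured to, when the auxiliary information of the audio object carries a playback mode that matches the number of channels of the receiving site, convert the audio object into an audio signal whose number of channels matches that of the receiving site according to the parameters of the corresponding playback mode in the auxiliary information; and
a third conversion subunit, configured to, when the auxiliary information of the audio object does not carry a playback mode that matches the number of channels of the receiving site, convert the audio object into a signal whose number of channels matches that of the receiving site according to the azimuth information of the audio object, wherein the azimuth information is carried in the auxiliary information of the audio object.
14. The apparatus for implementing audio mixing according to claim 10, wherein the processing module comprises:
a second conversion unit, configured to convert the selected audio signals into audio objects according to the presentation mode of the receiving site; and
a merging unit, configured to merge the converted audio objects into one audio object stream.
15. The apparatus for implementing audio mixing according to claim 14, wherein the second conversion unit comprises:
a setting subunit, configured to set the presentation mode of the selected audio signals according to the presentation mode of the receiving site;
a fourth conversion subunit, configured to, when the selected audio signal is a channel-based audio signal, convert the selected audio signal into an audio object according to the presentation mode that was set; and
a modification subunit, configured to, when the selected audio signal is an audio-object-based audio signal, modify the parameters of the audio object according to the presentation mode that was set.
16. A system for implementing audio mixing, comprising multiple sending sites and receiving sites and a multipoint control unit, wherein:
the sending site is configured to send audio signals to the multipoint control unit, wherein the audio signals comprise channel-based audio signals and audio-object-based audio signals;
the multipoint control unit is configured to receive the audio signals sent by the sending sites, select, from the received audio signals, audio signals for each receiving site, process the selected audio signals according to the type of the receiving site, and send the processed audio signals to each receiving site according to the type of the receiving site, wherein the receiving sites comprise channel-based receiving sites and audio-object-based receiving sites; and
the receiving site is configured to receive the processed audio signals from the multipoint control unit.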
PCT/CN2010/075891 2009-11-13 2010-08-11 Method, apparatus and system for implementing audio mixing WO2011057511A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP10829475.2A EP2490426B1 (en) 2009-11-13 2010-08-11 Method, apparatus and system for implementing audio mixing
US13/469,782 US8773491B2 (en) 2009-11-13 2012-05-11 Method, apparatus, and system for implementing audio mixing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910207184A CN102065265B (zh) 2009-11-13 Method, apparatus and system for implementing audio mixing
CN200910207184.8 2009-11-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/469,782 Continuation US8773491B2 (en) 2009-11-13 2012-05-11 Method, apparatus, and system for implementing audio mixing

Publications (1)

Publication Number Publication Date
WO2011057511A1 true WO2011057511A1 (zh) 2011-05-19

Family

ID=43991193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/075891 WO2011057511A1 (zh) 2009-11-13 2010-08-11 Method, apparatus and system for implementing audio mixing

Country Status (4)

Country Link
US (1) US8773491B2 (zh)
EP (1) EP2490426B1 (zh)
CN (1) CN102065265B (zh)
WO (1) WO2011057511A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102226944B (zh) * 2011-05-25 2014-11-12 贵阳朗玛信息技术股份有限公司 Audio mixing method and device
CN103050124B (zh) * 2011-10-13 2016-03-30 华为终端有限公司 Audio mixing method, device and system
CN102436818A (zh) * 2011-10-25 2012-05-02 浙江万朋网络技术有限公司 Energy-priority-based server-side routing and audio mixing method
US9373335B2 2012-08-31 2016-06-21 Dolby Laboratories Licensing Corporation Processing audio objects in principal and supplementary encoded audio signals
CN103024339B (zh) * 2012-10-11 2015-09-30 华为技术有限公司 Method and device for implementing audio mixing based on a video source
CN103369158B (zh) * 2013-06-18 2016-01-13 华为技术有限公司 Multi-party call control method, related device and communication system
KR101514830B1 (ko) * 2013-10-14 2015-04-23 주식회사 세나테크놀로지 Multitasking system for a Bluetooth headset
US10079941B2 (en) * 2014-07-07 2018-09-18 Dolby Laboratories Licensing Corporation Audio capture and render device having a visual display and user interface for use for audio conferencing
CN104167210A (zh) * 2014-08-21 2014-11-26 华侨大学 Lightweight multi-party conference audio mixing method and device
CN105704423A (zh) * 2014-11-24 2016-06-22 中兴通讯股份有限公司 Voice output method and device
CN104539816B (zh) * 2014-12-25 2017-08-01 广州华多网络科技有限公司 Intelligent audio mixing method and device for multi-party voice calls
US10325610B2 2016-03-30 2019-06-18 Microsoft Technology Licensing, Llc Adaptive audio rendering
CN105847096B (zh) * 2016-05-12 2018-10-30 腾讯科技(深圳)有限公司 Communication method, device and system involving audio data
CN108616800B (zh) * 2018-03-28 2021-04-09 腾讯科技(深圳)有限公司 Audio playing method and device, storage medium, and electronic device
CN113257256A (zh) * 2021-07-14 2021-08-13 广州朗国电子科技股份有限公司 Voice processing method, all-in-one conference machine, system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006103584A1 (en) * 2005-03-30 2006-10-05 Koninklijke Philips Electronics N.V. Multi-channel audio coding
CN1953537A (zh) * 2006-11-23 2007-04-25 北京航空航天大学 Audio mixing method in a multi-MCU video conference system
US20080005246A1 (en) * 2000-03-30 2008-01-03 Microsoft Corporation Multipoint processing unit
CN101179693A (zh) * 2007-09-26 2008-05-14 深圳市丽视视讯科技有限公司 Audio mixing processing method for a conference television system
WO2008120933A1 (en) * 2007-03-30 2008-10-09 Electronics And Telecommunications Research Institute Apparatus and method for coding and decoding multi object audio signal with multi channel
CN101466043A (zh) * 2008-12-30 2009-06-24 深圳华为通信技术有限公司 Method, device and system for processing multi-channel audio signals

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8482614B2 (en) * 2005-06-14 2013-07-09 Thx Ltd Content presentation optimizer
EP1855455B1 (en) * 2006-05-11 2011-10-05 Global IP Solutions (GIPS) AB Audio mixing
US7647229B2 (en) * 2006-10-18 2010-01-12 Nokia Corporation Time scaling of multi-channel audio signals
US20080159507A1 (en) * 2006-12-27 2008-07-03 Nokia Corporation Distributed teleconference multichannel architecture, system, method, and computer program product
GB0710878D0 (en) * 2007-06-06 2007-07-18 Skype Ltd Method of transmitting data in a communication system
US8391513B2 (en) * 2007-10-16 2013-03-05 Panasonic Corporation Stream synthesizing device, decoding unit and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2490426A4 *

Also Published As

Publication number Publication date
EP2490426B1 (en) 2014-06-04
CN102065265A (zh) 2011-05-18
EP2490426A1 (en) 2012-08-22
EP2490426A4 (en) 2012-08-22
US20120224023A1 (en) 2012-09-06
CN102065265B (zh) 2012-10-17
US8773491B2 (en) 2014-07-08

Similar Documents

Publication Publication Date Title
WO2011057511A1 (zh) 实现混音的方法、装置和系统
EP2487903B1 (en) Automatic video layouts for multi-stream multi-site telepresence conferencing system
US8860776B2 (en) Conference terminal, conference server, conference system and data processing method
EP1763241B1 (en) Spatially correlated audio in multipoint videoconferencing
JP5534813B2 (ja) 多言語会議を実現するシステム、方法、及び多地点制御装置
US7707247B2 (en) System and method for displaying users in a visual conference between locations
US20110261151A1 (en) Video and audio processing method, multipoint control unit and videoconference system
WO2011026382A1 (zh) 视频会议虚拟会场的呈现方法、设备及系统
US8836753B2 (en) Method, apparatus, and system for processing cascade conference sites in cascade conference
NO318911B1 (no) Distribuert sammensetting av sanntids-media
WO2011140812A1 (zh) 多画面合成方法、系统及媒体处理装置
KR20070103051A (ko) 멀티 포인트 화상회의 시스템 및 해당 미디어 프로세싱방법
WO2011153905A1 (zh) 一种音频信号的混音处理方法及装置
US20090019112A1 (en) Audio and video conferencing using multicasting
WO2014094461A1 (zh) 视频会议中的视音频信息的处理方法、装置及系统
WO2010094219A1 (zh) 一种语音信号的处理、播放方法和装置
WO2011127816A1 (zh) 一种音频信号的混音处理方法、装置及系统
WO2016082577A1 (zh) 视频会议的处理方法及装置
WO2012175025A1 (zh) 远程呈现会议系统、远程呈现会议的录制与回放方法
CN103843323A (zh) 一种多媒体会议实现方法、相关设备及系统
WO2012055291A1 (zh) 音频数据传输方法及系统
US20200329083A1 (en) Video conference transmission method and apparatus, and mcu
WO2010094213A1 (zh) 多路媒体流传输和接收的方法、装置及系统
WO2014026478A1 (zh) 一种视频会议信号处理的方法、视频会议服务器及系统
CN102685443B (zh) 多方视频会议系统及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10829475

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010829475

Country of ref document: EP