WO2013053336A1 - Audio mixing method, apparatus and system - Google Patents

Audio mixing method, apparatus and system

Info

Publication number
WO2013053336A1
WO2013053336A1 (PCT/CN2012/082952)
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
site
source object
sound
mixed
Prior art date
Application number
PCT/CN2012/082952
Other languages
English (en)
French (fr)
Inventor
王东琦 (Wang Dongqi)
詹五洲 (Zhan Wuzhou)
Original Assignee
华为终端有限公司 (Huawei Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司 (Huawei Device Co., Ltd.)
Publication of WO2013053336A1
Priority to US14/225,536 (US9456273B2)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1822Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • H04L65/403Arrangements for multi-party communication, e.g. for conferences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • H04L65/403Arrangements for multi-party communication, e.g. for conferences
    • H04L65/4038Arrangements for multi-party communication, e.g. for conferences with floor control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/70Media network packetisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling
    • H04L65/764Media network packet handling at the destination 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling
    • H04L65/765Media network packet handling intermediate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • The present invention relates to the field of communications technologies, and in particular to an audio mixing method, apparatus, and system. Background art
  • A typical video conferencing system consists of an MCU (Multipoint Control Unit) and terminal devices.
  • A site is usually configured with one terminal device, and one MCU is connected to multiple sites.
  • The terminal device collects and processes the sound and images in its site and transmits the data over the network to the MCU it is connected to; the terminal also receives the data of the other sites forwarded by that MCU.
  • The MCU forwards the audio signals received from the other sites to the terminal device.
  • The MCU in the prior art does not send the audio signals of all other sites to the terminal; it selects some audio signals for mixing according to a certain method and then sends the result to the terminal.
  • The MCU receives the audio signals from the various sites and then selects a predetermined number of sites, ordered by the volume of each site, for mixing. Even if the main sound source objects are concentrated in one site, the audio streams of other, unnecessary sites are still mixed in; with too many unnecessary sites participating in the mixing, the sound quality after mixing is poor and unnecessary computing resources are consumed.
  • Embodiments of the present invention provide an audio mixing method, apparatus, and system that ensure the sound quality after mixing while reducing the consumption of computing resources.
  • a mixing method including:
  • a mixing device comprising:
  • An analysis unit configured to receive an audio code stream of each site, and analyze the audio code streams of the respective sites to obtain sound feature values of the sound source objects corresponding to the respective sites;
  • a selecting unit configured to select a predetermined number of sound source objects from the sound source object as a primary sound source object according to a sound source object sound feature value from largest to smallest;
  • a site determining unit configured to determine a site where the primary sound source object is located
  • An audio determining unit configured to determine, according to a relationship between a target site and a site where the primary sound source object is located, an audio code stream that needs to be mixed in the target site;
  • a sending unit configured to mix the determined audio code stream that needs to be mixed in the target site, and send the mixed audio code stream to the target site;
  • the sending unit is further configured to send the determined audio code stream that needs to be mixed in the target site to the target site, and perform mixing at the target site.
  • a mixing system includes: a mixing device and a venue terminal.
  • Compared with the prior art, in which the sites participating in the mixing are selected by the volume of each site, the audio mixing method, apparatus, and system provided by the embodiments of the present invention select the main sound source objects by the energy value of each sound source object and determine the sites participating in the mixing from those main sound source objects. This keeps unnecessary sites out of the mixing, ensures the sound quality after mixing, and reduces the consumption of computing resources.
  • FIG. 1 is a flowchart of a method according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic diagram of generating an audio code stream according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of a connection between a site and an MCU according to Embodiment 2 of the present invention.
  • Figure 5 is a schematic view showing the composition of a sound mixing device in Embodiment 3 of the present invention.
  • FIG. 6 is a schematic diagram showing the composition of another mixing device in Embodiment 3 of the present invention.
  • FIG. 7 is a schematic diagram showing the composition of another mixing device in Embodiment 3 of the present invention.
  • FIG. 8 is a schematic diagram showing the composition of another mixing device in Embodiment 3 of the present invention.
  • FIG. 9 is a schematic diagram showing the composition of a mixing system in Embodiment 3 of the present invention. Detailed description
  • the embodiment of the invention provides a mixing method. As shown in FIG. 1, the method includes:
  • The received audio code stream of each site contains the sound source object information corresponding to that site, so that the sound source objects corresponding to each site can be parsed from it.
  • SAOC: Spatial Audio Object Coding
  • What the MCU receives is not an audio stream corresponding to fixed independent channels, but a code stream containing a downmix signal and the corresponding spatial side information. The spatial side information in the code stream is decoded; if it includes parameters such as the absolute energy (NRG) and the Object Level Difference (OLD), the energy values of the sound source objects contained in the site can be calculated from the parameters carried in the stream.
  • NRG: absolute energy
  • OLD: Object Level Difference
  • the code stream needs to be decoded to obtain an uncompressed audio signal, and then the sound feature value of the audio signal is calculated.
  • the sound feature values are calculated from the audio signal and may be energy, volume, envelope, or other characteristic values of the audio signal.
  • The sound feature value can also be obtained by parsing the parameter information corresponding to the sound feature value in the code stream, which reduces the amount of calculation on the MCU.
  • The relevant parameter of the sound feature value may be a parameter that indirectly reflects a characteristic such as the energy or volume of the audio signal. For example, if the code stream contains the average energy value of a set of audio signals and the percentage of energy occupied by each audio signal, the energy values of the individual audio signals can be calculated from these two related parameters. The method of directly acquiring the sound feature value requires the terminal device to complete the calculation of the sound feature value and include the sound feature value information in the audio code stream sent to the MCU.
  • the energy information corresponding to each sound source object can be obtained through the NRG and OLD parameters.
  • the NRG parameter is the largest energy value among all the energy values of the object
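As a hedged illustration of how these parameters can be combined (assuming, as described above, that OLD expresses each object's energy relative to the largest object energy and NRG is that largest energy in absolute terms), per-object energies can be recovered by a single multiplication; the function name is an assumption for illustration:

```python
# Hedged sketch: recovering per-object energy from SAOC-style side
# information. NRG is the largest object energy in absolute terms and
# OLD[i] is each object's energy relative to that maximum, so the
# absolute energy of object i is simply NRG * OLD[i].

def object_energies(nrg, old):
    """Return the absolute energy of each sound source object."""
    return [nrg * ratio for ratio in old]

# Example: three objects; the second one is the loudest (OLD == 1.0).
print(object_energies(8.0, [0.5, 1.0, 0.25]))  # [4.0, 8.0, 2.0]
```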
  • The sound source objects are sorted from largest to smallest according to the sound feature values calculated in step 101, and a predetermined number of sound source objects with the largest sound feature values are selected as the main sound source objects. 103. Determine the sites where the main sound source objects are located.
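The selection step above can be sketched as follows; the data layout and all names are assumptions for illustration, not from the patent:

```python
# Hypothetical sketch of step 102: rank all sound source objects from all
# sites by their sound feature value and keep the n largest as the "main"
# sound source objects.

from dataclasses import dataclass

@dataclass
class SourceObject:
    site: str          # e.g. "T1-1"
    index: int         # object index within the site's audio stream
    feature: float     # sound feature value (energy, volume, ...)

def select_main_objects(objects, n):
    """Return the n objects with the largest sound feature values."""
    return sorted(objects, key=lambda o: o.feature, reverse=True)[:n]

objects = [
    SourceObject("T1-1", 1, 0.9), SourceObject("T1-2", 1, 0.7),
    SourceObject("T1-3", 1, 0.1), SourceObject("T4-1", 1, 0.8),
    SourceObject("T4-2", 1, 0.6),
]
main = select_main_objects(objects, 4)
print([o.site for o in main])  # ['T1-1', 'T4-1', 'T1-2', 'T4-2']
```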
  • The main sound source objects are used to determine the sites participating in the mixing, instead of selecting the sites for mixing according to the volume of the entire site as in the prior art.
  • The method of determining the audio code streams that need to be mixed for the target site according to the relationship between the target site and the sites where the main sound source objects are located is: determine whether the target site is a site where a main sound source object is located. If it is, the audio streams of the sites where main sound source objects are located, excluding the target site itself, are determined as the audio streams that need to be mixed for the target site. If it is not, the audio streams of the sites where all the main sound source objects are located are determined as the audio streams that need to be mixed for the target site.
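A minimal sketch of this decision rule, using illustrative site labels from the example in Figure 4:

```python
# Sketch of the step-104 decision rule (names are illustrative): given the
# set of sites that contain a main sound source object, decide which
# sites' audio streams must be mixed for a given target site.

def streams_to_mix(target_site, main_sites):
    if target_site in main_sites:
        # The target site hears its own sound directly, so exclude it.
        return [s for s in main_sites if s != target_site]
    # A non-main target site receives the mix of all main sites.
    return list(main_sites)

main_sites = ["T1-1", "T1-2", "T4-1", "T4-2"]
print(streams_to_mix("T1-1", main_sites))  # ['T1-2', 'T4-1', 'T4-2']
print(streams_to_mix("T1-3", main_sites))  # all four main sites
```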
  • The determined audio code streams that need to be mixed for the target site are mixed, and the mixed audio code stream is sent to the target site; or, the determined audio code streams that need to be mixed for the target site are sent to the target site, and the mixing is performed at the target site.
  • The audio streams that need to be mixed for the target site may be mixed on the MCU, and the mixed code stream is then sent to the target site.
  • Alternatively, the audio streams that need to be mixed for the target site may be sent together to the target site and mixed by the terminal device of the target site, thereby reducing the amount of calculation on the MCU.
  • Compared with the prior art, in which the sites participating in the mixing are selected by the volume of each site, the audio mixing method provided by this embodiment selects the main sound source objects by the energy value of each sound source object and determines the sites participating in the mixing from them. This keeps unnecessary sites out of the mixing, guarantees the sound quality after mixing, and reduces the consumption of computing resources.
  • the embodiment of the invention provides a mixing method. As shown in FIG. 2, the method includes:
  • Before the audio code streams of the sites are received, the terminal device of each site collects audio signals, encodes them, and sends them to the MCU. For example, as shown in Figure 3, the terminal device captures sound through X microphones in the site and performs A/D conversion to obtain X digitized microphone signals. Sound source objects are extracted from the X digitized microphone signals: the s microphone signals with the largest volume can be selected as sound source objects according to the volume of each microphone signal, or a microphone array with beamforming technology can be used to extract s sound source signals.
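The volume-based variant of this extraction step might look like the following sketch; approximating volume by mean absolute amplitude, and all names, are assumptions for illustration:

```python
# Illustrative sketch of the terminal-side extraction step: from X
# digitized microphone signals, keep the s signals with the largest
# volume (approximated here by mean absolute amplitude) as sound
# source objects.

def extract_source_objects(mic_signals, s):
    """Return the s loudest microphone signals."""
    volume = lambda sig: sum(abs(x) for x in sig) / len(sig)
    ranked = sorted(mic_signals, key=volume, reverse=True)
    return ranked[:s]

mics = [
    [0.1, -0.1, 0.1],   # quiet
    [0.8, -0.9, 0.7],   # loud
    [0.4, 0.5, -0.3],   # medium
]
sources = extract_source_objects(mics, 2)
print(len(sources))  # 2 -> the loud and medium signals are kept
```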
  • The extracted sound source objects are then encoded; the encoding here can use Spatial Audio Object Coding (SAOC). Mixing the audio signals of multiple channels into one or two signals, plus a small amount of spatial side information, can effectively reduce the bit rate and the transmission cost.
  • SAOC: Spatial Audio Object Coding
  • What the MCU receives is not the audio signals of fixed independent channels, but a downmix signal and the corresponding spatial side information signal. After the downmix signal and the corresponding spatial side information are parsed, the sound source objects corresponding to the respective sites are obtained.
  • the sound feature value referred to here can be energy, volume, envelope or other feature value.
  • the sound feature value calculation may be completed on the terminal device, and the sound feature value information is included in the spatial side information and directly sent to the MCU.
  • the NRG parameter is the largest energy value among all the energy values of the object
  • In a certain frame, the audio streams and the corresponding sound feature values received by MCU-1 from each site are as shown in Table 1.
  • m is the number of the MCU, n is the number of the site connected to the MCU, and i is the index of the sound source object among all sound source objects contained in the site's audio stream.
  • the predetermined quantity is a preset value, and can be set according to actual conditions and experience.
  • Taking the selection of four sound source objects as an example, the four sound source objects in Table 1 with the largest sound feature values are selected, from high to low, as the important sound source objects for further mixing. After comparison, the four sound source objects with the largest sound feature values are one each from sites T1-1, T1-2, T4-1, and T4-2.
  • Determining the important sites from the important sound source objects, instead of selecting the sites by volume as in the prior art, makes the selection of sites more accurate and reliable.
  • The sites where the main sound source objects are located can be determined as T1-1, T1-2, T4-1, and T4-2.
  • 205. Determine whether the target site is a site where a main sound source object is located; if it is, perform step 206; if it is not, perform step 207.
  • Since the sound information can be heard directly in the target site, it can be presented there without mixing. Therefore, before the mixing is performed, it is judged whether the target site is a site where a main sound source object is located; this reduces the amount of calculation and prevents sounds from being duplicated.
  • the audio code stream of the site where the primary sound source object is located other than the target site is determined as the audio code stream that needs to be mixed in the target site.
  • The audio stream of the target site itself does not need to be mixed, so the target site is excluded and the audio streams of the sites where the other main sound source objects are located are mixed.
  • For the target site T1-1, which is one of the sites where the main sound source objects are located, the audio streams of sites T1-2, T4-1, and T4-2 are mixed to serve as the mixed code stream of the target site T1-1.
  • the audio stream of the site where all the primary sound source objects are located needs to be mixed.
  • For the target site T1-3, which is not a site where any main sound source object is located, the audio code streams of all the main sites T1-1, T1-2, T4-1, and T4-2 are mixed and used as the mixed code stream of the target site T1-3.
  • The selected audio stream of site T1-1 contains not only the main sound source object but also unimportant sound source objects.
  • If a sound source object in the mixed audio code stream does not belong to the main sound source objects, it is not an important sound source object that should be presented in the mixing. So that unimportant sound source objects do not affect the mixing result, they need to be eliminated.
  • The specific method may be: when the SAOC sound source objects are mixed and rendered into an output code stream, the coefficients corresponding to the unimportant sound source objects in the rendering matrix are all set to 0.
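A hedged sketch of this rendering-matrix approach (the matrix layout and all names are illustrative assumptions): zeroing an object's column removes its contribution from every output channel.

```python
# Sketch of suppressing unimportant objects: the coefficients of the
# rendering matrix that correspond to non-main sound source objects are
# set to 0, so those objects contribute nothing to the mixed output.

def zero_unimportant(render_matrix, object_ids, main_ids):
    # render_matrix[ch][j]: gain of object object_ids[j] into channel ch
    for row in render_matrix:
        for j, obj in enumerate(object_ids):
            if obj not in main_ids:
                row[j] = 0.0
    return render_matrix

m = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]
out = zero_unimportant(m, ["o1", "o2", "o3"], {"o1", "o3"})
print(out)  # o2's column is zeroed in both output channels
```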
  • the audio code stream after the mixing corresponding to each target site has been generated in the previous step. Therefore, the mixed audio code stream corresponding to each target site is transmitted to the corresponding target site.
  • Alternatively, the main sound source objects are separated from the audio code streams of their corresponding sites, and the main sound source objects are mixed according to the relationship between the target site and the sites where the main sound source objects are located.
  • the mixing of the main sound source object according to the relationship between the target venue and the venue where the main sound source object is located may be implemented by the following method:
  • different sound source objects should be selected to participate in the mixing.
  • It is determined whether the target site is a site where a main sound source object is located. If it is, the main sound source objects other than those of the target site are mixed; if it is not, all of the main sound source objects are mixed.
  • For the target site T1-1, which is one of the sites where the main sound source objects are located, its own sound source object is excluded and the other main sound source objects are mixed to form the mixed code stream of the target site T1-1.
  • The mixing method provided by the embodiment of the present invention may also send the determined audio code streams that need to be mixed for the target site to the target site, where the audio streams are then mixed on the terminal device of the target site.
  • Compared with the prior art, in which the sites participating in the mixing are selected by the volume of each site, the audio mixing method provided by this embodiment selects the main sound source objects by the energy value of each sound source object and determines the sites participating in the mixing from them. This keeps unnecessary sites out of the mixing, ensures the sound quality after mixing, and reduces the consumption of computing resources.
  • In addition, by separating the sound source objects and allowing only the important sound source objects to participate in the mixing, compared with the prior-art method of directly mixing the audio streams of the sites, the influence of unimportant sound source objects on the mixing effect is eliminated, further improving the sound quality after mixing.
  • The embodiment of the present invention provides a sound mixing device. As shown in FIG. 5, the device includes: an analyzing unit 31, a selecting unit 32, a site determining unit 33, an audio determining unit 34, and a sending unit 35.
  • the analyzing unit 31 is configured to receive the audio code streams of the respective sites, and analyze the audio code streams of the respective sites to obtain the sound feature values of the sound source objects corresponding to the respective sites.
  • What the MCU receives is not the audio signals corresponding to fixed independent channels, but the Spatial Audio Object Coding (SAOC) downmix signal and the corresponding spatial side information signal. After the downmix signal and the corresponding spatial side information are parsed, the sound source objects corresponding to the respective sites are obtained.
  • SAOC: Spatial Audio Object Coding
  • the selecting unit 32 is configured to select a predetermined number of sound source objects as the main sound source objects from the sound source objects in descending order of sound feature values of the sound source objects.
  • the sound feature values referred to herein may be energy, volume, envelope or other sound characteristics.
  • the sound feature value calculation may be completed on the terminal device, and the sound feature information is included in the spatial side information and directly sent to the MCU.
  • The NRG and OLD parameters can be carried in the spatial side information.
  • the NRG parameter is the largest energy value among all the energy values of the object
  • the site determining unit 33 is configured to determine a site where the primary sound source object is located.
  • the important venues are determined by important sound source objects, instead of selecting the venues by volume in the prior art, the selection of the venues can be more accurate and reliable.
  • the audio determining unit 34 is configured to determine, according to a relationship between the target site and the site where the primary sound source object is located, an audio code stream that needs to be mixed in the target site.
  • the method for determining the audio code stream that needs to be mixed in the target site according to the relationship between the target site and the site where the primary sound source object is located is: determining whether the target site is the primary source object. If the target site is the site where the primary source object is located, the audio code stream of the site where the primary source object is located other than the target site determines the audio code of the target site to be mixed. And if the selected target site is not the site where the primary sound source object is located, the audio code stream of the site where all the primary sound source objects are located determines the audio code stream that needs to be mixed in the target site.
  • The sending unit 35 is configured to mix the determined audio code streams that need to be mixed for the target site and send the mixed audio code stream to the target site; or, the sending unit 35 is further configured to send the determined audio code streams that need to be mixed to the target site, where the mixing is performed.
  • the analyzing unit 31 is further configured to: decode the received audio code stream and calculate a sound feature value of the sound source object.
  • the analyzing unit 31 is further configured to extract a sound feature value of the sound source object from the received audio code stream.
  • For MCU-1 in a certain frame, the received audio streams and the corresponding sound feature values are shown in Table 1, where m denotes the number of the MCU, n is the number of the site connected to the MCU, and i is the index of the sound source object among all sound source objects contained in the site's audio stream.
  • the predetermined quantity is a preset value, and can be set according to actual conditions and experience.
  • Taking the selection of four sound source objects as an example, the four sound source objects in Table 1 with the largest sound feature values are selected, from high to low, as the important sound source objects for further mixing. After comparison, the four sound source objects with the largest sound feature values are one each from sites T1-1, T1-2, T4-1, and T4-2.
  • the audio determining unit 34 includes:
  • the first determining module 341 is configured to determine whether the target site is the site where the primary sound source object is located.
  • the sound information can be directly transmitted in the target venue, it can be presented in the venue without mixing. Therefore, before the mixing is performed, it is determined whether the target site is the site of the main sound source object, which can reduce the amount of calculation and prevent the sound from being repeated.
  • The first mixing module 342 is configured to: when the first determining module 341 determines that the target site is a site where a main sound source object is located, determine the audio code streams of the sites where the main sound source objects are located, other than the target site, as the audio code streams that need to be mixed for the target site.
  • That is, when the target site is a site where a main sound source object is located, the target site is excluded and the audio streams of the sites where the other main sound source objects are located are mixed.
  • For the target site T1-1, which is one of the sites where the main sound source objects are located, the audio streams of sites T1-2, T4-1, and T4-2 are mixed to serve as the mixed code stream of the target site T1-1.
  • The first mixing module 342 is further configured to: when the first determining module 341 determines that the target site is not a site where a main sound source object is located, determine the audio code streams of the sites where all the main sound source objects are located as the audio code streams that need to be mixed for the target site.
  • That is, when the target site is not a site where a main sound source object is located, the audio code streams of the sites where all the main sound source objects are located are determined as the audio code streams that need to be mixed for the target site.
  • For the target site T1-3, which is not a site where any main sound source object is located, the audio streams of all the main sites T1-1, T1-2, T4-1, and T4-2 are mixed as the mixed code stream of the target site T1-3.
  • the sending unit 35 further includes:
  • A second determining module 351, configured to determine, before the mixed audio code stream is sent to the target site, whether each sound source object in the mixed audio code stream belongs to the main sound source objects.
  • When the audio streams of the sites where the main sound source objects are located are mixed, other unimportant sound source objects in those sites are mixed in as well.
  • The audio stream of the main site T1-1 contains not only the main sound source object but also unimportant sound source objects.
  • The eliminating module 352 is configured to: when the determining module 351 determines that a sound source object in the mixed audio code stream does not belong to the main sound source objects, eliminate the audio stream of that sound source object.
  • the sound source object in the mixed audio code stream does not belong to the main sound source object, it is not an important sound source object that should be presented during the mixing process. In order for unimportant source objects to not affect the effect of the mix, these unimportant source objects need to be eliminated.
  • The specific method may be: when the SAOC sound source objects are mixed and rendered into an output code stream, the coefficients corresponding to the unimportant sound source objects in the rendering matrix are all set to 0.
  • the sending unit 35 further includes:
  • the separation module 353 is configured to separate the primary sound source object from the audio code stream of its corresponding site.
  • the mixing module 354 is configured to mix the main sound source objects according to a relationship between the target venue and the venue where the primary sound source object is located.
  • the second determining module 355 is configured to determine whether the target site is the site where the primary sound source object is located.
  • The mixing module 354 is further configured to: when the second determining module 355 determines that the target site is a site where a main sound source object is located, mix the main sound source objects other than those of the target site. For example, in Figure 4, for the target site T1-1, which is one of the sites where the main sound source objects are located, its own sound source object is excluded and the other main sound source objects are mixed to form the mixed code stream of the target site T1-1.
  • the mixing module 354 is further configured to mix all the primary sound source objects when the second determining module 355 determines that the target site is not the site where the primary sound source object is located.
  • the embodiment of the invention further provides a mixing system, as shown in FIG. 9, comprising: a mixing device 41 and a site terminal 42.
  • the site terminal 42 is configured to collect audio signals in each site, encode and compress the collected audio signals, and send them to the mixing device 41 as an audio code stream.
  • the site terminal 42 is further configured to receive an audio code stream that needs to be mixed in the target site sent by the sound mixing device 41, and mix the received audio code stream in the target site.
  • compared with the prior art, in which the sites participating in mixing are selected by the volume of each site, the sound mixing device and system provided by the embodiments of the present invention select the primary sound source objects by the energy value of each sound source object and determine the sites participating in mixing from the primary sound source objects. This avoids unnecessary sites participating in the mix, ensures the sound quality after mixing, and reduces the consumption of computing resources.
  • moreover, the method of separating the sound source objects is used, and only the important sound source objects are allowed to participate in mixing. Compared with the prior-art method of directly mixing the audio code streams of the sites, this removes the influence of the unimportant sound source objects on the mixing result and further improves the sound quality after mixing.
  • in the mixing system, the mixed code streams are analyzed and selected on the mixing device, the audio code streams that need to participate in mixing are transmitted to the terminal of the target site, and the terminal of the site then performs the mixing. This can effectively save computation on the mixing device and improve the mixing efficiency.
  • the present invention can be implemented by means of software plus the necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

Abstract

The embodiments of the present invention disclose an audio mixing method, apparatus, and system, relating to the field of communications technologies and capable of ensuring the sound quality after mixing while reducing the consumption of computing resources. The method of the present invention includes: receiving the audio code streams of each site and analyzing the audio code streams of each site separately to obtain the sound feature values of the sound source objects; selecting, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects; determining the sites where the primary sound source objects are located; determining, according to the relationship between a target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix; and mixing the audio code streams that the target site needs to mix and sending the mixed audio code stream to the target site, or sending the audio code streams that the target site needs to mix to the target site and performing the mixing at the target site. The embodiments of the present invention are mainly used in the process of audio mixing.

Description

Audio Mixing Method, Apparatus and System

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an audio mixing method, apparatus, and system.

Background

With the development of communications technologies, people can already communicate in real time with participants in remote sites through a videoconferencing system. A typical videoconferencing system consists of an MCU (Multipoint Control Unit) and terminal devices; usually one terminal device is deployed per site, and one MCU is connected to multiple sites. The terminal device collects and processes the sound and images in its site and transmits them over the network to the MCU connected to it; the terminal also receives the data of other sites sent by that MCU. The MCU sends the received audio signals of the other sites to the terminal devices. However, because of device cost and bandwidth constraints, in the prior art the MCU does not send the audio signals of all other sites to a terminal, but selects certain audio signals according to some method, mixes them, and then sends the mix to the terminal.

In the prior-art method, the MCU receives the audio signals from each site and then selects a predetermined number of sites for mixing from all the sites in order of the volume of each site. In this case, even if the primary sound source objects are all concentrated in one site, the audio code streams of other unnecessary sites must still be mixed in; too many unnecessary sites participate in the mix, which degrades the sound quality after mixing and consumes unnecessary computing resources.
Summary of the Invention

The embodiments of the present invention provide an audio mixing method, apparatus, and system, which can ensure the sound quality after mixing while reducing the consumption of computing resources.

To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:

An audio mixing method includes:

receiving the audio code streams of each site and analyzing the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site; selecting, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects;

determining the sites where the primary sound source objects are located;

determining, according to the relationship between a target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix;

mixing the determined audio code streams that the target site needs to mix and sending the mixed audio code stream to the target site; or

sending the determined audio code streams that the target site needs to mix to the target site and performing the mixing at the target site.

An audio mixing apparatus includes:

an analyzing unit, configured to receive the audio code streams of each site and analyze the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site;

a selecting unit, configured to select, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects;

a site determining unit, configured to determine the sites where the primary sound source objects are located;

an audio determining unit, configured to determine, according to the relationship between a target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix;

a sending unit, configured to mix the determined audio code streams that the target site needs to mix and send the mixed audio code stream to the target site; or

the sending unit is further configured to send the determined audio code streams that the target site needs to mix to the target site, where the mixing is performed at the target site.

An audio mixing system includes: an audio mixing apparatus and site terminals.

Compared with the prior art, in which the sites participating in mixing are selected by the volume of each site, the audio mixing method, apparatus, and system provided by the embodiments of the present invention select the primary sound source objects by the energy value of each sound source object and determine the sites participating in mixing from the primary sound source objects, which avoids unnecessary sites participating in the mix, ensures the sound quality after mixing, and reduces the consumption of computing resources.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

Fig. 1 is a flowchart of the method according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the method according to Embodiment 2 of the present invention;

Fig. 3 is a schematic diagram of audio code stream generation in Embodiment 2 of the present invention;

Fig. 4 is a schematic diagram of the connection between sites and MCUs in Embodiment 2 of the present invention;

Fig. 5 is a schematic composition diagram of an audio mixing apparatus in Embodiment 3 of the present invention;

Fig. 6 is a schematic composition diagram of another audio mixing apparatus in Embodiment 3 of the present invention;

Fig. 7 is a schematic composition diagram of another audio mixing apparatus in Embodiment 3 of the present invention;

Fig. 8 is a schematic composition diagram of another audio mixing apparatus in Embodiment 3 of the present invention;

Fig. 9 is a schematic composition diagram of an audio mixing system in Embodiment 3 of the present invention.

Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Embodiment 1

An embodiment of the present invention provides an audio mixing method. As shown in Fig. 1, the method includes:

101. Receive the audio code streams of each site and analyze the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site.

The received audio code streams contain information about the sound source objects corresponding to each site, so the sound source objects corresponding to each site can be parsed from them. For example, when Spatial Audio Object Coding (SAOC) is used, what the MCU receives is not audio code streams corresponding to fixed independent channels, but a code stream containing a downmix signal and the corresponding spatial side information. The spatial side information in the code stream is decoded; if it contains parameters such as the absolute object energy (NRG) and the object level difference (OLD), the energy values of the sound source objects contained in a site can be calculated from the parameters carried in the code stream.

If the code stream contains only the audio signal and no parameter information related to the sound feature values, the code stream needs to be decoded to obtain the uncompressed audio signal, and the sound feature values of the audio signal are then calculated. The sound feature value is calculated from the audio signal and may be the energy, volume, envelope, or another feature value of the audio signal.

In addition, if the code stream contains, besides the audio signal, parameter information related to the sound feature values, the sound feature values can be obtained by parsing the corresponding parameter information in the code stream, which reduces the computation on the MCU. The related parameters may be parameters that indirectly express feature values such as the energy or volume of the audio signal. For example, if the code stream contains the average energy value of a group of audio signals and the energy percentage of each audio signal, the energy value of each audio signal can be calculated from these two parameters. This method of obtaining the sound feature values directly requires the terminal device to calculate the sound feature values and include them in the audio code stream sent to the MCU. For example, when SAOC is used for spatial object coding, the energy information of each sound source object can be obtained from the NRG and OLD parameters. The NRG parameter is the largest of the energy values of all objects, and the OLD parameter is the ratio of each object's energy value to NRG, so the energy value of the i-th sound source object is: ENG_i^{m,n} = OLD_i^{m,n} × NRG^{m,n}, where 1 ≤ i ≤ P^{m,n}. When NRG and OLD are calculated, encoded, and transmitted to the MCU, the MCU can extract the NRG and OLD of each object to obtain its energy value.
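The per-object energy recovery described above can be sketched in a few lines (a minimal illustration with hypothetical parameter values; parsing the NRG and OLD fields out of a real SAOC bitstream is not shown):

```python
def object_energies(nrg, olds):
    """Recover per-object energy values from SAOC-style side information.

    nrg  -- largest energy value among all objects in the stream (NRG parameter)
    olds -- per-object level differences OLD_i = ENG_i / NRG, one per object
    Returns ENG_i = OLD_i * NRG for each object.
    """
    return [old * nrg for old in olds]

# Hypothetical side information for one site's code stream: the loudest
# object has energy 4.0, the other two sit at 50% and 10% of it.
energies = object_energies(4.0, [1.0, 0.5, 0.1])
print(energies)  # [4.0, 2.0, 0.4]
```

Because OLD is defined as a ratio to the maximum energy, no object's recovered energy can exceed NRG, which is why a single multiplication per object suffices.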
102. Select, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects.

The sound source objects are sorted in descending order according to the sound feature values calculated in step 101, and a predetermined number of sound source objects with larger sound feature values are selected as the primary sound source objects.

103. Determine the sites where the primary sound source objects are located.

The sites participating in mixing are determined from the primary sound source objects, rather than being selected by the volume of the whole site as in the prior art.
104. Determine, according to the relationship between the target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix. Mix the audio code streams of the sites where the primary sound source objects are located, and send the mixed audio code stream to the target site.

The method of determining, according to the relationship between the target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix is as follows: judge whether the target site is a site where a primary sound source object is located; if the target site is a site where a primary sound source object is located, determine the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix; if the target site is not a site where a primary sound source object is located, determine the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.
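The site-selection rule above can be sketched as follows (site names follow the Fig. 4 example; the helper name is illustrative, not from the patent):

```python
def streams_to_mix(target_site, primary_sites):
    """Determine which sites' audio code streams a target site must mix.

    If the target site itself hosts a primary sound source object, its own
    stream is excluded, since its sound is heard directly in the room;
    otherwise every primary site's stream is mixed.
    """
    # Filtering out the target site covers both branches: when the target
    # site is not a primary site, nothing is removed and all streams remain.
    return [site for site in primary_sites if site != target_site]

primary = ["T1-1", "T1-2", "T4-1", "T4-2"]
print(streams_to_mix("T1-1", primary))  # ['T1-2', 'T4-1', 'T4-2']
print(streams_to_mix("T1-3", primary))  # ['T1-1', 'T1-2', 'T4-1', 'T4-2']
```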
105. Mix the determined audio code streams that the target site needs to mix and send the mixed audio code stream to the target site; or send the determined audio code streams that the target site needs to mix to the target site and perform the mixing at the target site.

After the audio code streams that the target site needs to mix are determined, they can be mixed on the MCU and the mixed code stream can then be sent to the target site.

Alternatively, after the audio code streams that the target site needs to mix are determined, they can be combined and sent to the target site, and the terminal device of the target site then performs the mixing, which reduces the computation on the MCU.

Compared with the prior art, in which the sites participating in mixing are selected by the volume of each site, the audio mixing method provided by this embodiment of the present invention selects the primary sound source objects by the energy value of each sound source object and determines the sites participating in mixing from the primary sound source objects. This avoids unnecessary sites participating in the mix, ensures the sound quality after mixing, and reduces the consumption of computing resources.

Embodiment 2
An embodiment of the present invention provides an audio mixing method. As shown in Fig. 2, the method includes:

201. Receive the audio code streams of each site and analyze the audio code streams of each site separately to obtain the sound source objects corresponding to each site.

Before the audio code streams of each site are received, the terminal devices of each site collect audio signals, encode them, and send them to the MCU. For example, as shown in Fig. 3, a terminal device captures sound in a site through X microphones and obtains X digitized microphone signals after AD conversion. Sound source objects are extracted from the X digitized microphone signals: the s microphone signals with larger volume may be selected as the sound source objects according to the volume of each microphone signal, or a microphone array with beamforming may be used to extract s sound source signals. The extracted sound source objects are then encoded; Spatial Audio Object Coding (SAOC) may be used here. Downmixing the audio signals of multiple channels into 1 or 2 signals, plus a small amount of spatial side information, can effectively reduce the bit rate and the transmission cost.

What the MCU receives is not audio signals corresponding to fixed independent channels, but the downmix signals and the corresponding spatial side information. After the downmix signals and the corresponding spatial side information are analyzed, the sound source objects corresponding to each site are obtained.

202. Decode the received audio code streams and calculate the sound feature values of the sound source objects.

To select the primary sound source objects from the sound source objects of all the sites, the sound feature values of all the sound source objects need to be calculated, so that they can be compared and selected by sound feature value in the next step. The sound feature value here may be energy, volume, envelope, or another feature value.

In addition, to reduce the computation on the MCU, the sound feature value calculation may be completed on the terminal device, and the sound feature value information may be included in the spatial side information sent directly to the MCU. For example, when SAOC is used for spatial object coding, it suffices to add the NRG and OLD parameters. The NRG parameter is the largest of the energy values of all objects, and the OLD parameter is the ratio of each object's energy value to NRG, so the energy value of the i-th sound source object is: ENG_i^{m,n} = OLD_i^{m,n} × NRG^{m,n}, where 1 ≤ i ≤ P^{m,n}. When NRG and OLD are calculated, encoded, and transmitted to the MCU, the MCU can extract the NRG and OLD of each object to obtain its energy value.

203. Select, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects.

Taking a cascaded-MCU scenario as an example, as shown in Fig. 4, the audio code streams sent by each site and the corresponding sound feature values received by MCU-1 in a given frame are shown in Table 1, where m is the number of the MCU, n is the number of the site connected to the MCU, and i indexes the sound source objects contained in a site's audio code stream.

(Table 1, reproduced as an image in the original publication, lists the sound source objects received from each site and their sound feature values.)

The predetermined number is a preset value that can be set according to the actual situation and experience. This embodiment takes selecting four sound source objects as an example: from all the sound source objects in Table 1, the four with the highest sound feature values are selected as the important sound source objects for further mixing. After comparison, the four sound source objects with the highest sound feature values are selected as the primary sound source objects.
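The descending-order selection of step 203 can be sketched as follows (the object labels and feature values are hypothetical, since Table 1 is not reproduced in this text; the per-site/per-object indexing mirrors the (m, n, i) convention above with the MCU index dropped):

```python
import heapq

def select_primary(objects, n):
    """Pick the n sound source objects with the largest sound feature values.

    objects -- dict mapping (site, object_index) -> sound feature value
    Returns the keys of the n largest entries (the primary sound source objects).
    """
    return heapq.nlargest(n, objects, key=objects.get)

# Hypothetical frame of feature values received by the MCU from five sites.
features = {
    ("T1-1", 1): 9.0, ("T1-1", 2): 0.7, ("T1-1", 3): 0.2,
    ("T1-2", 1): 6.5, ("T4-1", 1): 7.1, ("T4-2", 1): 5.9,
    ("T2-1", 1): 1.3,
}
primary = select_primary(features, 4)
print(sorted(primary))  # one object each from T1-1, T1-2, T4-1, T4-2
```

Even though T1-1 contributes three objects, only its loudest one makes the cut here, which is exactly the behavior that distinguishes object-level selection from the prior art's site-level volume ranking.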
204. Determine the sites where the primary sound source objects are located.

Determining the important sites from the important sound source objects, rather than selecting sites by volume as in the prior art, makes the selection of sites more accurate and reliable. Continuing with the primary sound source objects of Table 1, the sites where the primary sound source objects are located can be determined to be T1-1, T1-2, T4-1, and T4-2.

205. Judge whether the target site is a site where a primary sound source object is located; if the target site is a site where a primary sound source object is located, go to step 206; if the target site is not a site where a primary sound source object is located, go to step 207.

Because sound propagates directly within the target site, it is presented in the site without mixing. Therefore, judging before mixing whether the target site is a site where a primary sound source object is located both reduces the computation and prevents sound duplication.

206. Determine the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix.

When the target site is a site where a primary sound source object is located, the target site's own audio code stream does not need to be mixed, so the target site is excluded and the audio code streams of the other sites where primary sound source objects are located are mixed. For example, in Fig. 4, the target site T1-1 is one of the sites where the primary sound source objects are located, so the audio code streams of sites T1-2, T4-1, and T4-2 are mixed as the mixed code stream for target site T1-1.

207. Determine the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.

When the target site is not a site where a primary sound source object is located, the audio code streams of all the sites where the primary sound source objects are located need to be mixed. For example, in Fig. 4, the target site T1-3 is not any of the sites where the primary sound source objects are located, so the audio code streams of all the primary sites T1-1, T1-2, T4-1, and T4-2 are mixed as the mixed code stream for target site T1-3.
208. Mix the determined audio code streams that the target site needs to mix.

209. Determine whether the sound source objects in the mixed audio code stream belong to the primary sound source objects.

When the audio code streams of the sites where the primary sound source objects are located are mixed, the other unimportant sound source objects in those sites are mixed in as well. For example, in Fig. 4, the audio code stream of the selected site T1-1 contains, in addition to a primary sound source object, unimportant sound source objects.

210. If a sound source object in the mixed audio code stream does not belong to the primary sound source objects, eliminate the audio code stream of the sound source object that does not belong to the primary sound source objects.

A sound source object in the mixed audio code stream that does not belong to the primary sound source objects is not an important sound source object that should be presented during mixing. To keep the unimportant sound source objects from affecting the mixing result, they need to be eliminated. The specific method may be that, when the SAOC sound source objects are mixed and generated as an output code stream, the coefficients corresponding to the unimportant sound source objects are all set to 0 in the rendering matrix.
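The rendering-matrix technique can be sketched as follows (a minimal illustration; the matrix values and object ids are hypothetical, and a real SAOC renderer applies time- and frequency-dependent matrices rather than one static matrix):

```python
def zero_unimportant(render_matrix, object_ids, primary_ids):
    """Zero the rendering-matrix coefficients of non-primary objects.

    render_matrix -- one row per output channel, one column per object
    object_ids    -- object id for each column of the matrix
    primary_ids   -- set of ids that should remain audible in the output
    Returns a new matrix in which every column of a non-primary object is 0.
    """
    return [
        [c if oid in primary_ids else 0.0 for c, oid in zip(row, object_ids)]
        for row in render_matrix
    ]

matrix = [[0.8, 0.5, 0.3],
          [0.2, 0.5, 0.7]]  # 2 output channels, 3 objects
muted = zero_unimportant(matrix, ["a", "b", "c"], {"a", "c"})
print(muted)  # [[0.8, 0.0, 0.3], [0.2, 0.0, 0.7]]
```

Muting via the rendering matrix is cheap because the unwanted object is simply never rendered from the downmix; no additional signal processing pass is needed.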
211. Send the mixed audio code stream to the target site.

The mixed audio code stream corresponding to each target site has been generated in the preceding steps, so it suffices to send each target site its corresponding mixed audio code stream.

It should be noted that, besides the method described in steps 209 and 210, the unnecessary sound source objects can also be eliminated by the following method, which specifically includes:

Before mixing, separate the primary sound source objects from the audio code streams of their corresponding sites, and mix the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located.

Mixing the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located may specifically be implemented as follows:

For different target sites, different sound source objects should be selected for mixing. First judge whether the target site is a site where a primary sound source object is located; if the target site is a site where a primary sound source object is located, mix the primary sound source objects excluding that of the target site; if the target site is not a site where a primary sound source object is located, mix all the primary sound source objects. For example, in Fig. 4, the target site T1-1 is one of the sites where the primary sound source objects are located, so its own sound source object is excluded and the other primary sound source objects are mixed as the mixed code stream for target site T1-1. The target site T1-3 is not any of the sites where the primary sound source objects are located, so all the primary sound source objects are mixed as the mixed code stream for target site T1-3. In addition, the audio mixing method provided by this embodiment of the present invention may also be as follows: send the determined audio code streams that the target site needs to mix to the target site, and mix the audio code streams that the target site needs to mix on the terminal device of the target site.

Compared with the prior art, in which the sites participating in mixing are selected by the volume of each site, the audio mixing method provided by this embodiment of the present invention selects the primary sound source objects by the energy value of each sound source object and determines the sites participating in mixing from the primary sound source objects. This avoids unnecessary sites participating in the mix, ensures the sound quality after mixing, and reduces the consumption of computing resources.

Moreover, this embodiment of the present invention uses the method of separating the sound source objects and allows only the important sound source objects to participate in mixing. Compared with the prior-art method of directly mixing the sites' audio code streams, this removes the influence of the unimportant sound source objects on the mixing result and further improves the sound quality after mixing.
Embodiment 3

An embodiment of the present invention provides an audio mixing apparatus. As shown in Fig. 5, the apparatus includes: an analyzing unit 31, a selecting unit 32, a site determining unit 33, an audio determining unit 34, and a sending unit 35.

The analyzing unit 31 is configured to receive the audio code streams of each site and analyze the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site.

What the MCU receives is not audio signals corresponding to fixed independent channels, but the Spatial Audio Object Coding (SAOC) downmix signals and the corresponding spatial side information. After the downmix signals and the corresponding spatial side information are decoded, the sound source objects corresponding to each site are obtained.

The selecting unit 32 is configured to select, in descending order of sound feature value, a predetermined number of sound source objects from the sound source objects as the primary sound source objects.

To select the important sound source objects from the sound source objects of all the sites, the sound feature values of all the sound source objects need to be calculated, so that they can be compared and selected by sound feature value in the next step. The sound feature value here may be energy, volume, envelope, or another sound feature.

In addition, to reduce the computation on the MCU, the sound feature value calculation may be completed on the terminal device, and the sound feature information may be included in the spatial side information sent directly to the MCU. For example, when SAOC is used for spatial object coding, it suffices to add the NRG and OLD parameters. The NRG parameter is the largest of the energy values of all objects, and the OLD parameter is the ratio of each object's energy value to NRG, so the energy value of the i-th sound source object is: ENG_i^{m,n} = OLD_i^{m,n} × NRG^{m,n}, where 1 ≤ i ≤ P^{m,n}. When NRG and OLD are calculated, encoded, and transmitted to the MCU, the MCU can extract the NRG and OLD of each object to obtain its energy value.
The site determining unit 33 is configured to determine the sites where the primary sound source objects are located.

Determining the important sites from the important sound source objects, rather than selecting sites by volume as in the prior art, makes the selection of sites more accurate and reliable.

The audio determining unit 34 is configured to determine, according to the relationship between the target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix.

The method of determining, according to the relationship between the target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix is as follows: judge whether the target site is a site where a primary sound source object is located; if the target site is a site where a primary sound source object is located, determine the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix; if the target site is not a site where a primary sound source object is located, determine the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.

The sending unit 35 is configured to mix the determined audio code streams that the target site needs to mix and send the mixed audio code stream to the target site; or the sending unit 35 is further configured to send the determined audio code streams that the target site needs to mix to the target site, where the mixing is performed at the target site.

Further, the analyzing unit 31 is further configured to decode the received audio code streams and calculate the sound feature values of the sound source objects. The analyzing unit 31 is further configured to extract the sound feature values of the sound source objects from the received audio code streams.

Taking a cascaded-MCU scenario as an example, as shown in Fig. 4, the audio code streams sent by each site and the corresponding sound feature values received by MCU-1 in a given frame are shown in Table 1, where m is the number of the MCU, n is the number of the site connected to the MCU, and i indexes the sound source objects contained in a site's audio code stream.

The predetermined number is a preset value that can be set according to the actual situation and experience. This embodiment takes selecting four sound source objects as an example: from all the sound source objects in Table 1, the four with the highest sound feature values are selected as the important sound source objects for further mixing. After comparison, the four sound source objects with the highest sound feature values are selected as the primary sound source objects.
Further, as shown in Fig. 6, the audio determining unit 34 includes:

a first judging module 341, configured to judge whether the target site is a site where a primary sound source object is located.

Because sound propagates directly within the target site, it is presented in the site without mixing. Therefore, judging before mixing whether the target site is a site where a primary sound source object is located both reduces the computation and prevents sound duplication.

a first determining module 342, configured to determine, when the first judging module 341 judges that the target site is a site where a primary sound source object is located, the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix.

When the target site is a site where a primary sound source object is located, the target site's own audio code stream does not need to be determined as an audio code stream that the target site needs to mix, so the target site is excluded and the audio code streams of the other sites where primary sound source objects are located are mixed. For example, in Fig. 4, the target site T1-1 is one of the sites where the primary sound source objects are located, so the audio code streams of sites T1-2, T4-1, and T4-2 are mixed as the mixed code stream for target site T1-1.

The first determining module 342 is further configured to determine, when the first judging module 341 judges that the target site is not a site where a primary sound source object is located, the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.

When the target site is not a site where a primary sound source object is located, the audio code streams of all the sites where the primary sound source objects are located need to be determined as the audio code streams that the target site needs to mix. For example, in Fig. 4, the target site T1-3 is not any of the sites where the primary sound source objects are located, so the audio code streams of all the primary sites T1-1, T1-2, T4-1, and T4-2 are mixed as the mixed code stream for target site T1-3.
Further, as shown in Fig. 7, the sending unit 35 further includes:

a second determining module 351, configured to determine, before the mixed audio code stream is sent to the target site, whether the sound source objects in the mixed audio code stream belong to the primary sound source objects.

When the audio code streams of the sites where the primary sound source objects are located are mixed, the other unimportant sound source objects in those sites are mixed in as well. For example, in Fig. 4, the audio code stream of the primary site T1-1 contains, in addition to a primary sound source object, unimportant sound source objects.

an eliminating module 352, configured to eliminate, when the second determining module 351 determines that a sound source object in the mixed audio code stream does not belong to the primary sound source objects, the audio code stream of the sound source object that does not belong to the primary sound source objects.

A sound source object in the mixed audio code stream that does not belong to the primary sound source objects is not an important sound source object that should be presented during mixing. To keep the unimportant sound source objects from affecting the mixing result, they need to be eliminated. The specific method may be that, when the SAOC sound source objects are mixed and generated as an output code stream, the coefficients corresponding to the unimportant sound source objects are all set to 0 in the rendering matrix.
Further, as shown in Fig. 8, the sending unit 35 further includes:

a separating module 353, configured to separate the primary sound source objects from the audio code streams of their corresponding sites;

a mixing module 354, configured to mix the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located; and

a second judging module 355, configured to judge whether the target site is a site where a primary sound source object is located.

The mixing module 354 is further configured to mix, when the second judging module 355 judges that the target site is a site where a primary sound source object is located, the primary sound source objects excluding that of the target site. For example, in Fig. 4, the target site T1-1 is one of the sites where the primary sound source objects are located, so its own sound source object is excluded and the other primary sound source objects are mixed as the mixed code stream for target site T1-1.

The mixing module 354 is further configured to mix all the primary sound source objects when the second judging module 355 judges that the target site is not a site where a primary sound source object is located.

For example, in Fig. 4, the target site T1-3 is not any of the sites where the primary sound source objects are located, so all the primary sound source objects are mixed as the mixed code stream for target site T1-3.
An embodiment of the present invention further provides an audio mixing system. As shown in Fig. 9, the system includes: an audio mixing apparatus 41 and a site terminal 42.

The site terminal 42 is configured to collect audio signals in each site, encode and compress the collected audio signals, and send them to the audio mixing apparatus 41 as an audio code stream.

The site terminal 42 is further configured to receive the audio code streams that the target site needs to mix, sent by the audio mixing apparatus 41, and mix the received audio code streams at the target site.

It should be noted that for other descriptions of the functional modules included in the audio mixing system of this embodiment of the present invention, reference may be made to the corresponding descriptions of the method and apparatus in Embodiment 1, Embodiment 2, and Embodiment 3, which are not repeated here.
Compared with the prior art, in which the sites participating in mixing are selected by the volume of each site, the audio mixing apparatus and system provided by the embodiments of the present invention select the primary sound source objects by the energy value of each sound source object and determine the sites participating in mixing from the primary sound source objects. This avoids unnecessary sites participating in the mix, ensures the sound quality after mixing, and reduces the consumption of computing resources.

Moreover, the embodiments of the present invention use the method of separating the sound source objects and allow only the important sound source objects to participate in mixing. Compared with the prior-art method of directly mixing the sites' audio code streams, this removes the influence of the unimportant sound source objects on the mixing result and further improves the sound quality after mixing.

Furthermore, in the audio mixing system of the present invention, the mixed code streams are analyzed and selected on the audio mixing apparatus, the audio code streams that need to participate in mixing are sent to the terminal of the target site, and the terminal of the site then performs the mixing, which can effectively save computation on the audio mixing apparatus and improve the mixing efficiency.

Through the above description of the embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by means of software plus the necessary general-purpose hardware, or of course by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

The foregoing are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An audio mixing method, comprising:

receiving the audio code streams of each site and analyzing the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site;

selecting, in descending order of sound feature value of the sound source objects, a predetermined number of sound source objects from the sound source objects as primary sound source objects;

determining the sites where the primary sound source objects are located;

determining, according to the relationship between a target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix; and

mixing the determined audio code streams that the target site needs to mix and sending the mixed audio code stream to the target site; or

sending the determined audio code streams that the target site needs to mix to the target site and performing the mixing at the target site.

2. The audio mixing method according to claim 1, wherein analyzing the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site comprises: decoding the received audio code streams and calculating the sound feature values of the sound source objects.

3. The audio mixing method according to claim 1, wherein analyzing the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site further comprises: extracting the sound feature values of the sound source objects from the received audio code streams.

4. The audio mixing method according to any one of claims 1 to 3, wherein determining, according to the relationship between the target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix comprises:

judging whether the target site is a site where a primary sound source object is located;

if the target site is a site where a primary sound source object is located, determining the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix; and

if the target site is not a site where a primary sound source object is located, determining the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.

5. The audio mixing method according to claim 4, before sending the mixed audio code stream to the target site, further comprising:

determining whether the sound source objects in the mixed audio code stream belong to the primary sound source objects; and if a sound source object in the mixed audio code stream does not belong to the primary sound source objects, eliminating the audio code stream of the sound source object that does not belong to the primary sound source objects.

6. The audio mixing method according to any one of claims 1 to 3, wherein mixing the determined audio code streams that the target site needs to mix comprises:

separating the primary sound source objects from the audio code streams of their corresponding sites; and

mixing the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located.

7. The audio mixing method according to claim 6, wherein mixing the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located comprises: judging whether the target site is a site where a primary sound source object is located;

if the target site is a site where a primary sound source object is located, mixing the primary sound source objects excluding that of the target site; and

if the target site is not a site where a primary sound source object is located, mixing all the primary sound source objects.

8. The audio mixing method according to any one of claims 1 to 3, further comprising: mixing, on the terminal device of the target site, the audio code streams that the target site needs to mix.
9. An audio mixing apparatus, comprising:

an analyzing unit, configured to receive the audio code streams of each site and analyze the audio code streams of each site separately to obtain the sound feature values of the sound source objects corresponding to each site;

a selecting unit, configured to select, in descending order of sound feature value of the sound source objects, a predetermined number of sound source objects from the sound source objects as primary sound source objects; a site determining unit, configured to determine the sites where the primary sound source objects are located;

an audio determining unit, configured to determine, according to the relationship between a target site and the sites where the primary sound source objects are located, the audio code streams that the target site needs to mix; and

a sending unit, configured to mix the determined audio code streams that the target site needs to mix and send the mixed audio code stream to the target site; or

the sending unit is further configured to send the determined audio code streams that the target site needs to mix to the target site, where the mixing is performed at the target site.

10. The audio mixing apparatus according to claim 9, wherein the analyzing unit is further configured to decode the received audio code streams and calculate the sound feature values of the sound source objects.

11. The audio mixing apparatus according to claim 9, wherein the analyzing unit is further configured to extract the sound feature values of the sound source objects from the received audio code streams.

12. The audio mixing apparatus according to any one of claims 9 to 11, wherein the audio determining unit comprises:

a first judging module, configured to judge whether the target site is a site where a primary sound source object is located;

a first determining module, configured to determine, when the first judging module judges that the target site is a site where a primary sound source object is located, the audio code streams of the sites where the primary sound source objects are located, excluding the target site, as the audio code streams that the target site needs to mix;

wherein the first determining module is further configured to determine, when the first judging module judges that the target site is not a site where a primary sound source object is located, the audio code streams of all the sites where the primary sound source objects are located as the audio code streams that the target site needs to mix.

13. The audio mixing apparatus according to claim 12, wherein the sending unit comprises: a second determining module, configured to determine, before the mixed audio code stream is sent to the target site, whether the sound source objects in the mixed audio code stream belong to the primary sound source objects; and an eliminating module, configured to eliminate, when the second determining module determines that a sound source object in the mixed audio code stream does not belong to the primary sound source objects, the audio code stream of the sound source object that does not belong to the primary sound source objects.

14. The audio mixing apparatus according to any one of claims 9 to 11, wherein the sending unit further comprises:

a separating module, configured to separate the primary sound source objects from the audio code streams of their corresponding sites; and

a mixing module, configured to mix the primary sound source objects according to the relationship between the target site and the sites where the primary sound source objects are located.

15. The audio mixing apparatus according to claim 14, wherein the sending unit further comprises:

a second judging module, configured to judge whether the target site is a site where a primary sound source object is located;

wherein the mixing module is further configured to mix, when the second judging module judges that the target site is a site where a primary sound source object is located, the primary sound source objects excluding that of the target site; and the mixing module is further configured to mix all the primary sound source objects when the second judging module judges that the target site is not a site where a primary sound source object is located.

16. An audio mixing system, comprising: the audio mixing apparatus according to any one of claims 9 to 15 and a site terminal;

wherein the site terminal is configured to collect audio signals in each site, encode and compress the collected audio signals, and send them to the audio mixing apparatus as an audio code stream; and

the site terminal is further configured to receive the audio code streams that the target site needs to mix, sent by the audio mixing apparatus, and mix the received audio code streams at the target site.
PCT/CN2012/082952 2011-10-13 2012-10-15 混音方法、装置及系统 WO2013053336A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/225,536 US9456273B2 (en) 2011-10-13 2014-03-26 Audio mixing method, apparatus and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110309836.6A CN103050124B (zh) 2011-10-13 2011-10-13 混音方法、装置及系统
CN201110309836.6 2011-10-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/225,536 Continuation US9456273B2 (en) 2011-10-13 2014-03-26 Audio mixing method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2013053336A1 true WO2013053336A1 (zh) 2013-04-18

Family

ID=48062739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/082952 WO2013053336A1 (zh) 2011-10-13 2012-10-15 混音方法、装置及系统

Country Status (3)

Country Link
US (1) US9456273B2 (zh)
CN (1) CN103050124B (zh)
WO (1) WO2013053336A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087656A (zh) * 2017-06-14 2018-12-25 广东亿迅科技有限公司 一种基于mcu的多媒体会议混音方法及装置

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112016001738B1 (pt) 2013-07-31 2023-04-04 Dolby International Ab Método, aparelho incluindo um sistema de renderização de áudio e meio não transitório de processamento de objetos de áudio espacialmente difusos ou grandes
CN103500580B (zh) * 2013-09-23 2017-04-12 广东威创视讯科技股份有限公司 混音处理方法及系统
CN103680508B (zh) * 2013-12-09 2018-03-16 Tcl集团股份有限公司 多级混音的动态分配方法及动态分配装置
CN104167210A (zh) * 2014-08-21 2014-11-26 华侨大学 一种轻量级的多方会议混音方法和装置
CN104469032B (zh) * 2014-10-30 2017-06-16 苏州科达科技股份有限公司 混音处理方法及系统
CN105989845B (zh) * 2015-02-25 2020-12-08 杜比实验室特许公司 视频内容协助的音频对象提取
GB2543275A (en) * 2015-10-12 2017-04-19 Nokia Technologies Oy Distributed audio capture and mixing
US10325610B2 (en) 2016-03-30 2019-06-18 Microsoft Technology Licensing, Llc Adaptive audio rendering
US10424307B2 (en) 2017-01-03 2019-09-24 Nokia Technologies Oy Adapting a distributed audio recording for end user free viewpoint monitoring
CN107204191A (zh) * 2017-05-17 2017-09-26 维沃移动通信有限公司 一种混音方法、装置及移动终端
CN107888843A (zh) * 2017-10-13 2018-04-06 深圳市迅雷网络技术有限公司 用户原创内容的混音方法、装置、存储介质及终端设备
US11032580B2 (en) 2017-12-18 2021-06-08 Dish Network L.L.C. Systems and methods for facilitating a personalized viewing experience
US10365885B1 (en) * 2018-02-21 2019-07-30 Sling Media Pvt. Ltd. Systems and methods for composition of audio content from multi-object audio
CN109901811B (zh) * 2019-02-26 2022-09-06 北京华夏电通科技股份有限公司 应用于数字化庭审中的混音方法及装置
CN110992977B (zh) * 2019-12-03 2021-06-22 北京声智科技有限公司 一种目标声源的提取方法及装置
CN113031903B (zh) * 2021-03-23 2023-01-24 青岛海信移动通信技术股份有限公司 电子设备及其音频流合成方法
CN113257256A (zh) * 2021-07-14 2021-08-13 广州朗国电子科技股份有限公司 一种语音处理方法、会议一体机、系统及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060247045A1 (en) * 2005-04-28 2006-11-02 Hyeonkuk Jeong Audio processing in a multi-participant conference
US20070285505A1 (en) * 2006-05-26 2007-12-13 Tandberg Telecom As Method and apparatus for video conferencing having dynamic layout based on keyword detection
CN101179693A (zh) * 2007-09-26 2008-05-14 深圳市丽视视讯科技有限公司 一种会议电视系统的混音处理方法
CN101335867A (zh) * 2007-09-27 2008-12-31 深圳市迪威新软件技术有限公司 一种会议电视系统的语音激励控制方法
CN101414462A (zh) * 2007-10-15 2009-04-22 华为技术有限公司 音频编码方法和多点音频信号混音控制方法及相应设备
CN101547268A (zh) * 2009-04-24 2009-09-30 北京飞利信科技股份有限公司 一种基于局域网的数字语音传输系统
CN101877643A (zh) * 2010-06-29 2010-11-03 中兴通讯股份有限公司 多点混音远景呈现方法、装置及系统
CN102065265A (zh) * 2009-11-13 2011-05-18 华为终端有限公司 实现混音的方法、装置和系统

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6230130B1 (en) * 1998-05-18 2001-05-08 U.S. Philips Corporation Scalable mixing for speech streaming
US6898637B2 (en) 2001-01-10 2005-05-24 Agere Systems, Inc. Distributed audio collaboration method and apparatus
JP2005229297A (ja) 2004-02-12 2005-08-25 Matsushita Electric Ind Co Ltd 音響調整卓
EP1691348A1 (en) 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources
EP1853092B1 (en) 2006-05-04 2011-10-05 LG Electronics, Inc. Enhancing stereo audio with remix capability
EP1855455B1 (en) * 2006-05-11 2011-10-05 Global IP Solutions (GIPS) AB Audio mixing
JP5134623B2 (ja) 2006-07-07 2013-01-30 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ 複数のパラメータ的に符号化された音源を合成するための概念
BRPI0716854B1 (pt) 2006-09-18 2020-09-15 Koninklijke Philips N.V. Codificador para codificar objetos de áudio, decodificador para decodificar objetos de áudio, centro distribuidor de teleconferência, e método para decodificar sinais de áudio
EP2575129A1 (en) 2006-09-29 2013-04-03 Electronics and Telecommunications Research Institute Apparatus and method for coding and decoding multi-object audio signal with various channel
US20080269929A1 (en) * 2006-11-15 2008-10-30 Lg Electronics Inc. Method and an Apparatus for Decoding an Audio Signal
CN100512422C (zh) 2006-11-23 2009-07-08 北京航空航天大学 多mcu视频会议系统中的混音方法
JP5270566B2 (ja) 2006-12-07 2013-08-21 エルジー エレクトロニクス インコーポレイティド オーディオ処理方法及び装置
JP5455647B2 (ja) 2007-01-10 2014-03-26 コーニンクレッカ フィリップス エヌ ヴェ オーディオデコーダ
US8639498B2 (en) 2007-03-30 2014-01-28 Electronics And Telecommunications Research Institute Apparatus and method for coding and decoding multi object audio signal with multi channel
US8289362B2 (en) 2007-09-26 2012-10-16 Cisco Technology, Inc. Audio directionality control for a multi-display switched video conferencing system
US8280744B2 (en) 2007-10-17 2012-10-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio decoder, audio object encoder, method for decoding a multi-audio-object signal, multi-audio-object encoding method, and non-transitory computer-readable medium therefor
EP2146522A1 (en) 2008-07-17 2010-01-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating audio output signals using object based metadata

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060247045A1 (en) * 2005-04-28 2006-11-02 Hyeonkuk Jeong Audio processing in a multi-participant conference
US20070285505A1 (en) * 2006-05-26 2007-12-13 Tandberg Telecom As Method and apparatus for video conferencing having dynamic layout based on keyword detection
CN101179693A (zh) * 2007-09-26 2008-05-14 深圳市丽视视讯科技有限公司 一种会议电视系统的混音处理方法
CN101335867A (zh) * 2007-09-27 2008-12-31 深圳市迪威新软件技术有限公司 一种会议电视系统的语音激励控制方法
CN101414462A (zh) * 2007-10-15 2009-04-22 华为技术有限公司 音频编码方法和多点音频信号混音控制方法及相应设备
CN101547268A (zh) * 2009-04-24 2009-09-30 北京飞利信科技股份有限公司 一种基于局域网的数字语音传输系统
CN102065265A (zh) * 2009-11-13 2011-05-18 华为终端有限公司 实现混音的方法、装置和系统
CN101877643A (zh) * 2010-06-29 2010-11-03 中兴通讯股份有限公司 多点混音远景呈现方法、装置及系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087656A (zh) * 2017-06-14 2018-12-25 广东亿迅科技有限公司 一种基于mcu的多媒体会议混音方法及装置
CN109087656B (zh) * 2017-06-14 2020-11-17 广东亿迅科技有限公司 一种基于mcu的多媒体会议混音方法及装置

Also Published As

Publication number Publication date
US9456273B2 (en) 2016-09-27
CN103050124A (zh) 2013-04-17
US20140205115A1 (en) 2014-07-24
CN103050124B (zh) 2016-03-30

Similar Documents

Publication Publication Date Title
WO2013053336A1 (zh) 混音方法、装置及系统
EP2439945B1 (en) Audio panning in a multi-participant video conference
CN102226944B (zh) 混音方法及设备
US8531994B2 (en) Audio processing method, system, and control server
WO2019128204A1 (zh) 会议实现方法、装置、设备和系统、计算机可读存储介质
EP3197153B1 (en) Method and system for conducting video conferences of diverse participating devices
WO2012155660A1 (zh) 一种远程呈现方法、终端和系统
CN102915736B (zh) 混音处理方法和混音处理系统
US9191516B2 (en) Teleconferencing using steganographically-embedded audio data
WO2008141539A1 (fr) Procédé d'affichage de légendes, système et appareil de communication vidéo
JP6010176B2 (ja) オーディオ信号のデコーディング方法及びその装置
CN104167210A (zh) 一种轻量级的多方会议混音方法和装置
CN110070878B (zh) 音频码流的解码方法及电子设备
US20130151242A1 (en) Method to Select Active Channels in Audio Mixing for Multi-Party Teleconferencing
WO2010094219A1 (zh) 一种语音信号的处理、播放方法和装置
CN106303661B (zh) 一种直播客户端实现自适应屏幕旋转的方法及系统
WO2012028018A1 (zh) 分布式视频处理方法及视频会议系统
WO2014094461A1 (zh) 视频会议中的视音频信息的处理方法、装置及系统
CN102348097A (zh) 视频会议中的对话方法及多点控制单元
CN101350908A (zh) 用于网络视频会议的视频数据传输系统及方法
WO2012055291A1 (zh) 音频数据传输方法及系统
CN1933480A (zh) 一种多媒体数据转换网关的方法
US10375131B2 (en) Selectively transforming audio streams based on audio energy estimate
US11800017B1 (en) Encoding a subset of audio input for broadcasting conferenced communications
WO2016101623A1 (zh) 多点音频视频通信中远程互动的方法及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12840454

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12840454

Country of ref document: EP

Kind code of ref document: A1