WO2011153905A1 - Method and apparatus for mixing audio signals - Google Patents

Method and apparatus for mixing audio signals

Info

Publication number
WO2011153905A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
orientation
audio signal
adjusted
channel
Prior art date
Application number
PCT/CN2011/074820
Other languages
English (en)
French (fr)
Inventor
梁丽燕
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Priority to EP11791896.1A (EP2568702B1)
Publication of WO2011153905A1
Priority to US13/707,332 (US20130094672A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567 Multimedia conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • The present invention relates to the field of communications, and in particular to a method and apparatus for mixing audio signals.

Background

  • In multimedia communication, the multimedia server (for example, the MCU (Multipoint Control Unit) in a video conference) mixes the audio signals sent by the participants.
  • A video conference is used below to illustrate N-party mixing.
  • After receiving the voice code streams collected by the multimedia terminals of the sites, the MCU decodes them to obtain the audio signal of each site.
  • The voice envelope of each site is calculated, and the N sites with the largest voice envelopes in the conference (hereinafter, the largest N sites) are found by comparing the envelopes; the audio signals of these N sites are then mixed.
  • The mix of the N sites is sent to every site outside the largest N sites, and each of the largest N sites is sent the mix of the other N-1 sites, which excludes its own signal.
  • the multimedia terminal adds orientation information to the audio information collected by the multimedia terminal, or the multimedia server allocates orientation information for the audio information transmitted by each multimedia terminal participating in the mixing.
  • a method for mixing audio signals including:
  • the audio signal after the azimuth adjustment is mixed with other to-be-mixed signals.
  • A mixing processing device for an audio signal, comprising:
  • an orientation adjustment module, configured to determine a terminal whose audio signal orientation needs to be adjusted, and to adjust the orientation information of that terminal's audio signal;
  • a mixing processing module, configured to mix the orientation-adjusted audio signal with the other signals to be mixed.
  • The embodiment of the present invention adjusts the orientation information of the transmitting terminals participating in the mixing so that their orientations are separated as far as possible. The sound of each transmitting terminal is therefore clearer, which improves the user's sense of presence.
  • FIG. 1 is a schematic diagram of a mixing process according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of multi-screen display according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a telepresence screen display according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an orientation provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a mixing process according to Embodiment 1 of the present invention.
  • FIG. 6 is a schematic diagram of a mixing process according to Embodiment 2 of the present invention.
  • FIG. 7 is a schematic diagram of a mixing process according to Embodiment 3 of the present invention.
  • FIG. 8 is a schematic structural diagram of a device according to an embodiment of the present invention.

Detailed description

  • When the orientations of several audio signals in a mix overlap, the listener hears those sounds from the same orientation, which degrades the listening experience.
  • The embodiment of the present invention provides a method for mixing audio signals.
  • The solution can adjust, in a timely manner, the orientation information of the audio signals of the transmitting terminals participating in the mixing, so that the listener can clearly perceive the orientation of the audio signal sent by each site, which improves the listener's sense of presence.
  • the processing of the method can be applied to a multi-channel communication system with audio mixing, and its implementation is as shown in FIG. 1 , including:
  • The technical solution provided by the embodiment of the present invention adjusts the audio orientation information of terminals whose audio signal orientations overlap, so that the orientations of the transmitting terminals are separated as far as possible and the sound position of each transmitting terminal is clearer, which improves the user's sense of presence.
  • Orientation adjustment is needed not only when the audio signal orientations of terminals overlap. In a video communication system, when a terminal enters the mixing system or the arrangement of the video picture changes, the orientation must also be adjusted if the audio orientation of a terminal participating in the mixing no longer matches the orientation of that terminal in the video picture.
  • In this case, the specific implementation of S101 includes: when the audio signal orientation of a terminal participating in the mixing is inconsistent with the orientation of the terminal in the video picture, determining that the terminal needs orientation adjustment, and adjusting the terminal's orientation information according to its orientation in the video picture so that the two are consistent; or, if the terminal is a two-channel or multi-channel terminal, adjusting the terminal's orientation information according to both its orientation in the video picture and its actual orientation.
  • For example, the actual orientation of the audio signal from site E is right, but site E is displayed on the left of the multi-picture display, so the orientation of the audio signal from site E is adjusted to the left.
  • the actual orientation of the site F is the right, but the display area corresponding to the site F (display 1) is on the left side of the telepresence screen, and the orientation of the site F is adjusted to the left and right.
  • The multimedia server of the embodiment of the present invention may also adjust the orientation of a terminal according to orientation specifying information sent by a participant terminal.
  • In this case, the specific implementation of S101 includes: determining that the terminal named in the orientation specifying information is the terminal to be adjusted, and adjusting its orientation according to the orientation specifying information sent by the participant terminal.
  • The orientation specifying information carries the orientation that the participant terminal specifies for the terminal to be adjusted, and the multimedia server sets the orientation information of that terminal accordingly.
  • The orientation specifying information may further carry validity information, which indicates either that the orientation of the terminal to be adjusted is changed only in the mix sent to the specifying participant terminal, or that it is changed in the mixes sent to some or all of the participant terminals.
  • When different orientation specifying information is received for the same terminal, the multimedia server may apply the adjustments in turn in the order in which the orientation specifying information was received, adjust the orientation of the terminal according to a token-request mode, or control each terminal's permission to adjust the orientation of the transmitting terminal according to other preset rules.
  • The specific implementation of the orientation adjustment is: adjusting the orientation of the terminal to be adjusted according to the received orientation specifying information, and keeping the adjusted orientation on the same side as the terminal's original orientation.
  • Same-side adjustment means, for example, that if the original orientation of a two-channel transmitting terminal B participating in the mixing is on the left, its orientation is adjusted to the left or the middle; and if the original specified orientation of a mono transmitting terminal C is toward the right, its orientation is adjusted to the right.
  • When there are multiple terminals to be adjusted, their audio signal orientation information may be adjusted one after another according to a preset priority.
  • The embodiment of the present invention provides a preferred priority: when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority; a terminal participating in the mixing for the first time has the second adjustment priority; and, when mono, two-channel, and multi-channel terminals are mixed together, the two-channel and multi-channel terminals participating in the mixing have the third adjustment priority.
  • For example, suppose the multi-channel terminal A, the two-channel terminal B, and the mono terminal C participating in the mixing all need orientation adjustment, and the two-channel terminal B participates in the mixing for the first time. The audio signal orientation of the mono terminal C is adjusted first, then that of the two-channel terminal B, and finally that of the multi-channel terminal A.
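  • The priority order can be sketched in a few lines of Python. This is only an illustration of the three priority levels described above; the `Terminal` class, its field names, and the numeric channel counts are assumptions introduced for the example and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Terminal:
    name: str
    channels: int             # 1 = mono, 2 = two-channel, >2 = multi-channel
    first_time: bool = False  # True if the terminal joins the mix for the first time


def adjustment_priority(t: Terminal) -> int:
    """Smaller value means the terminal is adjusted earlier."""
    if t.channels == 1:
        return 1   # mono terminals are adjusted first
    if t.first_time:
        return 2   # terminals joining the mix for the first time come second
    return 3       # remaining two-channel and multi-channel terminals come last


def adjustment_order(terminals: List[Terminal]) -> List[str]:
    # sorted() is stable, so terminals of equal priority keep their input order.
    return [t.name for t in sorted(terminals, key=adjustment_priority)]


# The example from the text: multi-channel A, first-time two-channel B, mono C.
order = adjustment_order([
    Terminal("A", channels=6),
    Terminal("B", channels=2, first_time=True),
    Terminal("C", channels=1),
])
```

  • With these inputs the sketch orders the terminals C, B, A, which matches the example in the text.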
  • The entity that adjusts the orientation information of a terminal's audio signal is a multimedia server or another device with an orientation information adjustment function.
  • the multimedia server is an MCU (Multipoint Control Unit), or a terminal having an MCU function module, that is, a Mini MCU.
  • The MCU mixes the audio signals from the multimedia terminals in the video conference. After receiving the voice code stream of each site in the video conference, the MCU decodes each stream and calculates the voice envelope of each decoded signal. By comparing the voice envelopes of the sites, it obtains the N sites with the largest voice envelopes (the largest N sites). The audio signals of the largest N sites are mixed and sent.
  • Before mixing, the MCU determines the channel type (mono, two-channel, or multi-channel) of each of the largest N sites participating in the mixing and the channel type of each receiving site. According to these types it performs the corresponding pre-mixing processing: upmixing mono data into two-channel or multi-channel data with a specified orientation, or downmixing two-channel or multi-channel data into mono data. Because upmixing and downmixing are existing audio processing techniques, they are not described further here. The MCU then performs the corresponding mixing and sends the result to the receiving sites of the different channel types.
  • Each of the largest N sites (the transmitting terminals) participating in the mix also receives the mix of the other N-1 sites, excluding itself.
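  • The selection and distribution rule described above can be illustrated with a minimal Python sketch. It only decides which sites feed which receiver; decoding, envelope calculation, and the actual sample mixing are out of scope. The function names and the envelope values are hypothetical.

```python
from typing import Dict, List


def select_largest_n(envelopes: Dict[str, float], n: int) -> List[str]:
    """Return the ids of the n sites with the largest voice envelopes."""
    return sorted(envelopes, key=lambda site: envelopes[site], reverse=True)[:n]


def sources_per_receiver(envelopes: Dict[str, float], n: int) -> Dict[str, List[str]]:
    """Map each receiving site to the sites whose audio goes into its mix."""
    largest = select_largest_n(envelopes, n)
    # A site inside the largest N hears the other N-1 sites; any other
    # site hears all N of them.
    return {site: [s for s in largest if s != site] for site in envelopes}


# Hypothetical envelope values for four sites, with N = 2.
plan = sources_per_receiver({"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.4}, 2)
```

  • Here sites A and C are the largest two, so A receives only C, C receives only A, and sites B and D receive the mix of A and C.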
  • The first embodiment is the mixing process when sites whose audio signal orientations overlap exist among the largest N sites of the mix.
  • The mixing process is shown in FIG. 5, and the specific implementation includes the following operations:
  • S501: the MCU detects the audio signal orientation (hereinafter, the orientation) of each of the largest N sites of the mix;
  • For a mono site, the orientation is specified externally (for example, by the MCU or by a user). For a two-channel or multi-channel site, the orientation may be specified externally, or it may be the actual orientation detected from the site's own audio data.
  • Orientation detection for two-channel and multi-channel sites: in general, the human ear perceives the orientation of a sound source from the difference between the signals at the two ears, such as the time difference or the energy difference. If the sound from a source reaches both ears at the same time and with the same energy, the listener perceives the source as being in the middle. If the energy at the left ear is higher than at the right ear, or the sound reaches the left ear earlier than the right ear, the listener perceives the source as biased to the left.
  • Accordingly, the actual orientation is generally obtained by detecting the time difference and/or the energy difference between the channels of the two-channel or multi-channel data: the orientation is biased toward the side on which the signal arrives earlier or has more energy.
  • For example, assume the orientation is divided into five directions: left, center-left, middle, center-right, and right. The energy difference between the two channels is within 3 dB for the middle direction, 3 to 6 dB for the center-left or center-right direction, and greater than 6 dB for the left or right direction.
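  • The energy-difference thresholds above can be turned into a small classifier. The following Python sketch is an illustration under stated assumptions: the two intermediate directions are named `center-left` and `center-right` here, the energy of a channel is taken as the sum of squared samples, and a difference of exactly 3 dB or 6 dB is assigned to the lower band. None of these details is fixed by the text.

```python
import math
from typing import Sequence


def energy_difference_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Energy of the left channel relative to the right channel, in dB."""
    eps = 1e-12  # avoids log(0) and division by zero for silent channels
    e_left = sum(x * x for x in left) + eps
    e_right = sum(x * x for x in right) + eps
    return 10.0 * math.log10(e_left / e_right)


def classify_orientation(diff_db: float) -> str:
    """Map a left-minus-right energy difference (dB) to one of five directions."""
    magnitude = abs(diff_db)
    if magnitude <= 3.0:
        return "middle"
    side = "left" if diff_db > 0 else "right"
    if magnitude <= 6.0:
        return "center-" + side
    return side


# A left channel with twice the amplitude of the right one has four times
# the energy, about 6.02 dB, which falls in the outermost band.
diff = energy_difference_db([1.0] * 4, [0.5] * 4)
direction = classify_orientation(diff)
```

  • Only the energy difference is used in this sketch; the time-difference cue mentioned in the text would need a separate estimate, for example from the lag between the two channels.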
  • The MCU determines whether the orientations of the largest N sites overlap; if yes, S504 is executed; if not, S503 is executed;
  • The MCU mixes the audio signals of the largest N sites; the mixing can use an existing mixing method and is not described in detail here;
  • The MCU determines the terminal whose orientation is to be adjusted according to a preset rule for determining such terminals. (Because each site in a video conference has a multimedia terminal, for convenience a site mentioned below also refers to the terminal of that site.);
  • A preferred way of determining the terminal to be adjusted is given below.
  • The preferred way of determining the target terminal is:
  • the terminal to be adjusted is determined randomly or in the order of entering the mixer.
  • The preset priority is:
  • a mono transmitting terminal participating in the mixing has the first adjustment priority;
  • a transmitting terminal participating in the mixing for the first time has the second adjustment priority (the audio signals that enter the mixer are chosen by comparing energies, and the energy of the audio signal from each terminal changes over time, so the largest N terminals participating in the mix change dynamically);
  • when mono, two-channel, and multi-channel terminals are mixed together, the two-channel and multi-channel transmitting terminals participating in the mix have the third adjustment priority.
  • For example, in one case the two-channel terminal B is determined to be the terminal to be adjusted; when the orientations of the mono terminal C and the multi-channel terminal D participating in the mixing overlap, the mono terminal C is selected as the terminal to be adjusted.
  • Likewise, if the audio signal orientations of mono site 1 and two-channel site 2 among the largest N sites overlap, it is determined that the orientation of mono site 1 needs to be adjusted;
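  • The overlap check and the choice of the terminal to adjust can be sketched as follows. This is a simplified illustration: orientations are plain strings, two terminals overlap when their strings are equal, and within an overlapping group the terminal with the fewest channels is chosen, so a mono terminal wins. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_overlaps(orientations: Dict[str, str]) -> List[List[str]]:
    """Group terminals that share the same orientation."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, orientation in orientations.items():
        groups[orientation].append(name)
    return [names for names in groups.values() if len(names) > 1]


def pick_terminal_to_adjust(group: List[str], channels: Dict[str, int]) -> str:
    """Within an overlapping group, prefer the terminal with the fewest channels."""
    # min() returns the first minimal element, so ties fall back to the
    # order in which the terminals appear in the group.
    return min(group, key=lambda name: channels[name])


# Mono terminal C and multi-channel terminal D overlap on the right.
orientations = {"B": "left", "C": "right", "D": "right"}
channels = {"B": 2, "C": 1, "D": 6}
overlaps = find_overlaps(orientations)
to_adjust = [pick_terminal_to_adjust(group, channels) for group in overlaps]
```

  • For the values above, the only overlapping group is C and D, and the mono terminal C is selected, as in the example in the text.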
  • The MCU adjusts the orientation of the site determined in S504 according to a preset orientation adjustment principle, so that the orientations of the largest N sites no longer overlap, and then executes S506;
  • A preferred orientation adjustment principle is given below.
  • The preferred principle is separation and proximity. If the terminal to be adjusted is a mono terminal, it is preferentially moved toward the sides; if it is a two-channel or multi-channel transmitting terminal, it is preferentially moved toward the middle. Separation and proximity means that the terminal is moved to an orientation on the same side as its original orientation. Taking the orientation diagram in FIG. 4 as an example: if the original orientation of the two-channel transmitting terminal B participating in the mixing is on the left, its orientation is adjusted to the left or the middle; if the original specified orientation of the mono transmitting terminal C participating in the mixing is toward the right, its orientation is adjusted to the right;
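  • One possible reading of this principle is sketched below in Python. It assumes five orientation slots named `left`, `center-left`, `middle`, `center-right`, and `right` (the intermediate names are assumptions), treats the middle as belonging to both sides, and leaves the orientation unchanged when no free slot exists on the same side. The patent text does not prescribe these details.

```python
from typing import Set

ORIENTATIONS = ["left", "center-left", "middle", "center-right", "right"]
MIDDLE = 2  # index of "middle" in ORIENTATIONS


def adjust_orientation(original: str, channels: int, occupied: Set[str]) -> str:
    """Pick a free orientation on the same side as the original one."""
    idx = ORIENTATIONS.index(original)
    if idx < MIDDLE:
        same_side = range(0, MIDDLE + 1)                # left half plus middle
    elif idx > MIDDLE:
        same_side = range(MIDDLE, len(ORIENTATIONS))    # middle plus right half
    else:
        same_side = range(len(ORIENTATIONS))            # middle: either side
    free = [i for i in same_side if ORIENTATIONS[i] not in occupied]
    if not free:
        return original  # nothing free on this side: keep the orientation
    if channels == 1:
        # Mono terminals prefer the slot farthest from the middle.
        key = lambda i: (-abs(i - MIDDLE), abs(i - idx))
    else:
        # Two-channel and multi-channel terminals prefer the middle.
        key = lambda i: (abs(i - MIDDLE), abs(i - idx))
    return ORIENTATIONS[min(free, key=key)]


# Mono terminal C overlaps with another terminal at "center-right".
new_c = adjust_orientation("center-right", 1, {"center-right"})
# Two-channel terminal B overlaps with another terminal at "center-left".
new_b = adjust_orientation("center-left", 2, {"center-left"})
```

  • In this sketch the mono terminal C moves outward to the right, and the two-channel terminal B moves to the middle, which is consistent with the FIG. 4 example in the text.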
  • The MCU mixes the orientation-adjusted audio signal with the other audio signals.
  • The specific implementations for receiving terminals of different channel types include:
  • For a mono site receiving terminal: the energies of the audio signals of the largest N sites participating in the mixing are compared in each sub-band of the mixed signal to obtain, for each sub-band, the orientation information of the site whose audio signal energy is largest (if that site is a site whose orientation was adjusted, the orientation information is the adjusted orientation information). The orientation information of the site with the largest energy in each sub-band is sent, together with the mixed signal, to the mono site receiving terminal;
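  • The per-sub-band comparison for a mono receiving terminal can be illustrated with a short Python sketch. It assumes the sub-band energies of each site are already available as lists of equal length; how the sub-bands are computed is not covered, and the function name and sample values are hypothetical.

```python
from typing import Dict, List


def dominant_orientation_per_subband(
    subband_energy: Dict[str, List[float]],
    orientation: Dict[str, str],
) -> List[str]:
    """For each sub-band, return the orientation of the site with the largest energy.

    `orientation` should already hold the adjusted orientation for any site
    whose orientation was adjusted.
    """
    n_bands = len(next(iter(subband_energy.values())))
    result = []
    for band in range(n_bands):
        loudest = max(subband_energy, key=lambda site: subband_energy[site][band])
        result.append(orientation[loudest])
    return result


# Two sites and three sub-bands; site "1" dominates only the first band.
side_info = dominant_orientation_per_subband(
    {"1": [5.0, 1.0, 2.0], "2": [3.0, 4.0, 2.5]},
    {"1": "left", "2": "right"},
)
```

  • The resulting list is the orientation side information that would accompany the mono mix, one entry per sub-band.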
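  • For a two-channel site receiving terminal: if a mono site or a multi-channel site exists among the largest N sites, its audio signal is generated as a two-channel audio signal according to the adjusted orientation information and then mixed; if a two-channel site exists among the largest N sites, its audio signal is adjusted according to the adjusted orientation and then mixed; the mixed signal is then sent to the two-channel site receiving terminal;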
  • Generating the audio signal of a mono site among the largest N sites as a two-channel audio signal may include, but is not limited to: distributing the energy of the site's mono audio signal according to the site's adjusted orientation information, to obtain a two-channel audio signal with spatial orientation information.
  • For example, if the adjusted orientation of the mono site is "right", more energy can be allocated to the right-channel audio signal than to the left-channel audio signal when the two-channel audio data is generated from the mono audio signal.
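  • A minimal sketch of such an energy distribution is given below. The share of energy assigned to the left channel for each orientation is a made-up table, and the orientation names `center-left` and `center-right` are assumptions; the text only requires that an orientation toward the right gives the right channel more energy. The gains are square roots of the shares so that the total energy of the two channels equals the energy of the mono signal.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical share of the total energy assigned to the left channel.
LEFT_ENERGY_SHARE = {
    "left": 0.9,
    "center-left": 0.7,
    "middle": 0.5,
    "center-right": 0.3,
    "right": 0.1,
}


def mono_to_two_channel(
    samples: Sequence[float], orientation: str
) -> Tuple[List[float], List[float]]:
    """Split a mono signal into left and right channels by energy share."""
    share = LEFT_ENERGY_SHARE[orientation]
    gain_left = math.sqrt(share)          # energy scales with gain squared
    gain_right = math.sqrt(1.0 - share)
    left = [gain_left * x for x in samples]
    right = [gain_right * x for x in samples]
    return left, right


left, right = mono_to_two_channel([1.0, -0.5], "right")
energy_left = sum(x * x for x in left)
energy_right = sum(x * x for x in right)
```

  • For the orientation "right" the right channel carries nine times the energy of the left channel in this sketch, and the two energies add up to the mono energy of 1.25.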
  • Generating the audio signal of a multi-channel site among the largest N sites as a two-channel audio signal may include, but is not limited to:
  • Method 1: the audio signal of the multi-channel site is generated as a mono audio signal, and the mono audio signal is then generated as a two-channel audio signal according to the adjusted orientation information of the multi-channel site;
  • Method 2: a two-channel audio signal is generated by energy allocation according to the adjusted orientation information of the multi-channel site.
  • Adjusting the audio signal of a two-channel site according to the adjusted orientation before mixing may include, but is not limited to:
  • Method 1: the audio signal of the two-channel site is generated as a mono audio signal, the mono audio signal is then generated as a two-channel audio signal according to the adjusted orientation information of the two-channel site, and the resulting two-channel audio signal is mixed;
  • Method 2: according to the adjusted orientation information of the two-channel site, the energy is redistributed to obtain a two-channel audio signal, and the processed two-channel audio signal is mixed.
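  • Method 2 for a two-channel site can be sketched as a per-channel gain that moves the energy split to a target value while keeping the total energy. The parameter `left_share` (the target fraction of energy in the left channel) is an assumption introduced for this example, and a silent channel is left silent because scaling cannot add energy to it.

```python
import math
from typing import List, Sequence, Tuple


def redistribute_energy(
    left: Sequence[float], right: Sequence[float], left_share: float
) -> Tuple[List[float], List[float]]:
    """Rescale both channels so the left channel holds `left_share` of the energy."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    total = e_left + e_right

    def gain(current: float, target: float) -> float:
        # A silent channel cannot be scaled up, so its gain is set to zero.
        return math.sqrt(target / current) if current > 0 else 0.0

    g_left = gain(e_left, left_share * total)
    g_right = gain(e_right, (1.0 - left_share) * total)
    return [g_left * x for x in left], [g_right * x for x in right]


# A centered signal is moved toward the right: 25% of the energy on the left.
new_left, new_right = redistribute_energy([1.0, 1.0], [1.0, 1.0], 0.25)
```

  • Unlike Method 1, this keeps the original content of each channel and only changes the balance between them.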
  • For a multi-channel site receiving terminal: if a mono site or a two-channel site exists among the largest N sites, its audio signal is generated as a multi-channel audio signal according to the adjusted orientation information and then mixed; if a multi-channel site exists among the largest N sites, its audio signal is adjusted according to the adjusted orientation and then mixed; the mixed signal is then sent to the multi-channel site receiving terminal;
  • the audio signal of the monophonic site as a two-channel audio signal
  • the implementation of generating the audio signal of the two-channel venue as a multi-channel audio signal may include, but is not limited to:
  • Manner 1 The audio signal of the two-channel venue is generated as a mono audio signal, and then the mono audio signal is generated into a multi-channel audio signal according to the adjusted orientation information of the two-channel venue;
  • Method 2 Generate a multi-channel audio signal by energy allocation according to the adjusted orientation information of the two-channel venue.
  • the implementation of the mixing process by adjusting the audio signal of the multi-channel venue according to the adjusted orientation may be, but is not limited to:
  • Method 1 generating the audio signal of the multi-channel site as a mono audio signal, and then generating the mono audio signal into a multi-channel audio signal according to the adjusted orientation information of the multi-channel site, The multi-channel audio signal obtained after processing is subjected to mixing processing;
  • Manner 2 According to the adjusted orientation information of the multi-channel site, the multi-channel audio signal is obtained by redistributing the energy, and the processed multi-channel audio signal is subjected to the mixing process.
  • In this embodiment, the orientations of the audio signals of the sites among the largest N sites no longer overlap, which improves speech intelligibility and the listener's sense of presence.
  • The second embodiment is the mixing process when a site among the largest N sites participating in the mix has an audio orientation that differs from its orientation in the video picture.
  • The mixing process is shown in FIG. 6, and the specific implementation includes the following operations: the MCU determines whether the audio orientation of each of the largest N sites is consistent with its orientation in the video picture; if yes, S602 is executed; if not, S603 is executed;
  • the MCU performs mixing processing on the audio signal from the largest N-party conference site, and the specific implementation manner of the mixing processing can be implemented by using an existing mixing method, which is not described in detail herein;
  • The MCU adjusts the orientation of each site found to be inconsistent according to its position in the video picture; the specific adjustment includes but is not limited to:
  • adjusting the orientation according to the actual orientation of the site and its orientation in the video picture. For example, if the actual orientation of site 1 is right but site 1 is displayed on the left of the multi-picture display, the orientation of site 1 is adjusted to the left or right;
  • the mixing processing is performed according to the adjusted orientation information.
  • the specific mixing processing manner refers to the mixing implementation manner of the receiving terminal for different channel types in the first embodiment of the present invention.
  • By adjusting the orientation of each site whose audio orientation is inconsistent with its orientation in the video picture, the orientations of the largest N sites heard by users of the video communication system match the distribution of those sites in the video picture, which improves the listener's sense of presence.
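  • The simplest variant of this adjustment, in which the audio orientation is set to the orientation shown in the video picture, can be sketched as follows. The function name and the dictionary layout are hypothetical, and the variant that also weighs the actual orientation of a two-channel or multi-channel site is not covered.

```python
from typing import Dict, List, Tuple


def align_audio_with_video(
    audio: Dict[str, str], video: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return the adjusted audio orientations and the sites that were changed."""
    adjusted = dict(audio)  # copy, so the caller's mapping is not modified
    changed = []
    for site, shown in video.items():
        if site in adjusted and adjusted[site] != shown:
            adjusted[site] = shown
            changed.append(site)
    return adjusted, changed


# Site "1" sounds from the right but is displayed on the left.
adjusted, changed = align_audio_with_video(
    {"1": "right", "2": "middle"},
    {"1": "left", "2": "middle"},
)
```

  • Only site "1" is reported as changed; site "2" already matches the video picture and is left alone.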
  • The third embodiment is the mixing process when a receiving site specifies the orientation of a site among the largest N sites.
  • the mixing process is shown in Figure 7.
  • the specific implementation includes the following operations:
  • The MCU receives the orientation specifying information sent by site n, which instructs the MCU to adjust the orientation of site a among the largest N sites.
  • The orientation specifying information may be sent by signaling;
  • The MCU adjusts the orientation of site a to the orientation specified in the orientation specifying information.
  • The orientation specifying information may also carry validity information, which indicates either that the orientation of site a is adjusted only in the mix sent to site n, or that the orientation of site a is adjusted in the mixes sent to some or all of the sites. The validity information may include one or several site identifiers.
  • For example, if the validity information includes only the site identifier "n", the MCU adjusts the orientation of site a to the orientation specified in the orientation specifying information only in the mix sent to site n.
  • If the validity information includes several site identifiers (for example, site n, site b, and site c), the MCU adjusts the orientation of site a to the specified orientation in the mixes sent to those sites. If several sites specify orientations for site a, the MCU can adjust the orientation of site a in turn in the order in which the different orientation specifying information is received, adjust it according to a token-request mode, or control each site's permission to adjust the orientation of site a according to other rules.
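  • The per-receiver effect of the specifying information can be illustrated with a small lookup table. This Python sketch assumes that later requests overwrite earlier ones (the "in turn, in the order received" option) and that a request without a list of site identifiers applies only to the requesting site. The class and method names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class OrientationTable:
    """Tracks, per receiving site, the orientation used for each mixed site."""

    def __init__(self, default: Dict[str, str]) -> None:
        self.default = dict(default)
        # (receiving site, mixed site) -> orientation specified for that pair
        self.overrides: Dict[Tuple[str, str], str] = {}

    def specify(
        self,
        requester: str,
        target: str,
        orientation: str,
        effective_sites: Optional[Iterable[str]] = None,
    ) -> None:
        """Apply a request; without site identifiers it affects the requester only."""
        receivers = [requester] if effective_sites is None else list(effective_sites)
        for receiver in receivers:
            self.overrides[(receiver, target)] = orientation

    def orientation_for(self, receiver: str, target: str) -> str:
        return self.overrides.get((receiver, target), self.default[target])


table = OrientationTable({"a": "middle"})
table.specify("n", "a", "left")                      # only the mix sent to n
first = (table.orientation_for("n", "a"), table.orientation_for("b", "a"))
table.specify("n", "a", "right", ["n", "b", "c"])    # mixes sent to n, b and c
second = (table.orientation_for("n", "a"), table.orientation_for("b", "a"))
```

  • After the first request only site n hears site a on the left; after the second request sites n, b, and c all hear site a on the right, while any other site still uses the default orientation.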
  • the MCU performs a mixing process according to the adjusted orientation information.
  • the specific mixing processing mode refers to the mixing implementation manner of the receiving terminal for different channel types in the first embodiment of the present invention.
  • If site n specifies an orientation for site a that is inconsistent with the position of site a in the video picture, then, by way of example and not limitation, the orientation of site a may preferentially be set according to the orientation specifying information of site n.
  • In this embodiment, the MCU adjusts the orientation of the specified site among the largest N sites according to the orientation specifying information sent by a site, so that users can adjust the orientation of a specified site according to their own needs, which improves listener satisfaction.
  • The present invention further provides an embodiment of a mixing processing device for an audio signal.
  • The device can adjust, in a timely manner, the orientation information of the audio signals of the transmitting terminals participating in the mixing, so that the listener can clearly perceive the orientation of the audio signal sent by each site, which improves the listener's sense of presence. Its structure is shown in the structural diagram of the device (FIG. 8).
  • The implementation structure includes:
  • the orientation adjustment module 801, configured to determine a terminal whose audio signal orientation needs to be adjusted, and to adjust the orientation information of that terminal's audio signal;
  • the mixing processing module 802, configured to mix the orientation-adjusted audio signal with the other signals to be mixed.
  • the device provided by the embodiment of the invention adjusts the audio orientation information of the terminal in which the azimuth of the audio signal overlaps, so that the orientations of the respective transmitting terminals are separated as much as possible, and the sound orientation of each transmitting terminal is clearer, thereby improving the user's experience of the on-the-spot experience.
  • the situation in which the audio signal of the terminal participating in the mixing needs to be adjusted is not limited to when the audio signal of the terminal has an azimuth overlap, in the video communication system, when a certain terminal enters the mixing system. Or when the video screen arrangement changes, etc., if the orientation of the terminal participating in the mixing does not match the orientation of the terminal in the video screen, the orientation adjustment is also required.
  • The orientation adjustment module 801 further includes a target terminal determining submodule 8011, configured to determine that a terminal needs audio signal orientation adjustment in any of the following cases: when the orientation of the terminal's audio signal overlaps with that of another terminal; when the orientation of the terminal's audio signal does not match the position of the terminal's video picture in the multi-picture display; or when the terminal participates in the mixing for the first time.
  • When the orientation of the terminal's audio signal does not match the position of the terminal's video picture in the multi-picture display, the orientation adjustment module 801 is specifically configured to adjust the terminal's orientation to the orientation displayed in the video picture; or, if the terminal is a two-channel or multi-channel terminal, to adjust the orientation by combining the terminal's actual orientation with its orientation in the video picture.
  • By way of example and not limitation, as shown in FIG. 2, the actual orientation of the audio signal from site E is right, but site E is displayed on the left in the multi-picture display, so the orientation of the audio signal from site E is adjusted to the left side, shifted toward the right; or, as shown in FIG. 3, the actual orientation of site F is right, but the display area corresponding to site F (display 1) is on the left of the telepresence picture, so the orientation of site F is adjusted to the left side, shifted toward the right.
  • In a conference system, the apparatus provided by the embodiment of the present invention may further adjust the orientation of a terminal to be adjusted according to orientation designation information from a participant terminal.
  • In this case, the orientation adjustment module 801 is specifically configured to adjust the orientation of the terminal to be adjusted according to the orientation designation information sent by the participant terminal, where the orientation designation information is the orientation that the participant terminal designates for the terminal to be adjusted.
  • Optionally, the orientation designation information may further carry designation validity information, which indicates either that the orientation information of the terminal to be adjusted is adjusted only when mixing the audio sent to that participant terminal, or that it is adjusted when mixing the audio sent to several or all participant terminals.
  • Optionally, if several participant terminals designate different orientations for the same terminal participating in the mixing, the orientation adjustment module 801 adjusts the terminal's orientation in turn, in the order in which the different pieces of orientation designation information were received, or adjusts it by means of a token application scheme; the permission of terminals to adjust the orientation of that sending terminal may also be controlled according to other preset rules.
  • When the orientation of a terminal to be adjusted is adjusted according to the orientation designation information of a participant terminal, the orientation adjustment module 801 adjusts it, as instructed by the received orientation designation information, on the same side as the terminal's original orientation.
  • Taking the orientation diagram in FIG. 4 as an example, same-side adjustment means that if the original orientation of two-channel sending terminal B participating in the mixing is left, its orientation is adjusted to slightly left or middle; if the originally designated orientation of mono sending terminal C participating in the mixing is slightly right, its orientation is adjusted to right.
  • When there are several terminals whose orientations are to be adjusted, the orientation adjustment module 801 may adjust their audio signal orientation information one by one according to preset priorities.
  • A preferred set of priorities is as follows: when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority; a terminal participating in the mixing for the first time has the second adjustment priority; when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel terminals participating in the mixing have the third priority.
  • For example, if multi-channel terminal A, two-channel terminal B, and mono terminal C participating in the mixing are all terminals to be adjusted, and two-channel terminal B is participating in the mixing for the first time, the audio signal orientation of mono terminal C is adjusted first, then that of two-channel terminal B, and finally that of multi-channel terminal A.
  • The apparatus that adjusts the audio signal orientation information of a terminal is a multimedia server, or another device with an orientation information adjustment function.
  • In the videoconferencing field, the multimedia server is an MCU (Multipoint Control Unit), or a terminal having an MCU function module, that is, a Mini MCU.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Description

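Audio mixing processing method and apparatus for audio signals
This application claims priority to Chinese Patent Application No. 201010199195.9, filed with the Chinese Patent Office on June 7, 2010 and entitled "Audio mixing processing method and apparatus for audio signals", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the communications field, and in particular to an audio mixing processing method and apparatus for audio signals.
Background
In a multimedia communication system, a multimedia server (for example, an MCU (Multipoint Control Unit) in a videoconference) mixes the audio signals sent by the parties participating in the multimedia communication. The following uses a videoconference as an example to describe N-party mixing. After receiving the voice bitstreams captured by the multimedia terminals at the sites, the MCU decodes the bitstreams to obtain the audio signal of each site and calculates the voice envelope of each decoded signal. By comparing the voice envelopes of the sites, it obtains the N sites with the largest voice envelopes in the conference (hereinafter, the top-N sites) and mixes the audio signals of these N sites. It sends the mixed signal of the top-N sites to the sites outside the top N, and sends to each of the top-N sites the mixed signal of the other N-1 sites, excluding that site itself.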
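By way of illustration only, the top-N selection and the "N-1" rule described above can be sketched as follows. The function names and envelope values are hypothetical and do not come from the application; how the envelope itself is computed is left open.

```python
from typing import Dict, List


def select_top_n(envelopes: Dict[str, float], n: int) -> List[str]:
    """Return the ids of the n sites with the largest voice envelopes, loudest first."""
    return sorted(envelopes, key=lambda site: envelopes[site], reverse=True)[:n]


def mix_sources_for(receiver: str, top_n: List[str]) -> List[str]:
    """Sites mixed for `receiver`: every top-N site except the receiver itself."""
    return [site for site in top_n if site != receiver]


# Hypothetical envelope values for four sites, with N = 3.
envelopes = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.5}
top_n = select_top_n(envelopes, 3)
print(top_n)                        # the three loudest sites
print(mix_sources_for("B", top_n))  # B is outside the top N, so it hears all N sites
print(mix_sources_for("C", top_n))  # C is in the top N, so it hears the other N-1 sites
```

A site inside the top N never receives its own audio, which is why the receiver is filtered out rather than the mix being computed once for everyone.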
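In the prior art, to enhance the user experience, a multimedia terminal adds orientation information to the audio information it captures, or the multimedia server assigns orientation information to the audio information sent by each multimedia terminal participating in the mixing.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problem:
In existing mixing schemes, the mixed audio signal received by a receiving terminal often exhibits orientation overlap, so that the user cannot clearly hear the voice signals of several sites coming from the same orientation, which degrades the user's sense of presence.
Summary of the Invention
Embodiments of the present invention provide an audio mixing processing method and apparatus for audio signals, so as to improve listeners' sense of presence.
The objectives of the present invention are achieved by the following technical solutions:
An audio mixing processing method for audio signals, including:
determining a terminal whose audio signal orientation needs to be adjusted, and adjusting the audio signal orientation information of the terminal;
mixing the orientation-adjusted audio signal with the other signals to be mixed.
An audio mixing processing apparatus for audio signals, including:
an orientation adjustment module, configured to determine a terminal whose audio signal orientation needs to be adjusted, and to adjust the audio signal orientation information of the terminal;
a mixing processing module, configured to mix the orientation-adjusted audio signal with the other signals to be mixed.
As can be seen from the technical solutions provided by the embodiments above, the embodiments of the present invention adjust the orientation information of the sending terminals participating in the mixing, so that the orientations of the sending terminals are separated as far as possible and the sound of each sending terminal is clearer, which improves the user's sense of presence.
Brief Description of the Drawings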
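To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a mixing process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a multi-picture display according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a telepresence picture display according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of orientations according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the mixing process according to Embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of the mixing process according to Embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of the mixing process according to Embodiment 3 of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.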
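In a mixing system, if the orientations of two or more terminals participating in the mixing overlap, listeners hear sounds with overlapping orientations, which degrades their listening experience.
An embodiment of the present invention provides an audio mixing processing method for audio signals. When the orientations of the audio signals of terminals participating in the mixing overlap, the solution can promptly adjust the orientation information of the audio signals of the sending terminals participating in the mixing, so that listeners can clearly perceive the orientation information of the audio signals sent by the sites, which improves their sense of presence. The method can be applied in a multi-stream media communication system that includes audio mixing. As shown in FIG. 1, it includes:
S101. Determine a terminal whose audio signal orientation needs to be adjusted, and adjust the audio signal orientation information of the terminal;
S102. Mix the orientation-adjusted audio signal with the other signals to be mixed.
The technical solution provided by the embodiment of the present invention adjusts the audio orientation information of terminals whose audio signal orientations overlap, so that the orientations of the sending terminals are separated as far as possible and the sound orientation of each sending terminal is clearer, which improves the user's sense of presence.
In the foregoing embodiment, orientation adjustment of the audio signal of a terminal participating in the mixing is not needed only when the terminal's audio signal overlaps in orientation with another. In a video communication system, when a terminal enters the mixing system, or when the arrangement of the video pictures changes, orientation adjustment is also required if the orientation of a terminal participating in the mixing is inconsistent with the orientation of that terminal in the video picture.
Accordingly, a specific implementation of S101 includes: when the orientation of the audio signal of a terminal participating in the mixing is inconsistent with the orientation of the terminal in the video picture, determining that the terminal needs orientation adjustment and, according to the terminal's orientation in the video picture, adjusting the terminal's orientation information to be consistent with its orientation in the video picture; or, if the terminal is a two-channel or multi-channel terminal, adjusting the terminal's orientation information according to both the terminal's orientation in the video picture and the terminal's actual orientation.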
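By way of example and not limitation, as shown in FIG. 2, the actual orientation of the audio signal from site E is right, but site E is displayed on the left in the multi-picture display, so the orientation of the audio signal from site E is adjusted to the left side, shifted toward the right; or, as shown in FIG. 3, the actual orientation of site F is right, but the display area corresponding to site F (display 1) is on the left of the telepresence picture, so the orientation of site F is adjusted to the left side, shifted toward the right.
In a conference system, the multimedia server in this embodiment may also adjust the orientation of a terminal to be adjusted according to orientation designation information from a participant terminal. In this case, a specific implementation of S101 includes: determining that the terminal named in the orientation designation information is the terminal to be adjusted, and adjusting its orientation according to the orientation designation information sent by the participant terminal. The orientation designation information is the orientation that the participant terminal designates for the terminal to be adjusted, and the multimedia server sets the orientation information of that terminal accordingly.
Optionally, the orientation designation information may further carry designation validity information, which indicates either that the orientation information of the terminal to be adjusted is adjusted only when mixing the audio sent to that participant terminal, or that it is adjusted when mixing the audio sent to several or all participant terminals.
Optionally, if several participant terminals designate different orientations for the same terminal participating in the mixing, the multimedia server may adjust the terminal's orientation in turn, in the order in which the different pieces of orientation designation information were received, or adjust it by means of a token application scheme; the permission of terminals to adjust the orientation of that sending terminal may also be controlled according to other preset rules.
When the orientation of a terminal to be adjusted is adjusted according to the orientation designation information of a participant terminal, the adjustment is implemented as follows: the orientation of the terminal to be adjusted is adjusted, as instructed by the received orientation designation information, on the same side as the terminal's original orientation. Taking the orientation diagram in FIG. 4 as an example, same-side adjustment means that if the original orientation of two-channel sending terminal B participating in the mixing is left, its orientation is adjusted to slightly left or middle; if the originally designated orientation of mono sending terminal C participating in the mixing is slightly right, its orientation is adjusted to right.
In the embodiments of the present invention, when there are several terminals whose orientations are to be adjusted, their audio signal orientation information may be adjusted one by one according to preset priorities. An embodiment of the present invention provides a preferred set of priorities: when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority; a terminal participating in the mixing for the first time has the second adjustment priority; when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel terminals participating in the mixing have the third priority. For example, if multi-channel terminal A, two-channel terminal B, and mono terminal C participating in the mixing are all terminals to be adjusted, and two-channel terminal B is participating in the mixing for the first time, the audio signal orientation of mono terminal C is adjusted first, then that of two-channel terminal B, and finally that of multi-channel terminal A.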
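The entity that adjusts the audio signal orientation information of a terminal is a multimedia server, or another device with an orientation information adjustment function. In the videoconferencing field, the multimedia server is an MCU (Multipoint Control Unit), or may be a terminal having an MCU function module, that is, a Mini MCU; which one applies depends mainly on the networking architecture of the videoconferencing system.
Specific implementations of the embodiments of the present invention in practical applications are described in detail below.
Take a video communication system as an example, in which the MCU mixes the audio signals from multiple video multimedia terminals. After receiving the voice bitstreams of the sites in the videoconference, the MCU decodes the bitstream of each site, calculates the voice envelope of each site after decoding, and compares the envelopes to obtain the N sites with the largest voice envelopes (the top-N sites). The audio signals of the top-N sites are mixed and then sent.
During mixing, the MCU determines the channel types of the top-N sites participating in the mixing and of the receiving sites. According to the channel type of each top-N site (mono site, two-channel site, or multi-channel site), it performs the corresponding pre-mixing processing (including upmixing mono data into two-channel or multi-channel data with a designated orientation, or downmixing two-channel or multi-channel data into mono data; because upmixing and downmixing are existing audio processing techniques, they are not described further here), then performs the corresponding mixing and sends the result to receiving sites of the different channel types. Each top-N site (sending terminal) also receives the mixed signal of the other N-1 sites, excluding itself.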
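Embodiment 1
Embodiment 1 describes the mixing process when some of the top-N sites participating in the mixing have overlapping audio signal orientations. The process is shown in FIG. 5 and includes the following operations:
S501. The MCU detects the audio signal orientations (hereinafter, orientations) of the top-N sites to be mixed;
Because a mono site has no orientation of its own, its orientation is designated externally (for example, by the MCU or by a user). For a two-channel or multi-channel site, the orientation may be designated externally, or may be the actual orientation detected from the site's own data.
A preferred approach is as follows:
Orientation detection for two-channel and multi-channel sites: in general, the human ear perceives the orientation of a sound source from the difference between the signals at the two ears, such as a time difference or an energy difference. That is, if the sound from a source reaches both ears at the same time or with the same energy, the listener perceives the source as directly between the two ears; if the energy at the left ear is greater than at the right ear, or the sound reaches the left ear earlier than the right ear, the listener perceives the source as toward the left. Based on this theory, the actual orientation is generally obtained by detecting the time difference and/or energy difference of the two-channel or multi-channel data: the orientation leans toward whichever side the time or energy leans toward.
Take two-channel data as an example. Assume five orientations: left, slightly left, middle, slightly right, and right. Assume further that for the middle orientation the energies of the two channels differ by no more than 3 dB, for the slightly-left or slightly-right orientation they differ by 3 to 6 dB, and for the left or right orientation they differ by more than 6 dB. The energy of each channel is calculated first, and the two energies are then compared; if the energy of the left channel is 4 dB greater than that of the right channel, the actual orientation is determined to be slightly left.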
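By way of illustration only, the energy-difference detection described above for two-channel data can be sketched as follows. The 3 dB and 6 dB thresholds and the five orientation names come from the example in the text; the function names, the treatment of values that fall exactly on a threshold, and the small floor used to avoid taking the logarithm of zero are assumptions.

```python
import math
from typing import Sequence


def channel_energy_db(samples: Sequence[float]) -> float:
    """Energy of one channel in dB (floored to avoid log of zero; an assumption)."""
    energy = sum(s * s for s in samples)
    return 10.0 * math.log10(max(energy, 1e-12))


def detect_orientation(left: Sequence[float], right: Sequence[float]) -> str:
    """Classify two-channel data into one of five orientations by energy difference."""
    diff = channel_energy_db(left) - channel_energy_db(right)  # > 0: left is louder
    magnitude = abs(diff)
    if magnitude <= 3.0:
        return "middle"
    side = "left" if diff > 0 else "right"
    if magnitude <= 6.0:
        return f"slightly {side}"
    return side


# Left channel carries 4 dB more energy than the right channel.
right_channel = [1.0] * 100
left_channel = [s * 10 ** (4 / 20) for s in right_channel]
print(detect_orientation(left_channel, right_channel))
```

The sketch uses the energy difference only; the text also allows the time difference between channels to be used, alone or together with the energy difference.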
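S502. The MCU determines whether the orientations of the top-N sites overlap; if so, S504 is performed; if not, S503 is performed;
S503. The MCU mixes the audio signals of the top-N sites; the mixing can be implemented with existing mixing methods and is not detailed here;
S504. The MCU determines the site that needs orientation adjustment according to a preset rule for determining the terminal to be adjusted (in a videoconference each participating site has one multimedia terminal, so for brevity a site referred to hereinafter means the terminal of that site);
A preferred rule for determining the terminal to be adjusted (the target terminal) is as follows:
According to preset priorities, the sending terminal with the highest priority is selected from the sending terminals whose orientations overlap. If only one terminal is selected, it is the terminal to be adjusted; if two or more are selected, one of them is chosen as the terminal to be adjusted, either at random or according to the order of entering the mixer.
Optionally, the preset priorities are:
when mono, two-channel, and multi-channel terminals are mixed together, a mono sending terminal participating in the mixing has the first adjustment priority;
a sending terminal participating in the mixing for the first time has the second adjustment priority (which audio signals enter the mixer is decided by comparing energies, and the energy of the audio signal from each terminal varies, so the set of top-N terminals participating in the mixing changes dynamically);
when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel sending terminals participating in the mixing have the third adjustment priority.
Taking the orientation diagram in FIG. 4 as an example: multi-channel terminal A and two-channel terminal B participating in the mixing overlap in orientation, and two-channel terminal B is participating in the mixing for the first time, so two-channel terminal B is the terminal to be adjusted; mono terminal C and multi-channel terminal D participating in the mixing overlap in orientation, so mono terminal C is selected as the terminal to be adjusted. Under this preferred rule, if mono site 1 and two-channel site 2 among the top-N sites overlap in audio signal orientation, it is determined that mono site 1 needs orientation adjustment;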
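By way of illustration only, S502 and S504 can be sketched as grouping terminals by orientation and then picking one terminal per overlapping group by priority. The `Terminal` class, its field names, and the concrete orientations given to terminals A to D are assumptions made for the sketch (FIG. 4 is not reproduced here); treating every mono terminal as first priority is a simplification of the "mixed together" condition in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Terminal:
    name: str
    channels: int             # 1 = mono, 2 = two-channel, more = multi-channel
    orientation: str
    first_time: bool = False  # participating in the mixing for the first time
    join_order: int = 0       # order of entering the mixer (hypothetical field)


def adjustment_priority(t: Terminal) -> int:
    """Smaller value = adjusted first, following the preferred priorities above."""
    if t.channels == 1:
        return 1
    if t.first_time:
        return 2
    return 3


def overlapping_groups(terminals: List[Terminal]) -> List[List[Terminal]]:
    """Group terminals that share an orientation; keep only groups of two or more."""
    groups: Dict[str, List[Terminal]] = defaultdict(list)
    for t in terminals:
        groups[t.orientation].append(t)
    return [g for g in groups.values() if len(g) > 1]


def pick_terminal_to_adjust(group: List[Terminal]) -> Terminal:
    """Highest priority wins; ties are broken by order of entering the mixer."""
    return min(group, key=lambda t: (adjustment_priority(t), t.join_order))


terminals = [
    Terminal("A", channels=6, orientation="left", join_order=0),
    Terminal("B", channels=2, orientation="left", first_time=True, join_order=3),
    Terminal("C", channels=1, orientation="slightly right", join_order=1),
    Terminal("D", channels=6, orientation="slightly right", join_order=2),
]
chosen = [pick_terminal_to_adjust(g).name for g in overlapping_groups(terminals)]
print(chosen)
```

The text also permits a random choice among equal-priority terminals; the sketch uses the order of entering the mixer so that the result is deterministic.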
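S505. The MCU adjusts the orientation of the site determined in S504 according to a preset orientation adjustment principle, so that the orientations of the top-N sites no longer overlap, and then performs S506;
A preferred orientation adjustment principle is as follows. The principle is "separate, but nearby": if the terminal to be adjusted is a mono terminal, it is preferentially moved toward the side orientations (the sides being relative to the "middle" orientation); if it is a two-channel or multi-channel sending terminal, it is preferentially moved toward the middle orientation. "Separate, but nearby" means that the terminal is moved to an orientation on the same side as its original orientation. Again taking the orientation diagram in FIG. 4 as an example, if the original orientation of two-channel sending terminal B participating in the mixing is left, its orientation is adjusted to slightly left or middle; if the originally designated orientation of mono sending terminal C participating in the mixing is slightly right, its orientation is adjusted to right;
With this principle, when the orientation information of an audio signal is adjusted to resolve an overlap, the result stays close to the initial orientation, which avoids an adjustment so large that it affects how the user perceives the original audio signal.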
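By way of illustration only, the "separate, but nearby" principle of S505 can be sketched on a five-step orientation scale. The scale, the function names, and the rule that a terminal already in the middle may move to either side are assumptions; the preference for outward moves (mono) or inward moves (two-channel and multi-channel) follows the text.

```python
from typing import List, Set

SCALE = ["left", "slightly left", "middle", "slightly right", "right"]
MIDDLE = SCALE.index("middle")


def same_side_candidates(orientation: str, is_mono: bool) -> List[str]:
    """Candidate orientations on the same side, best candidate first."""
    i = SCALE.index(orientation)
    if i < MIDDLE:
        side = range(0, MIDDLE + 1)
    elif i > MIDDLE:
        side = range(MIDDLE, len(SCALE))
    else:
        side = range(len(SCALE))  # assumption: from the middle, either side is allowed

    def rank(j: int):
        outward = abs(j - MIDDLE) > abs(i - MIDDLE)
        preferred = outward if is_mono else not outward
        return (not preferred, abs(j - i))  # preferred direction first, then nearest

    return [SCALE[j] for j in sorted((j for j in side if j != i), key=rank)]


def adjust(orientation: str, is_mono: bool, occupied: Set[str]) -> str:
    """Move to the best same-side orientation that no other terminal occupies."""
    for candidate in same_side_candidates(orientation, is_mono):
        if candidate not in occupied:
            return candidate
    return orientation  # nothing free on this side: leave unchanged (assumption)


print(adjust("left", is_mono=False, occupied={"left"}))                    # terminal B
print(adjust("slightly right", is_mono=True, occupied={"slightly right"})) # terminal C
```

Ranking by preferred direction first and distance second keeps the adjusted orientation close to the original one, which is the stated purpose of the principle.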
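S506. The MCU mixes the orientation-adjusted audio signal with the other audio signals. Preferably, the implementation for receiving terminals of different channel types includes:
(1) For a mono-site receiving terminal: after the orientation adjustment, the audio signal energies of the top-N sites participating in the mixing are compared in each subband of the mixed signal to obtain, for each subband, the orientation information of the participating site with the largest audio signal energy (if that site is the one whose orientation was adjusted, the orientation information is the adjusted orientation information); the per-subband orientation information of the loudest top-N site and the mixed signal are then sent to the mono-site receiving terminal;
(2) For a two-channel-site receiving terminal: if the top-N sites include a mono site or a multi-channel site, the audio signal of that site is converted into a two-channel audio signal according to the adjusted orientation information and then mixed; if the top-N sites include a two-channel site, the audio signal of that sending terminal is adjusted according to the adjusted orientation and then mixed; the mixed signal is sent to the two-channel-site receiving terminal;
Ways of converting the audio signal of a mono site among the top-N sites into a two-channel audio signal include, but are not limited to: allocating the energy of the site's mono audio signal according to the site's adjusted orientation information to obtain a two-channel audio signal that carries spatial orientation information. For example, if the adjusted orientation of the mono site is "right", then when the two-channel audio data is generated from the mono audio signal, more energy is allocated to the right-channel audio signal than to the left-channel audio signal.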
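By way of illustration only, the energy allocation described above can be sketched with a constant-power pan. The application does not prescribe a panning law; the pan positions assigned to the five orientations and the function name are assumptions chosen for the sketch.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pan positions: 0.0 = fully left, 1.0 = fully right.
PAN = {
    "left": 0.1,
    "slightly left": 0.3,
    "middle": 0.5,
    "slightly right": 0.7,
    "right": 0.9,
}


def upmix_mono(samples: Sequence[float], orientation: str) -> Tuple[List[float], List[float]]:
    """Split a mono signal into left/right channels; total energy is preserved."""
    theta = PAN[orientation] * math.pi / 2
    gain_left, gain_right = math.cos(theta), math.sin(theta)
    return [s * gain_left for s in samples], [s * gain_right for s in samples]


mono = [1.0, -0.5, 0.25]
left, right = upmix_mono(mono, "right")
energy_left = sum(s * s for s in left)
energy_right = sum(s * s for s in right)
print(energy_right > energy_left)  # more energy goes to the right channel
```

Because the two gains are the cosine and sine of the same angle, their squares sum to one, so the orientation changes how the energy is split without changing the overall level.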
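Ways of converting the audio signal of a multi-channel site among the top-N sites into a two-channel audio signal include, but are not limited to:
Way 1: convert the audio signal of the multi-channel site into a mono audio signal, then convert that mono audio signal into a two-channel audio signal according to the adjusted orientation information of the multi-channel site;
Way 2: generate a two-channel audio signal by energy allocation according to the adjusted orientation information of the multi-channel site.
Ways of adjusting the audio signal of a two-channel site according to the adjusted orientation before it joins the mixing include, but are not limited to:
Way 1: convert the audio signal of the two-channel site into a mono audio signal, then convert that mono audio signal into a two-channel audio signal according to the adjusted orientation information of the two-channel site; the resulting two-channel audio signal joins the mixing;
Way 2: obtain a two-channel audio signal by redistributing energy according to the adjusted orientation information of the two-channel site; the resulting two-channel audio signal joins the mixing.
(3) For a multi-channel-site receiving terminal: if the top-N sites include a mono site or a two-channel site, the audio signal of the mono site or two-channel site is converted into a multi-channel audio signal according to the adjusted orientation information and then mixed; if the top-N sites include a multi-channel site, the audio signal of the multi-channel site is adjusted according to the adjusted orientation and then mixed; the mixed signal is sent to the multi-channel-site receiving terminal;
Converting the audio signal of a mono site into a multi-channel audio signal can follow the mono-to-two-channel approach described above for two-channel-site receiving terminals and is not repeated here.
Ways of converting the audio signal of a two-channel site into a multi-channel audio signal include, but are not limited to:
Way 1: convert the audio signal of the two-channel site into a mono audio signal, then convert that mono audio signal into a multi-channel audio signal according to the adjusted orientation information of the two-channel site;
Way 2: generate a multi-channel audio signal by energy allocation according to the adjusted orientation information of the two-channel site.
Ways of adjusting the audio signal of a multi-channel site according to the adjusted orientation before it joins the mixing include, but are not limited to:
Way 1: convert the audio signal of the multi-channel site into a mono audio signal, then convert that mono audio signal into a multi-channel audio signal according to the adjusted orientation information of the multi-channel site; the resulting multi-channel audio signal joins the mixing;
Way 2: obtain a multi-channel audio signal by redistributing energy according to the adjusted orientation information of the multi-channel site; the resulting multi-channel audio signal joins the mixing.
Through the above mixing process, the audio signal orientations of the top-N sites no longer overlap, which improves speech intelligibility and listeners' sense of presence.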
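By way of illustration only, the mono-receiver case of S506 (sending, for each subband, the orientation of the site with the largest energy) can be sketched as follows. The data layout, the function name, and the energy values are assumptions; how the subband energies are obtained is left open.

```python
from typing import Dict, List


def dominant_orientations(
    subband_energy: Dict[str, List[float]],
    orientation: Dict[str, str],
) -> List[str]:
    """For each subband, return the (adjusted) orientation of the loudest site."""
    sites = list(subband_energy)
    band_count = len(subband_energy[sites[0]])
    result = []
    for band in range(band_count):
        loudest = max(sites, key=lambda site: subband_energy[site][band])
        result.append(orientation[loudest])
    return result


# Hypothetical per-subband energies of two top-N sites over three subbands.
subband_energy = {"site1": [5.0, 1.0, 2.0], "site2": [1.0, 4.0, 3.0]}
# site1 was adjusted from "middle" to "left"; only the adjusted value is used.
orientation = {"site1": "left", "site2": "right"}
print(dominant_orientations(subband_energy, orientation))
```

The receiver gets one orientation label per subband together with the mono mix, so the orientation dictionary must already hold the adjusted values when this runs.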
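Embodiment 2
Embodiment 2 describes the mixing process when some of the top-N sites participating in the mixing have orientations inconsistent with their orientations in the video picture. The process is shown in FIG. 6 and includes the following operations:
S601. The MCU checks whether the orientation of each top-N site is consistent with its orientation in the video picture; if so, S602 is performed; if not, S603 is performed;
S602. The MCU mixes the audio signals from the top-N sites; the mixing can be implemented with existing mixing methods and is not detailed here;
S603. The MCU adjusts the orientation of the site detected as inconsistent, according to that site's position in the video picture. The adjustment includes, but is not limited to:
1) adjusting the site's orientation to the orientation displayed in the video picture; for example, if the actual orientation of site 1 is right but site 1 is displayed in the middle of the multi-picture display, the orientation of site 1 is adjusted to middle; or
2) adjusting the orientation by combining the site's actual orientation with its orientation in the video picture; for example, if the actual orientation of site 1 is right but site 1 is displayed on the left in the multi-picture display, the orientation of site 1 is adjusted to the left side, shifted toward the right;
Adjustment for different types of terminals is similar to Embodiment 1 and is not repeated here.
S604. Mixing is performed according to the adjusted orientation information; for the specific mixing, refer to the mixing implementation for receiving terminals of different channel types in Embodiment 1 of the present invention.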
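By way of illustration only, the two adjustment options of S603 can be sketched on a five-step orientation scale. Representing "the left side, shifted toward the right" as one step from the displayed position toward the actual orientation is an assumption made for the sketch; the application does not quantify the shift. The function name and the `combine` flag are hypothetical.

```python
SCALE = ["left", "slightly left", "middle", "slightly right", "right"]


def adjust_to_video(actual: str, displayed: str, combine: bool = False) -> str:
    """Option 1 (combine=False): use the displayed orientation as is.
    Option 2 (combine=True): start from the displayed orientation and move one
    step toward the actual orientation (the step size is an assumption)."""
    if not combine or actual == displayed:
        return displayed
    a, d = SCALE.index(actual), SCALE.index(displayed)
    step = 1 if a > d else -1
    return SCALE[d + step]


print(adjust_to_video("right", "middle"))              # option 1
print(adjust_to_video("right", "left", combine=True))  # option 2
```

Option 2 is only meaningful for two-channel or multi-channel sites, since a mono site has no actual orientation of its own to combine with.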
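In Embodiment 2 of the present invention, orientation adjustment of sites whose orientation is inconsistent with that in the video picture makes the orientation information of the top-N sites heard by users of the video communication system match the distribution of the top-N sites in the video picture, which improves listeners' sense of presence.
Embodiment 3
Embodiment 3 describes the mixing process when a receiving site designates an orientation for a top-N site. The process is shown in FIG. 7 and includes the following operations:
S701. The MCU receives orientation designation information from site n, which instructs the MCU to adjust the orientation of site a among the top-N sites; by way of example and not limitation, the orientation designation information may be sent by signaling;
S702. The MCU adjusts the orientation of site a to the orientation designated in the orientation designation information. The orientation designation information may further carry designation validity information, which indicates either that the orientation information of site a is adjusted only when mixing the audio sent to site n, or that it is adjusted when mixing the audio sent to several or all sites. By way of example and not limitation, the validity information may include one or several site identifiers. When it contains a single site identifier "n", the MCU adjusts the orientation of site a to the designated orientation only when mixing the audio sent to site n; when it contains several site identifiers (for example "n", "b", and "c"), the MCU does so when mixing the audio sent to those sites (site n, site b, and site c). If several sites designate an orientation for site a, the MCU may adjust the orientation of site a in turn, in the order in which the different pieces of orientation designation information were received, or adjust it by means of a token application scheme; the permission of each site to adjust the orientation of site a may also be controlled according to other preset rules.
S703. The MCU performs mixing according to the adjusted orientation information; for the specific mixing, refer to the mixing implementation for receiving terminals of different channel types in Embodiment 1 of the present invention.
In Embodiment 3 of the present invention, if site n designates an orientation for site a, and the orientation of site a is inconsistent with the position of site a in the video picture, then, by way of example and not limitation, the orientation of site a may be adjusted preferentially according to the orientation designation information from site n.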
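By way of illustration only, the designation validity information of S702 can be sketched as a set of receiver identifiers attached to each designation. The `Designation` class and its field names are hypothetical; treating a missing set as "all receivers" and letting the most recently received applicable designation take effect are assumptions, since the text also allows token-based or other arbitration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Designation:
    sender: str                                # site that sent the designation, e.g. "n"
    target: str                                # site whose orientation is designated, e.g. "a"
    orientation: str                           # designated orientation
    effective_for: Optional[Set[str]] = None   # receiver ids; None = all (assumption)


def orientation_for(
    receiver: str, target: str, default: str, designations: List[Designation]
) -> str:
    """Orientation of `target` in the mix sent to `receiver`.
    Designations are applied in the order received, so the latest applicable one wins."""
    result = default
    for d in designations:
        if d.target == target and (d.effective_for is None or receiver in d.effective_for):
            result = d.orientation
    return result


received = [Designation("n", "a", "slightly left", effective_for={"n"})]
print(orientation_for("n", "a", "right", received))  # designation applies to site n
print(orientation_for("b", "a", "right", received))  # site b keeps the default
```

Resolving the orientation per receiver means the MCU may place the same site at different orientations in different outgoing mixes.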
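In Embodiment 3 of the present invention, the MCU adjusts the orientation of the designated top-N site according to the orientation designation information sent by a site, so that users can adjust the orientation of a designated site to suit their own needs, which improves listeners' satisfaction with the sense of presence.
Corresponding to the method embodiments of the present invention, the present invention further provides an embodiment of an audio mixing processing apparatus for audio signals. When the orientations of the audio signals of terminals participating in the mixing overlap, the apparatus can promptly adjust the orientation information of the audio signals of the sending terminals participating in the mixing, so that listeners can clearly perceive the orientation information of the audio signals sent by the sites, which improves listeners' sense of presence. Its structure is shown in FIG. 8 and includes:
an orientation adjustment module 801, configured to determine a terminal whose audio signal orientation needs to be adjusted, and to adjust the audio signal orientation information of the terminal;
a mixing processing module 802, configured to mix the orientation-adjusted audio signal with the other signals to be mixed.
The apparatus provided by the embodiment of the present invention adjusts the audio orientation information of terminals whose audio signal orientations overlap, so that the orientations of the sending terminals are separated as far as possible and the sound orientation of each sending terminal is clearer, which improves the user's sense of presence.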
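In the foregoing embodiment of the present invention, orientation adjustment of the audio signal of a terminal participating in the mixing is not needed only when the terminal's audio signal overlaps in orientation with another. In a video communication system, when a terminal enters the mixing system, or when the arrangement of the video pictures changes, orientation adjustment is also required if the orientation of a terminal participating in the mixing is inconsistent with the orientation of that terminal in the video picture.
Accordingly, the orientation adjustment module 801 further includes a target terminal determining submodule 8011, configured to determine that a terminal needs audio signal orientation adjustment in any of the following cases: when the orientation of the terminal's audio signal overlaps with that of another terminal; when the orientation of the terminal's audio signal does not match the position of the terminal's video picture in the multi-picture display; or when the terminal participates in the mixing for the first time.
When the orientation of the terminal's audio signal does not match the position of the terminal's video picture in the multi-picture display and the terminal's orientation therefore needs to be adjusted, the orientation adjustment module 801 is specifically configured to adjust the terminal's orientation to the orientation displayed in the video picture; or, if the terminal is a two-channel or multi-channel terminal, to adjust the orientation by combining the terminal's actual orientation with its orientation in the video picture.
By way of example and not limitation, as shown in FIG. 2, the actual orientation of the audio signal from site E is right, but site E is displayed on the left in the multi-picture display, so the orientation of the audio signal from site E is adjusted to the left side, shifted toward the right; or, as shown in FIG. 3, the actual orientation of site F is right, but the display area corresponding to site F (display 1) is on the left of the telepresence picture, so the orientation of site F is adjusted to the left side, shifted toward the right.
In a conference system, the apparatus provided by the embodiment of the present invention may further adjust the orientation of a terminal to be adjusted according to orientation designation information from a participant terminal. In this case, the orientation adjustment module 801 is specifically configured to adjust the orientation of the terminal to be adjusted according to the orientation designation information sent by the participant terminal, where the orientation designation information is the orientation that the participant terminal designates for the terminal to be adjusted. Optionally, the orientation designation information may further carry designation validity information, which indicates either that the orientation information of the terminal to be adjusted is adjusted only when mixing the audio sent to that participant terminal, or that it is adjusted when mixing the audio sent to several or all participant terminals.
Optionally, if several participant terminals designate different orientations for the same terminal participating in the mixing, the orientation adjustment module 801 adjusts the terminal's orientation in turn, in the order in which the different pieces of orientation designation information were received, or adjusts it by means of a token application scheme; the permission of terminals to adjust the orientation of that sending terminal may also be controlled according to other preset rules.
When the orientation of a terminal to be adjusted is adjusted according to the orientation designation information of a participant terminal, the orientation adjustment module 801 adjusts it, as instructed by the received orientation designation information, on the same side as the terminal's original orientation. Taking the orientation diagram in FIG. 4 as an example, same-side adjustment means that if the original orientation of two-channel sending terminal B participating in the mixing is left, its orientation is adjusted to slightly left or middle; if the originally designated orientation of mono sending terminal C participating in the mixing is slightly right, its orientation is adjusted to right.
In the embodiments of the present invention, when there are several terminals whose orientations are to be adjusted, the orientation adjustment module 801 may adjust their audio signal orientation information one by one according to preset priorities. An embodiment of the present invention provides a preferred set of priorities: when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority; a terminal participating in the mixing for the first time has the second adjustment priority; when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel terminals participating in the mixing have the third priority. For example, if multi-channel terminal A, two-channel terminal B, and mono terminal C participating in the mixing are all terminals to be adjusted, and two-channel terminal B is participating in the mixing for the first time, the audio signal orientation of mono terminal C is adjusted first, then that of two-channel terminal B, and finally that of multi-channel terminal A.
The apparatus that adjusts the audio signal orientation information of a terminal is a multimedia server, or another device with an orientation information adjustment function. In the videoconferencing field, the multimedia server is an MCU (Multipoint Control Unit), or may be a terminal having an MCU function module, that is, a Mini MCU; which one applies depends mainly on the networking architecture of the videoconferencing system.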
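The foregoing describes only preferred specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.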

Claims

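1. An audio mixing processing method for audio signals, characterized by including:
determining a terminal whose audio signal orientation needs to be adjusted, and adjusting the audio signal orientation information of the terminal;
mixing the orientation-adjusted audio signal with the other signals to be mixed.
2. The method according to claim 1, characterized in that the determining a terminal whose audio signal orientation needs to be adjusted includes:
when the orientation of the audio signal of the terminal overlaps with that of another terminal; or
when the orientation of the audio signal of the terminal does not match the position of the video picture of the terminal in a multi-picture display; or
when the terminal participates in the mixing for the first time;
determining that the terminal is a terminal whose audio signal orientation needs to be adjusted.
3. The method according to claim 1, characterized in that, when there are several terminals whose orientations are to be adjusted, the determining a terminal whose audio signal orientation needs to be adjusted includes: adjusting the audio signal orientation information of the several terminals to be adjusted one by one according to preset priorities.
4. The method according to claim 3, characterized in that the preset priorities include:
when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority;
a terminal participating in the mixing for the first time has the second adjustment priority;
when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel terminals participating in the mixing have the third priority.
5. The method according to claim 1, characterized in that the adjusting the audio signal orientation information of the terminal includes:
adjusting the orientation of the terminal to be adjusted according to orientation designation information sent by a participant terminal, where the orientation designation information is the orientation that the participant terminal designates for the terminal to be adjusted;
when several participant terminals send orientation designation information for the terminal multiple times, adjusting the orientation of the terminal to be adjusted in the order in which the different pieces of orientation designation information were received, or by means of a token application scheme.
6. The method according to claim 1 or 5, characterized in that the adjusting the audio signal orientation information of the terminal includes: adjusting the orientation of the terminal to be adjusted, as instructed by the received orientation designation information, on the same side as the terminal's original orientation.
7. The method according to claim 1, characterized in that, when the orientation of the audio signal of the terminal does not match the position of the video picture of the terminal in the multi-picture display and the orientation of the terminal therefore needs to be adjusted, the adjusting the audio signal orientation information of the terminal includes:
adjusting the orientation of the terminal to the orientation displayed in the video picture; or
adjusting the orientation by combining the actual orientation of the terminal with its orientation in the video picture.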
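8. An audio mixing processing apparatus for audio signals, characterized by including:
an orientation adjustment module, configured to determine a terminal whose audio signal orientation needs to be adjusted, and to adjust the audio signal orientation information of the terminal;
a mixing processing module, configured to mix the orientation-adjusted audio signal with the other signals to be mixed.
9. The apparatus according to claim 8, characterized in that the orientation adjustment module includes a target terminal determining submodule, configured to determine that the terminal is a terminal whose audio signal orientation needs to be adjusted when the orientation of the audio signal of the terminal overlaps with that of another terminal; or when the orientation of the audio signal of the terminal does not match the position of the video picture of the terminal in a multi-picture display; or when the terminal participates in the mixing for the first time.
10. The apparatus according to claim 8, characterized in that, when there are several terminals whose orientations are to be adjusted, the orientation adjustment module is specifically configured to adjust the audio signal orientation information of the several terminals to be adjusted one by one according to preset priorities.
11. The apparatus according to claim 10, characterized in that the preset priorities include:
when mono, two-channel, and multi-channel terminals are mixed together, a mono terminal participating in the mixing has the first adjustment priority;
a terminal participating in the mixing for the first time has the second adjustment priority;
when mono, two-channel, and multi-channel terminals are mixed together, two-channel and multi-channel terminals participating in the mixing have the third priority.
12. The apparatus according to claim 8, characterized in that the orientation adjustment module is specifically configured to adjust the orientation of the terminal to be adjusted according to orientation designation information sent by a participant terminal, where the orientation designation information is the orientation that the participant terminal designates for the terminal to be adjusted;
when several participant terminals send orientation designation information for the terminal multiple times, the orientation adjustment module adjusts the orientation of the terminal to be adjusted in the order in which the different pieces of orientation designation information were received, or by means of a token application scheme.
13. The apparatus according to claim 8 or 12, characterized in that the orientation adjustment module is specifically configured to adjust the orientation of the terminal to be adjusted, as instructed by the received orientation designation information, on the same side as the terminal's original orientation.
14. The apparatus according to claim 8, characterized in that, when the orientation of the audio signal of the terminal does not match the position of the video picture of the terminal in the multi-picture display and the orientation of the terminal therefore needs to be adjusted, the orientation adjustment module is specifically configured to adjust the orientation of the terminal to the orientation displayed in the video picture; or to adjust the orientation by combining the actual orientation of the terminal with its orientation in the video picture.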
PCT/CN2011/074820 2010-06-07 2011-05-28 一种音频信号的混音处理方法及装置 WO2011153905A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP11791896.1A EP2568702B1 (en) 2010-06-07 2011-05-28 Method and device for audio signal mixing processing
US13/707,332 US20130094672A1 (en) 2010-06-07 2012-12-06 Audio mixing processing method and apparatus for audio signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010199195.9 2010-06-07
CN2010101991959A CN102270456B (zh) 2010-06-07 2010-06-07 一种音频信号的混音处理方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/707,332 Continuation US20130094672A1 (en) 2010-06-07 2012-12-06 Audio mixing processing method and apparatus for audio signals

Publications (1)

Publication Number Publication Date
WO2011153905A1 true WO2011153905A1 (zh) 2011-12-15

Family

ID=45052733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/074820 WO2011153905A1 (zh) 2010-06-07 2011-05-28 一种音频信号的混音处理方法及装置

Country Status (4)

Country Link
US (1) US20130094672A1 (zh)
EP (1) EP2568702B1 (zh)
CN (1) CN102270456B (zh)
WO (1) WO2011153905A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104350768A (zh) * 2012-03-27 2015-02-11 无线电广播技术研究所有限公司 用于混合至少两个音频信号的布置
CN112218167A (zh) * 2019-07-10 2021-01-12 腾讯科技(深圳)有限公司 多媒体信息播放方法、服务器、终端及存储介质

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680550B (zh) * 2012-09-17 2017-02-08 扬智科技股份有限公司 多视窗架构下的音频播放方法、音频播放装置与系统
CN103024339B (zh) * 2012-10-11 2015-09-30 华为技术有限公司 一种基于视频源实现混音的方法和装置
CN102968995B (zh) * 2012-11-16 2018-10-02 新奥特(北京)视频技术有限公司 一种音频信号的混音方法及装置
WO2014088328A1 (ko) 2012-12-04 2014-06-12 삼성전자 주식회사 오디오 제공 장치 및 오디오 제공 방법
CN104519454B (zh) * 2013-09-26 2016-12-07 华为技术有限公司 一种音频信号输出方法及装置
CN104036789B (zh) * 2014-01-03 2018-02-02 北京智谷睿拓技术服务有限公司 多媒体处理方法及多媒体装置
CN104869523B (zh) * 2014-02-26 2018-03-16 北京三星通信技术研究有限公司 虚拟多声道播放音频文件的方法、终端及系统
CN104616665B (zh) * 2015-01-30 2018-04-24 深圳市云之讯网络技术有限公司 基于语音类似度的混音方法
CN106341563A (zh) * 2015-07-06 2017-01-18 北京视联动力国际信息技术有限公司 一种基于终端通信的回声抑制方法和装置
US9986357B2 (en) 2016-09-28 2018-05-29 Nokia Technologies Oy Fitting background ambiance to sound objects
EP3312718A1 (en) 2016-10-20 2018-04-25 Nokia Technologies OY Changing spatial audio fields
EP3313101B1 (en) 2016-10-21 2020-07-22 Nokia Technologies Oy Distributed spatial audio mixing
US10560661B2 (en) 2017-03-16 2020-02-11 Dolby Laboratories Licensing Corporation Detecting and mitigating audio-visual incongruence
CN107195308B (zh) * 2017-04-14 2021-03-16 苏州科达科技股份有限公司 音视频会议系统的混音方法、装置及系统
EP4005248A1 (en) * 2019-07-30 2022-06-01 Dolby Laboratories Licensing Corporation Managing playback of multiple streams of audio over multiple speakers
CN111770301B (zh) * 2020-07-16 2021-12-10 北京百家视联科技有限公司 一种视频会议数据的处理方法及装置
CN112135226B (zh) * 2020-08-11 2022-06-10 广东声音科技有限公司 Y轴音频再生方法以及y轴音频再生系统
CN114173011B (zh) * 2021-11-29 2024-03-19 河北远东通信系统工程有限公司 一种面向协同指挥媒体引擎的混音控制方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09271006A (ja) * 1996-04-01 1997-10-14 Ricoh Co Ltd 多地点テレビ会議装置
JP2006180251A (ja) * 2004-12-22 2006-07-06 Yamaha Corp 複数話者による同時発声を可能とする音声信号処理装置およびプログラム
CN1929593A (zh) * 2005-09-07 2007-03-14 宝利通公司 多点视频会议中的空间相关音频
US20090015661A1 (en) * 2007-07-13 2009-01-15 King Keith C Virtual Multiway Scaler Compensation
CN101510988A (zh) * 2009-02-19 2009-08-19 深圳华为通信技术有限公司 一种语音信号的处理、播放方法和装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1393847A (zh) * 2001-06-27 2003-01-29 景秉仁 混音定位的控制方法及装置
CN100397855C (zh) * 2003-04-30 2008-06-25 华为技术有限公司 一种分布式混音处理方法
US20060064300A1 (en) * 2004-09-09 2006-03-23 Holladay Aaron M Audio mixing method and computer software product
CN1719512B (zh) * 2005-07-15 2010-09-29 北京中星微电子有限公司 数字音频混响模拟系统以及数字音频混响模拟方法
US20080298610A1 (en) * 2007-05-30 2008-12-04 Nokia Corporation Parameter Space Re-Panning for Spatial Audio
EP2009891B1 (fr) * 2007-06-26 2019-01-16 Orange Transmission de signal audio dans un système de conférence audio immersive
EP2009892B1 (fr) * 2007-06-29 2019-03-06 Orange Positionnement de locuteurs en conférence audio 3D
US8073125B2 (en) * 2007-09-25 2011-12-06 Microsoft Corporation Spatial audio conferencing
CN101478619B (zh) * 2009-01-05 2011-11-23 腾讯科技(深圳)有限公司 实现多路语音混音的方法、系统及节点设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2568702A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104350768A (zh) * 2012-03-27 2015-02-11 无线电广播技术研究所有限公司 用于混合至少两个音频信号的布置
CN112218167A (zh) * 2019-07-10 2021-01-12 腾讯科技(深圳)有限公司 多媒体信息播放方法、服务器、终端及存储介质
CN112218167B (zh) * 2019-07-10 2022-04-15 腾讯科技(深圳)有限公司 多媒体信息播放方法、服务器、终端及存储介质

Also Published As

Publication number Publication date
EP2568702A1 (en) 2013-03-13
EP2568702B1 (en) 2014-05-21
CN102270456B (zh) 2012-11-21
CN102270456A (zh) 2011-12-07
US20130094672A1 (en) 2013-04-18
EP2568702A4 (en) 2013-05-15

Similar Documents

Publication Publication Date Title
WO2011153905A1 (zh) 一种音频信号的混音处理方法及装置
US20230216965A1 (en) Audio Conferencing Using a Distributed Array of Smartphones
US6850496B1 (en) Virtual conference room for voice conferencing
US20050280701A1 (en) Method and system for associating positional audio to positional video
JP2975687B2 (ja) 第1局・第2局間に音声信号とビデオ信号とを送信する方法、局、テレビ会議システム、第1局・第2局間に音声信号を伝送する方法
EP1763241B1 (en) Spatially correlated audio in multipoint videoconferencing
US9113034B2 (en) Method and apparatus for processing audio in video communication
EP2158752B1 (en) Methods and arrangements for group sound telecommunication
JP5198567B2 (ja) ビデオ通信方法、システムおよび装置
US6118876A (en) Surround sound speaker system for improved spatial effects
WO2011127816A1 (zh) 一种音频信号的混音处理方法、装置及系统
WO2010109918A1 (ja) 復号化装置、符号化復号化装置および復号化方法
US20030044002A1 (en) Three dimensional audio telephony
WO2008113269A1 (fr) Procédé et dispositif pour réaliser une conversation privée dans une session multipoint
US7720212B1 (en) Spatial audio conferencing system
WO2011057511A1 (zh) 实现混音的方法、装置和系统
US20090094375A1 (en) Method And System For Presenting An Event Using An Electronic Device
WO2010094219A1 (zh) 一种语音信号的处理、播放方法和装置
JP2645731B2 (ja) 音像定位再生方式
WO2011120407A1 (zh) 视频通信的实现方法及装置
US7068792B1 (en) Enhanced spatial mixing to enable three-dimensional audio deployment
JP2001036881A (ja) 音声伝送システム及び音声再生装置
EP2160005A1 (en) Decentralised spatialized audio processing
WO2017211447A1 (en) Method for reproducing sound signals at a first location for a first participant within a conference with at least two further participants at at least one further location
JP2005110103A (ja) テレビ会議における音声の定位方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11791896

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011791896

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE