WO2012079459A1 - Method and apparatus for audio mixing of multiple microphones - Google Patents

Method and apparatus for audio mixing of multiple microphones

Info

Publication number
WO2012079459A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice input
input channels
channels
input channel
signal
Prior art date
Application number
PCT/CN2011/083165
Other languages
French (fr)
Chinese (zh)
Inventor
彭远疆
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2012079459A1 publication Critical patent/WO2012079459A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Definitions

  • The present invention relates to the field of audio information processing, and in particular to a multi-microphone mixing method and apparatus.
  • FIG. 1 shows a schematic diagram of a distributed pickup system.
  • Figure 1 depicts a typical conference room layout in a video conferencing system.
  • each participant uses a separate microphone as the pickup device.
  • In distributed pickup, to prevent crosstalk between the speech signals collected by adjacent microphones, each microphone must be close to one or several talkers, and the spacing between microphones is generally greater than the distance from each microphone to its corresponding talker.
  • Array microphones are also used in video conferencing systems for centralized pickup. FIG. 2 is a schematic diagram of the centralized pickup mode: it depicts a centralized pickup scheme using an array microphone in a video conferencing system, in which all participants share one array microphone as the pickup device. An array microphone assembles multiple pickup units into a single device in a fixed layout.
  • the array microphones are mostly in the shape of a disk or a polygon, and each of the sound pickup units is generally placed on the outer edge of the device and pointed outward.
  • the spacing between adjacent sound pickup units in an array microphone is typically much smaller than the distance from the array microphone device to the speaker.
  • FIG. 3 is a schematic diagram of centralized pickup using multiple array microphones. It depicts multiple array microphones picking up sound in a larger room, each array microphone being responsible for one area.
  • Direct mixing method: simply sum the input signals of all channels and output the result to a single channel. The drawback is that background noise increases after mixing, the signal-to-noise ratio (SNR) drops significantly, and heavy reverberation makes the speech muddy and the sound quality poor.
  • SNR: signal-to-noise ratio
  • First-microphone-priority mixing method: measure the signal strength of each input channel and use the voiced channel with the highest signal strength directly as the output channel. This method does not reduce the signal-to-noise ratio, but when two or more people at different positions speak at the same time there is an obvious channel-switching effect, and the volume of the speech and of the background noise changes noticeably.
  • Dynamic weighted mixing method: measure the signal strength of each voiced channel and sort the channels by strength. Only the few channels with the highest signal strength are weighted and mixed; the other channels do not take part in the mix. This method reduces channel switching when talkers at different positions speak at the same time, but because it uses only signal strength, a single talker can also open two or more physically adjacent channels, which lowers the signal-to-noise ratio, increases reverberation, and muddies the speech.
  • The above mixing methods decide channel gating entirely from signal strength. In many application scenarios they perform poorly and are prone to misjudgment:
  • In stereo/multi-channel systems, mixing must not only combine the energy of the different channels but also preserve the direction (position) information of the original sound source in the mixed signal.
  • the microphones in different positions often correspond to different positions of the sound source. Incorrect gating can result in a sudden change in the position of the sound image, causing greater interference to the far-end listener.
  • The invention provides a multi-microphone mixing method and device that can reduce the misjudgment rate of input-channel gating and improve the audio quality after mixing.
  • a multi-microphone mixing method including:
  • when there are at least two voice input channels, determining the signal similarity between the signals of the voice input channels, controlling the gating of the voice input channels accordingly, and outputting a weighted mix of the signals of the gated voice input channels.
  • the method further includes: when there is only one voice input channel, directly gating that voice input channel.
  • the method for controlling the gating of the voice input channels according to each signal similarity is: for any two voice input channels, if the signal similarity of the two voice input channels is less than or equal to a preset first threshold, both input channels are gated.
  • the process of controlling the gating of the two voice input channels according to the signal strength of the two voice input channels and the delay of the two signals corresponding to the signal similarity includes:
  • determining the signal similarity of the two voice input channels, taking the relative delay of the two signals at the maximum value of the similarity function, and gating one or both of the two voice input channels accordingly;
  • when the signal strength difference between the two voice input channels is equal to the set value, gating one of the two voice input channels; or determining the signal similarity of the two voice input channels, taking the relative delay of the two signals at the maximum value of the similarity function, and gating one or both of the two voice input channels accordingly;
  • the process of gating the voice input channels according to the relative delay includes: if the relative delay of the two signals is greater than a set duration, gating one of the two voice input channels; if the relative delay of the two signals is less than the set duration, gating both voice input channels; if the relative delay of the two signals is equal to the set duration, gating either one or both of the two voice input channels.
  • the method for controlling one of the two voice input channels is:
  • the process of determining the signal similarity between the signals of the voice input channels includes: performing band-pass filtering preprocessing on the signal of each voice input channel; then, across all preprocessed channels, determining the signal similarity of every pair of signals using a normalized cross-correlation function or an average amplitude difference function.
  • a multi-microphone mixing device comprising:
  • a statistical module configured to count signal strength of each input channel in the current time period, and thereby selecting at least two input channels with high signal strength for voice detection;
  • a similarity determination module configured to determine each input channel, among those selected by the statistical module, in which voice is detected as a voice input channel, and, when there are at least two voice input channels, determine the signal similarity between the signals of the voice input channels;
  • a gating module configured to control the gating of the voice input channels according to each signal similarity determined by the similarity determination module;
  • a mixing module configured to output a weighted mix of the signals of the voice input channels gated by the gating module.
  • the gating module is further configured to directly gate the voice input channel when there is only one voice input channel.
  • the gating module is configured to: for any two voice input channels, if their signal similarity is less than or equal to a preset first threshold, gate both input channels.
  • because the multi-microphone mixing method considers both the signal strength of each input channel and the signal similarity between channels when deciding which input channels to gate, the probability of channel misjudgment is greatly reduced, which greatly improves the voice quality after mixing.
  • Figure 1 is a schematic diagram of a distributed sound pickup mode
  • Figure 2 is a schematic diagram of a centralized pickup method
  • Figure 3 is a schematic diagram of a centralized pickup method using a plurality of array microphones
  • FIG. 4 is a schematic diagram of a distributed sound collection method containing a reflector
  • FIG. 5 is a flowchart of a multi-microphone mixing method according to an embodiment of the present invention
  • FIG. 6 is a flowchart of a multi-microphone mixing method according to Embodiment 1 of the present invention.
  • FIG. 7 is a flowchart of a multi-microphone mixing method according to Embodiment 2 of the present invention.
  • FIG. 8 is a structural diagram of a multi-microphone mixing device according to an embodiment of the present invention.
  • An embodiment of the present invention provides a multi-microphone mixing method, as shown in FIG. 5, including:
  • at least two input channels with the highest signal strength are selected for voice detection.
  • Determine each detected input channel with voice as a voice input channel and count the voice input channels: if there are at least two, perform step S503; if there is one, perform step S505; if there are none, perform step S506;
  • when the number of voice input channels is at least two, determine the signal similarity between the signals of the voice input channels;
  • when the signal similarity of the two voice input channels is equal to the first threshold, perform step 2).
  • step 2) is further performed on the basis of step 1) to ensure the accuracy of the mixing.
  • step 1) may be omitted, and only step 2) is performed.
  • the signal similarity is the extreme value of the similarity function of the two signals (the maximum of the normalized cross-correlation function or the minimum of the average amplitude difference function), and the delay of the two signals corresponding to the signal similarity is the relative delay of the two signals at that extreme value.
  • controlling the gating of the two voice input channels according to the signal strength of the two voice input channels and the delay of the two signals corresponding to the signal similarity is specifically:
  • determine the delay of the two signals corresponding to the signal similarity of the two voice input channels; if the delay is greater than the set duration, gate one of the two voice input channels; if the delay is less than or equal to the set duration, gate both voice input channels.
  • for example, with three voice input channels A, B, and C: when the similarity between A and B is less than the first threshold, the similarity between A and C is less than the first threshold, and the similarity between B and C is greater than the first threshold, then A, B, and C are all gated according to the A-B and A-C similarities, and one of B and C is then gated according to the B-C similarity; the result is that A and C, or A and B, are gated.
  • the method for determining the similarity in step S503 is specifically: Performing band pass filtering preprocessing on the signals of each voice input channel;
  • the signal similarity is determined by using a normalized cross-correlation function for every two signals after preprocessing.
  • the signal similarity is the maximum value of the normalized cross-correlation function value.
  • the signal similarity is determined by using the average amplitude difference function for every two signals after preprocessing.
  • the signal similarity is the minimum value of the average amplitude difference function; saying that the signal similarity is greater than the first threshold then means that the minimum value of the average amplitude difference function is smaller than a set second threshold.
  • the voice input channel is directly controlled to be gated and output.
  • the input channels are gated using the previous gating result.
  • the strobe discrimination of the channel is not re-executed, and the strobe of the current input channel is directly used for the strobe of the input channel, and output.
  • the signal strength of each input channel and the signal similarity between channels are both considered, so the probability of channel misjudgment is greatly reduced, which greatly improves the sound quality after mixing.
  • FIG. 6 is a flowchart of the multi-microphone mixing method according to Embodiment 1 of the present invention, which includes:
  • both input channels A and B have voice, that is, both A and B are voice input channels; the signals of channel A and channel B are each preprocessed by a band-pass filter of 80 Hz to 800 Hz.
  • the normalized cross-correlation function (NCCF) of the two preprocessed signals is calculated, the maximum value of the NCCF is determined, and the signal delay between A and B corresponding to that maximum is determined.
  • Step S606: determine the difference between the signal strengths of channels A and B and whether it is less than or equal to the set value; if yes, perform step S607; if no, perform step S609;
  • step S609 may also be performed.
  • the signal strength value of either one can be determined by various methods.
  • the difference between the signal strengths of the A and B channels is smaller than the set value, indicating that the signal strengths of the two are not much different.
  • step S609 can also be performed.
  • channels A and B are both gated
  • step S606 can also be performed.
  • gate one of channels A and B; preferably, gate whichever of channels A and B has the higher signal strength.
  • step S609 may be performed directly to gate one of channels A and B, or the mixing may be completed.
  • the judgment of the signal strength difference value in step S606 and the judgment of the signal delay in S607, and the execution of S608, make the signal judgment more precise, and further improve the quality of the multi-microphone mix.
  • FIG. 7 is a flowchart of a multi-microphone mixing method according to Embodiment 2 of the present invention.
  • the signals of channel A and channel B are each preprocessed by a band-pass filter of 80 Hz to 800 Hz; the average amplitude difference function (AMDF) of the two preprocessed signals is calculated, the minimum value W of the AMDF is determined, and the signal delay between A and B corresponding to that minimum is determined;
  • AMDF: average amplitude difference function
  • when the minimum value of the AMDF of channels A and B is less than the set threshold, it can be considered that only one local talker is speaking, and the gating of channels A and B then continues to be controlled according to the signal strength difference and the delay between the two channels.
  • channels A and B are both gated, and the output is A*0.5 + B*0.5;
  • when the minimum value of the AMDF is less than the set threshold, step S709 may be performed directly to gate one of channels A and B, or the mixing may be completed.
  • the judgment of the signal strength difference value in step S706 and the judgment of the signal delay in S707, and the execution of S708, make the signal judgment more accurate, and further improve the quality of the multi-microphone mix.
  • the present invention does not limit the specific method for evaluating the signal similarity between different channels and the maximum number of channels allowed to be simultaneously opened, nor does it limit the evaluation of the mixing weight between different channels.
  • the specific method used here to judge the signal similarity between different channels is the NCCF; the maximum number of channels allowed to be open simultaneously is 2; and in a mono system the mixing weights between channels are fixed at (0.5, 0.5).
  • the mixing weights of different channels are related to the spatial position of their corresponding microphones, and will not be analyzed in detail here.
  • an embodiment of the present invention further provides a multi-microphone mixing device, as shown in FIG. 8, comprising: a statistical module 81, configured to measure the signal strength of each input channel in the current time period and select at least two input channels with the highest signal strength for voice detection;
  • a similarity determination module 82, configured to determine each detected input channel with voice as a voice input channel and, when there are at least two voice input channels, determine the signal similarity between the signals of the voice input channels;
  • a gating module 83, configured to control the gating of the voice input channels according to each signal similarity; and a mixing module 84, configured to output a weighted mix of the signals of the gated voice input channels. Preferably, the gating module 83 is further configured to directly gate the voice input channel when there is only one voice input channel.
  • the gating module 83 is specifically configured to: for any two voice input channels, if their signal similarity is less than or equal to the first threshold, gate both input channels.
  • the handling of the signal strength difference of the two voice input channels can be flexible. For example: when the signal strength difference of the two voice input channels is greater than the set value, one of the two voice input channels is gated; when the signal strength difference is less than the set value, the signal similarity of the two voice input channels is determined, the relative delay of the two signals at the maximum value of the similarity function is taken, and the subsequent operations related to the relative delay are performed.
  • when the signal strength difference is equal to the set value, the subsequent operation may be the same as when the difference is greater than the set value (i.e., gating one of the two voice input channels), or the same as when the difference is less than the set value (i.e., determining the signal similarity of the two voice input channels, taking the relative delay of the two signals at the maximum value of the similarity function, and then performing the subsequent operations related to the relative delay).
  • the operations related to the relative delay can also be flexible. For example: if the relative delay of the two signals is greater than the set duration, one of the two voice input channels is gated; if the relative delay of the two signals is less than the set duration, both voice input channels are gated.
  • when the relative delay of the two signals is equal to the set duration, the subsequent operation may be the same as when the relative delay is greater than the set duration (i.e., gating one of the two voice input channels), or the same as when the relative delay is less than the set duration (i.e., gating both voice input channels).
  • the invention relates to the field of audio information processing and discloses a multi-microphone mixing method and device, which measure the signal strength of each input channel in the current time period, select on that basis at least two input channels with high signal strength for voice detection, and determine each detected input channel with voice as a voice input channel; when there are at least two voice input channels, the signal similarity between the signals of the voice input channels is determined, the gating of the voice input channels is controlled accordingly, and the signals of the gated voice input channels are weighted and mixed for output.
  • the method and device of the invention can reduce the misjudgment rate of input-channel gating and improve the audio quality after mixing.
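The two-channel decision flow described for Embodiment 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`nccf_peak`, `gate_pair`, `mix`), the decibel form of the strength difference, and every threshold value are hypothetical, the normalization uses whole-signal energies, and the 80 Hz to 800 Hz band-pass preprocessing is omitted.

```python
import math


def energy(signal):
    """Mean squared amplitude, used here as the signal strength."""
    return sum(x * x for x in signal) / len(signal)


def nccf_peak(a, b, max_lag):
    """Return (peak value, lag) of the normalized cross-correlation of a and b.

    A positive lag means that b is delayed relative to a.
    """
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0, 0
    best_value, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                acc += x * b[j]
        if acc / norm > best_value:
            best_value, best_lag = acc / norm, lag
    return best_value, best_lag


def gate_pair(a, b, first_threshold=0.6, max_diff_db=6.0, max_delay=8, max_lag=32):
    """Decide which of two voice input channels, "A" and "B", to gate.

    All threshold defaults are illustrative assumptions.
    """
    similarity, lag = nccf_peak(a, b, max_lag)
    if similarity <= first_threshold:
        return ["A", "B"]  # dissimilar signals: gate both channels
    eps = 1e-12  # avoids taking the logarithm of zero
    diff_db = 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))
    stronger = ["A"] if diff_db >= 0.0 else ["B"]
    if abs(diff_db) > max_diff_db:
        return stronger  # strength difference above the set value: gate one
    if abs(lag) > max_delay:
        return stronger  # relative delay above the set duration: gate one
    return ["A", "B"]  # similar strength and short delay: gate both


def mix(signals, gated):
    """Equal-weight mix of the gated channels (0.5/0.5 for two channels)."""
    weight = 1.0 / len(gated)
    return [weight * sum(s) for s in zip(*(signals[name] for name in gated))]
```

The pure-Python correlation loop is quadratic in the frame length and is only meant to make the decision order visible; a real system would use an optimized correlation routine.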

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to the field of audio information processing and discloses a method and apparatus for audio mixing of multiple microphones. The method comprises: measuring the signal strength of each input channel during the current period; on that basis, selecting at least two input channels with higher signal strength for voice detection; determining each tested input channel that contains voice as a voice input channel; when there are at least two voice input channels, determining the signal similarity between the signals of the voice input channels, controlling the gating of the voice input channels on that basis, and outputting a weighted mix of the signals of the gated voice input channels. The method and apparatus can reduce the misjudgment rate of input-channel gating and improve the audio quality after mixing.
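The first stage of the method, ranking channels by signal strength and then testing the strongest ones for voice, might look like the sketch below. The source does not say how voice is detected, so the energy-threshold detector here is purely an assumption, as are the names `short_time_energy` and `select_voice_channels` and the default parameter values.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame, used as the signal strength."""
    return sum(x * x for x in frame) / len(frame)


def select_voice_channels(frames, num_candidates=2, voice_threshold=0.01):
    """Pick the strongest channels, then keep those judged to contain voice.

    frames maps a channel name to its samples for the current period.
    The energy threshold stands in for a real voice detector (assumption).
    """
    ranked = sorted(frames, key=lambda name: short_time_energy(frames[name]),
                    reverse=True)
    candidates = ranked[:num_candidates]
    return [name for name in candidates
            if short_time_energy(frames[name]) > voice_threshold]
```

The length of the returned list then selects the branch: two or more voice input channels go on to the similarity test, a single one is gated directly, and an empty list means no channel currently carries voice.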

Description

Multi-microphone mixing method and device

Technical Field

The present invention relates to the field of audio information processing, and in particular to a multi-microphone mixing method and device.

Background
In a video conferencing system, microphones are needed to capture the voice of the local talker. The sound is audio-encoded and transmitted to the far end, where it is decoded, amplified by a power amplifier, and played back through loudspeakers. To reduce the effects of room reverberation and background noise, video conferencing systems generally use directional microphones to capture sound (pickup). Because a directional microphone picks up best in the direction it faces, several directional microphones are generally needed so that talkers at different positions are all picked up well. This pickup mode is called distributed pickup. FIG. 1 is a schematic diagram of distributed pickup: it depicts a typical conference room layout in a video conferencing system, in which each participant uses a separate microphone as the pickup device. To prevent crosstalk between the speech signals captured by adjacent microphones, distributed pickup requires each microphone to be close to one or several talkers, and the spacing between microphones is generally greater than the distance from a microphone to its corresponding talker.

Sometimes, to reduce the total number of microphones, video conferencing systems also use array microphones for centralized pickup. FIG. 2 is a schematic diagram of centralized pickup: it depicts a centralized pickup scheme using an array microphone, in which all participants share one array microphone as the pickup device. An array microphone assembles multiple pickup units into a single device in a fixed layout. Array microphones are mostly disk-shaped or polygonal, and each pickup unit is generally placed on the outer edge of the device, pointing outward. The spacing between adjacent pickup units in an array microphone is typically much smaller than the distance from the array microphone to the talker. When a single array microphone cannot cover the whole room effectively, multiple array microphones can be used to pick up sound by area. FIG. 3 is a schematic diagram of centralized pickup using multiple array microphones: it depicts a larger room in which several array microphones are used, each responsible for one area.
Considering codec complexity, transmission bandwidth, system compatibility, and other factors, the multi-channel signals captured by multiple microphones (pickup units) need to be mixed into a single-channel or two-channel stereo signal, which is then encoded and transmitted as mono or stereo. Multi-microphone mixing is evaluated mainly by the signal-to-noise ratio, sound quality, and smoothness of the mixed output speech; for stereo systems, the fidelity of the sound image direction (phase) information is also an important measure.
Traditional video conferencing systems mostly use simple mixing methods based on signal strength (short-time energy or signal amplitude) to mix the speech signals captured by multiple microphones for output. Typical mixing methods are:
1. Direct mixing method: simply sum the input signals of all channels and output the result to a single channel. The drawback is that background noise increases after mixing, the signal-to-noise ratio (SNR) drops significantly, and heavy reverberation makes the speech muddy and the sound quality poor.
2. First-microphone-priority mixing method: measure the signal strength of each input channel and use the voiced channel with the highest signal strength directly as the output channel. This method does not reduce the signal-to-noise ratio, but when two or more people at different positions speak at the same time there is an obvious channel-switching effect, and the volume of the speech and of the background noise changes noticeably.
3. Dynamic weighted mixing method: measure the signal strength of each voiced channel and sort the channels by strength; only the few channels with the highest signal strength are weighted and mixed, and the other channels do not take part in the mix. This method reduces channel switching when talkers at different positions speak at the same time, but because it uses only signal strength, a single talker can also open two or more physically adjacent channels, which lowers the signal-to-noise ratio, increases reverberation, and muddies the speech.
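For comparison, the three conventional strategies listed above can be sketched in a few lines each. This is only an illustration: the function names, the use of mean squared amplitude as "signal strength", and the equal weights in the third method are assumptions, since the text does not fix them.

```python
def energy(signal):
    """Mean squared amplitude, one possible measure of signal strength."""
    return sum(x * x for x in signal) / len(signal)


def direct_mix(channels):
    """Method 1: sum all channels sample by sample."""
    return [sum(samples) for samples in zip(*channels)]


def loudest_channel_mix(channels):
    """Method 2: output only the channel with the highest signal strength."""
    return list(max(channels, key=energy))


def dynamic_weighted_mix(channels, k=2):
    """Method 3: mix the k strongest channels, here with equal weights."""
    strongest = sorted(channels, key=energy, reverse=True)[:k]
    weight = 1.0 / len(strongest)
    return [weight * sum(samples) for samples in zip(*strongest)]
```

All three decide from strength alone, which is the limitation the following paragraphs describe: nothing here can tell one talker heard on two adjacent microphones from two separate talkers.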
The above mixing methods decide channel gating entirely from signal strength. In many application scenarios they perform poorly and are prone to misjudgment:
1) In a typical array microphone application, as in FIG. 2, when a talker far from the array microphone speaks, the signal strengths captured by the individual microphones in the array microphone device differ very little, so misjudgment easily occurs during mixing.
2) Even with distributed microphones, reflections from tabletops, whiteboards, walls, and the like (FIG. 4 is a schematic diagram of distributed pickup with reflectors) make strength-based decisions prone to misjudgment: channels with strong reflected or reverberant sound are gated by mistake, which seriously degrades the voice quality after mixing.
In stereo/multi-channel systems, mixing must not only combine the energy of the different channels but also preserve the direction (position) information of the original sound source in the mixed signal. Microphones at different positions usually correspond to different source positions, so incorrect gating causes a sudden jump in the sound image position, which disturbs the far-end listener even more.

Summary of the Invention
The present invention provides a multi-microphone mixing method and apparatus that reduce the misjudgment rate of input-channel gating and improve the audio quality after mixing.
To achieve this object, the technical solution of the present invention is implemented as follows:
A multi-microphone mixing method, comprising:
measuring the signal strength of each input channel in the current period, selecting on that basis at least two input channels with high signal strength for voice detection, and determining each input channel in which voice is detected to be a voice input channel;
when there are at least two voice input channels, determining the signal similarity between the signals of the voice input channels, controlling the gating of the voice input channels accordingly, and outputting a weighted mix of the signals of the gated voice input channels.
The method further comprises: when there is only one voice input channel, directly gating that voice input channel.
The gating of the voice input channels is controlled according to the signal similarities as follows: for any two voice input channels, if the signal similarity of the two voice input channels is less than or equal to a preset first threshold, both input channels are gated.
The method further comprises:
if there are two voice input channels whose signal similarity is greater than and/or equal to the preset first threshold, controlling the gating of the two voice input channels according to their signal strengths and the delay between the two signals that corresponds to the signal similarity.
Controlling the gating of the two voice input channels according to their signal strengths and the delay between the two signals that corresponds to the signal similarity comprises:
when the signal strength difference between the two voice input channels is greater than a set value, gating one of the two voice input channels;
when the signal strength difference between the two voice input channels is less than the set value, determining the signal similarity of the two voice input channels, taking the relative delay between the two signals at which the similarity function reaches its extremum, and gating one or both of the two voice input channels accordingly;
when the signal strength difference between the two voice input channels equals the set value, either gating one of the two voice input channels, or determining the signal similarity of the two voice input channels, taking the relative delay between the two signals at which the similarity function reaches its extremum, and gating one or both of the two voice input channels accordingly;
gating the voice input channels according to the relative delay comprises: if the relative delay between the two signals is greater than a set duration, gating one of the two voice input channels; if the relative delay is less than the set duration, gating both voice input channels; if the relative delay equals the set duration, gating either one or both of the two voice input channels.
Gating one of the two voice input channels is performed as follows:
the voice input channel with the higher signal strength of the two is gated.
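The decision described above for two highly similar voice input channels can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the parameter names and the channel labels "A" and "B" are hypothetical, the strength difference is taken as a plain absolute difference, and the "equal to" cases (which the text leaves open) are resolved here in favour of continuing to the next check.

```python
def gate_similar_pair(strength_a, strength_b, delay,
                      max_strength_diff, max_delay):
    """Return the set of channels to gate for two highly similar signals.

    All names are illustrative assumptions. `delay` is the relative delay
    at the extremum of the similarity function, in the same unit as
    `max_delay` (for example, samples).
    """
    stronger = "A" if strength_a >= strength_b else "B"

    # Large level difference: gate only the stronger channel.
    if abs(strength_a - strength_b) > max_strength_diff:
        return {stronger}

    # Similar levels but a long relative delay: still gate only one channel.
    if abs(delay) > max_delay:
        return {stronger}

    # Similar levels and a short relative delay: gate both channels.
    return {"A", "B"}
```

For example, `gate_similar_pair(0.5, 0.6, 10, 0.5, 4)` gates only channel B, because the levels are close but the relative delay exceeds the set duration.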
Determining the signal similarity between the signals of the voice input channels comprises: pre-processing the signal of each voice input channel with a band-pass filter; and, across all pre-processed channels, determining the signal similarity of each pair of signals using a normalized cross-correlation function or an average magnitude difference function.
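The text does not specify the design of the band-pass pre-filter. As one possible sketch, a single second-order (biquad) band-pass section with the widely used "constant 0 dB peak gain" coefficients can be used. The function name and parameters are assumptions; the default band edges of 80 Hz and 800 Hz are the values given in the embodiments, and the resulting passband is only approximately that range.

```python
import math


def bandpass(samples, sample_rate, low_hz=80.0, high_hz=800.0):
    """Filter `samples` with one biquad band-pass section (illustrative)."""
    f0 = math.sqrt(low_hz * high_hz)      # geometric centre frequency
    q = f0 / (high_hz - low_hz)           # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)

    # Coefficients normalized by a0; the b1 coefficient is zero.
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A steeper filter (for example, several cascaded sections) may be preferable in practice; this sketch only shows where the pre-processing sits before the similarity computation.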
A multi-microphone mixing apparatus, comprising:
a statistics module, configured to measure the signal strength of each input channel in the current period and, on that basis, select at least two input channels with high signal strength for voice detection;
a similarity determination module, configured to determine the input channels in which the statistics module detects voice to be voice input channels and, when there are at least two voice input channels, determine the signal similarity between the signals of the voice input channels;
a gating module, configured to control the gating of the voice input channels according to the signal similarities determined by the similarity determination module;
a mixing module, configured to output a weighted mix of the signals of the voice input channels gated by the gating module.
The gating module is further configured to directly gate the voice input channel when there is only one voice input channel.
The gating module is configured to: for any two voice input channels, if the signal similarity of the two voice input channels is less than or equal to a preset first threshold, gate both input channels.
With the multi-microphone mixing method provided by the embodiments of the present invention, the gating decision for the input channels takes into account both the signal strength of each input channel and the signal similarity between channels. This greatly reduces the probability of wrongly gating a channel and thus substantially improves the speech quality after mixing.
Brief Description of the Drawings
Figure 1 is a schematic diagram of distributed sound pickup;
Figure 2 is a schematic diagram of centralized sound pickup;
Figure 3 is a schematic diagram of centralized sound pickup using multiple array microphones;
Figure 4 is a schematic diagram of distributed sound pickup with reflectors;
Figure 5 is a flowchart of the multi-microphone mixing method provided by an embodiment of the present invention;
Figure 6 is a flowchart of the multi-microphone mixing method provided by Embodiment 1 of the present invention;
Figure 7 is a flowchart of the multi-microphone mixing method provided by Embodiment 2 of the present invention;
Figure 8 is a structural diagram of the multi-microphone mixing apparatus provided by an embodiment of the present invention.
Detailed Description
An embodiment of the present invention provides a multi-microphone mixing method which, as shown in Figure 5, comprises:
S501. Measure the signal strength of each input channel in the current period, and select the at least two input channels with the highest signal strength for voice detection;
In this step, at least two of the strongest input channels are selected for voice detection. Selecting too many input channels makes the subsequent mixing computation more complex, so 2 to 4 channels are typically selected.
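Step S501 can be sketched as a ranking of the channels by a per-frame strength measure. The text does not define "signal strength"; mean squared amplitude is used here as an assumption, and the function names and the dictionary layout are hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame (assumed strength measure)."""
    return sum(s * s for s in samples) / len(samples)


def select_candidates(frames, count=2):
    """Return the names of the `count` strongest channels, strongest first.

    `frames` maps a channel name to its samples for the current period.
    At least two channels are always requested, as in step S501.
    """
    energies = {name: frame_energy(x) for name, x in frames.items()}
    ranked = sorted(energies, key=energies.get, reverse=True)
    return ranked[:max(2, count)]
```

With `count` between 2 and 4 this matches the typical range mentioned above; voice detection would then run only on the returned channels.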
S502. Determine each input channel in which voice is detected to be a voice input channel, and count the voice input channels. If there are at least two voice input channels, go to step S503; if there is exactly one, go to step S505; if there are none, go to step S506;
S503. When there are at least two voice input channels, determine the signal similarity between the signals of the voice input channels;
When there are exactly two voice input channels there is a single signal similarity; when there are more than two, there is a signal similarity for every pair of voice input channels.
S504. Control the gating of the voice input channels according to the signal similarities, and output a weighted mix of the signals of the gated voice input channels.
Specifically:
1) If the signal similarity of two voice input channels is less than or equal to the first threshold, both input channels are pre-gated;
When the signal similarity of the two voice input channels equals the first threshold, step 2) may be performed instead.
If the signal similarity of every pair of voice input channels is less than or equal to the first threshold, all channels are pre-gated, and the pre-gated channels can be gated directly.
If there are two voice input channels whose signal similarity is greater than the first threshold, step 2) is performed in addition to step 1), which keeps the mix accurate. Of course, if the similarity of every pair of signals is greater than the first threshold, step 1) may be skipped and only step 2) performed.
2) If the signal similarity of two voice input channels is greater than or equal to the first threshold, the gating of the two voice input channels is controlled according to their signal strengths and the delay between the two signals that corresponds to the signal similarity. The signal similarity is the extremum of the similarity function of the two signals (the maximum of the normalized cross-correlation function, or the minimum of the average magnitude difference function), and the delay corresponding to the signal similarity is the relative delay between the two signals at which the similarity function reaches that extremum.
Controlling the gating of the two voice input channels according to their signal strengths and the delay between the two signals that corresponds to the signal similarity is done as follows:
When the signal strength difference between the two voice input channels is greater than or equal to the set value, one of the two input channels is gated; in the "equal to" case the following step may be performed instead.
When the signal strength difference between the two voice input channels is less than the set value, the delay between the two signals that corresponds to their signal similarity is determined. If the delay is greater than the set duration, one of the two voice input channels is gated; if the delay is less than or equal to the set duration, both voice input channels are gated.
As an example of the above steps, consider voice input channels A, B and C. If the similarity between A and B is less than the first threshold, the similarity between A and C is less than the first threshold, and the similarity between B and C is greater than the first threshold, then A, B and C are all pre-gated on the basis of the similarities between A and B and between A and C, and one of B and C is then gated on the basis of the similarity between B and C. The result is that either A and C, or A and B, are gated.
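The A, B, C example can be sketched for any number of voice input channels. This is a simplified illustration under stated assumptions: every channel starts pre-gated, and for each highly similar pair only the stronger channel is kept (the strength-difference and delay checks of step 2 are omitted for brevity). The function name and the `frozenset` keys for pairs are hypothetical.

```python
from itertools import combinations


def gate_channels(strengths, similarity, first_threshold):
    """Return the set of gated channel names.

    `strengths` maps channel name -> signal strength.
    `similarity` maps frozenset({name1, name2}) -> pairwise similarity.
    """
    gated = set(strengths)                      # step 1: pre-gate everything
    for a, b in combinations(sorted(strengths), 2):
        if similarity[frozenset((a, b))] > first_threshold:
            # Step 2 (simplified): drop the weaker channel of a similar pair.
            weaker = a if strengths[a] < strengths[b] else b
            gated.discard(weaker)
    return gated


strengths = {"A": 1.0, "B": 0.8, "C": 0.5}
similarity = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.3,
    frozenset(("B", "C")): 0.9,
}
result = gate_channels(strengths, similarity, first_threshold=0.6)
```

Here `result` contains A and B: only the pair B, C exceeds the threshold, and C is the weaker of the two.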
The similarity in step S503 is determined as follows:
pre-process the signal of each voice input channel with a band-pass filter;
determine the signal similarity of each pair of pre-processed signals using a normalized cross-correlation function.
When the normalized cross-correlation function is used, the signal similarity is the maximum value of the normalized cross-correlation function.
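The text treats the normalized cross-correlation function as known and gives no formula. The sketch below uses one common form, the cross-correlation over the overlapping samples divided by the geometric mean of the two energies, and searches a range of lags for the maximum. The function name, the lag range and the sign convention (a positive lag means `y` is delayed relative to `x`) are assumptions.

```python
import math


def nccf_peak(x, y, max_lag):
    """Return (maximum NCCF value, lag in samples at that maximum)."""
    best_value, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Index range where both x[i] and y[i + lag] exist.
        start = max(0, -lag)
        stop = min(len(x), len(y) - lag)
        num = ex = ey = 0.0
        for i in range(start, stop):
            num += x[i] * y[i + lag]
            ex += x[i] * x[i]
            ey += y[i + lag] * y[i + lag]
        if ex > 0.0 and ey > 0.0:
            value = num / math.sqrt(ex * ey)
            if value > best_value:
                best_value, best_lag = value, lag
    return best_value, best_lag
```

The first returned value plays the role of the signal similarity compared against the first threshold, and the second is the relative delay used in the later gating decision.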
Alternatively, the similarity is determined using the average magnitude difference function, as follows:
pre-process the signal of each voice input channel with a band-pass filter;
determine the signal similarity of each pair of pre-processed signals using the average magnitude difference function.
When the average magnitude difference function is used, the signal similarity is the minimum value of the average magnitude difference function, and the condition that the signal similarity is greater than the first threshold corresponds to the minimum of the average magnitude difference function being less than a set second threshold.
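The average magnitude difference function is likewise treated as known. A minimal sketch, assuming the usual definition (mean absolute difference between one signal and a lagged copy of the other) and hypothetical names, searches the lags for the minimum. Note the inverted comparison: a smaller minimum means more similar signals.

```python
def amdf_minimum(x, y, max_lag):
    """Return (minimum AMDF value, lag in samples at that minimum)."""
    best_value, best_lag = float("inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Index range where both x[i] and y[i + lag] exist.
        start = max(0, -lag)
        stop = min(len(x), len(y) - lag)
        if stop <= start:
            continue
        total = sum(abs(x[i] - y[i + lag]) for i in range(start, stop))
        value = total / (stop - start)
        if value < best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag


def is_highly_similar(amdf_min, second_threshold):
    """High similarity corresponds to an AMDF minimum below the threshold."""
    return amdf_min < second_threshold
```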
S505. When there is only one voice input channel, gate that voice input channel directly and output its signal.
S506. When there are no voice input channels, gate the input channels using the previous gating result.
That is, when the number of voice input channels is 0, no new gating decision is made for this period; the previous gating result is applied to the input channels for this period, and the result is output.
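The branching of steps S502 to S506 amounts to a small piece of state carried from one period to the next. The sketch below is illustrative only: the class name, method name and the `decide` callback (standing in for the similarity-based decision of steps S503 and S504) are hypothetical.

```python
class GateState:
    """Remembers the last gating result across periods."""

    def __init__(self):
        self.gated = set()

    def update(self, voice_channels, decide):
        """Return the gated channels for the current period.

        `voice_channels` lists the channels in which voice was detected.
        `decide` is called only when there are at least two of them.
        """
        if len(voice_channels) == 1:
            self.gated = set(voice_channels)          # S505: gate directly
        elif len(voice_channels) >= 2:
            self.gated = set(decide(voice_channels))  # S503 and S504
        # With no voice channel (S506) the previous result is kept unchanged.
        return set(self.gated)
```

Returning a copy of the stored set keeps callers from modifying the remembered result by accident.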
With the method of this embodiment, the gating decision for the input channels takes into account both the signal strength of each input channel and the signal similarity between channels. This greatly reduces the probability of wrongly gating a channel and thus substantially improves the speech quality after mixing.
The method of the embodiments of the present invention is described in detail below with reference to the accompanying drawings.
Embodiment 1
Figure 6 is a flowchart of the multi-microphone mixing method provided by Embodiment 1 of the present invention, which specifically comprises:
S601. Measure the signal strength of each input channel in the current period, and select the two input channels with the highest signal strength, A and B, for voice detection;
S602. When neither input channel A nor input channel B contains voice, use the previous decision result directly;
S603. When input channel A contains voice and B does not, A is the voice input channel, and input channel A is gated directly;
S604. When input channels A and B both contain voice, both are voice input channels. Pass the signals of channel A and channel B each through an 80 Hz to 800 Hz band-pass pre-filter, compute the normalized cross-correlation function (NCCF) of the two pre-processed signals, determine the maximum NCCF value, and determine the signal delay τ between A and B that corresponds to that maximum;
The definition and computation of the NCCF are well known in the art and are not repeated here.
For each delay τ, the NCCF value is determined; the maximum NCCF value is then found, together with the delay corresponding to that maximum;
S605. Determine whether the NCCF maximum is less than or equal to the set threshold V1. If yes, go to step S608; if no, go to step S606;
S606. When the NCCF maximum is greater than the set threshold V1, determine the difference between the signal strengths of channels A and B, and determine whether the signal strength difference between channels A and B is less than or equal to the set value. If yes, go to step S607; if no, go to step S609;
When the maximum of the NCCF of the A and B channel signals is greater than the set threshold, it can be assumed that only one local talker is speaking, and the gating of channels A and B is then controlled according to the signal strength difference between the two channels and the delay.
In this step, when the difference equals the set value, step S609 may be performed instead.
The signal strength difference between channels A and B can be taken directly as the signal strength of A minus the signal strength of B, as the ratio of the two signal strengths (smaller over larger), or as the difference between the two divided by the signal strength of either one. Other methods may also be used to determine the signal strength difference between channels A and B. A difference smaller than the set value indicates that the two signal strengths are close.
S607. Determine whether the delay τ corresponding to the maximum is less than or equal to the set duration. If yes, go to step S608; if no, go to step S609;
When the delay equals the set duration, step S609 may be performed instead.
S608. Gate both channels A and B;
When the NCCF maximum is less than or equal to the set threshold V1, both channels A and B are gated. In that case, different people are assumed to be speaking at the same time in front of the microphones corresponding to channels A and B, so both channels should be opened, and output = A*0.5 + B*0.5;
When the maximum equals the set threshold V1, step S606 may be performed instead.
When the NCCF maximum is greater than or equal to the set threshold V1, a single talker is speaking in front of microphones A and B. If the signal strength difference between channels A and B is small and the signal delay corresponding to the NCCF maximum is small, the talker can be assumed to be at nearly the same distance from the microphones of the two channels, and channels A and B can both be opened, with output = A*0.5 + B*0.5;
S609. Gate one of channels A and B;
Preferably, the channel with the higher signal strength of A and B is gated.
In step S606, when the NCCF maximum is greater than or equal to the set threshold V1, step S609 may also be performed directly to gate one of channels A and B, which also completes the mix. However, the signal strength difference check in step S606, the delay check in S607 and the execution of S608 make the decision more accurate and further improve the quality of the multi-microphone mix.
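The alternative measures of signal strength difference listed under step S606 can be sketched as three small functions. The names are hypothetical. One assumption is made for the ratio-based measure: the text gives the ratio as smaller over larger, which approaches 1 for equal strengths, so the sketch returns 1 minus that ratio in order that "smaller than the set value" still means "close in strength".

```python
def absolute_difference(a, b):
    """Plain difference between the two signal strengths."""
    return abs(a - b)


def ratio_difference(a, b):
    """1 - (smaller / larger); 0.0 means equal strengths (assumed form)."""
    larger = max(a, b)
    if larger == 0:
        return 0.0
    return 1.0 - min(a, b) / larger


def relative_difference(a, b, reference):
    """Difference divided by the strength of one of the two channels."""
    return abs(a - b) / reference
```

The ratio-based and relative measures do not depend on the overall input level, so a single set value can be reused across different microphone gains, whereas the plain difference would need a level-dependent set value.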
Embodiment 2
Figure 7 is a flowchart of the multi-microphone mixing method provided by Embodiment 2 of the present invention.
S701. Measure the signal strength of each input channel in the current period, and select the two input channels with the highest signal strength, A and B, for voice detection;
S702. When neither input channel A nor input channel B contains voice, use the previous decision result directly;
S703. When input channel A contains voice and B does not, gate input channel A directly;
S704. When input channels A and B both contain voice, pass the signals of channel A and channel B each through an 80 Hz to 800 Hz band-pass pre-filter, compute the average magnitude difference function (AMDF) of the two pre-processed signals, determine the minimum AMDF value, and determine the signal delay τ between A and B that corresponds to that minimum;
The definition and computation of the AMDF are well known in the art and are not repeated here.
For each delay τ, the AMDF value is determined; the minimum AMDF value is then found, together with the delay corresponding to that minimum;
S705. Determine whether the AMDF minimum is greater than or equal to the set threshold. If yes, go to step S708; if no, go to step S706;
S706. When the AMDF minimum is less than the set threshold, determine the difference between the signal strengths of channels A and B, and determine whether the signal strength difference between channels A and B is less than or equal to the set value. If yes, go to step S707; if no, go to step S709;
When the AMDF minimum of channels A and B is less than the set threshold, it can be assumed that only one local talker is speaking, and the gating of channels A and B is then controlled according to the signal strength difference between the two channels and the delay.
S707. Determine whether the delay τ corresponding to the minimum is less than the set duration. If yes, go to step S708; if no, go to step S709;
S708. Gate both channels A and B;
When the AMDF minimum is greater than or equal to the set threshold, both channels A and B are gated. In that case, different people are assumed to be speaking at the same time in front of the microphones corresponding to channels A and B, so both channels should be opened, and output = A*0.5 + B*0.5;
When the AMDF minimum is less than the set threshold, a single talker is assumed to be speaking in front of microphones A and B. If the signal strength difference between channels A and B is small and the signal delay corresponding to the AMDF minimum is small, the talker can be assumed to be at nearly the same distance from the microphones of the two channels, and channels A and B can both be opened, with output = A*0.5 + B*0.5;
S709. Gate one of channels A and B;
Preferably, the channel with the higher signal strength of A and B is gated.
In step S706, when the AMDF minimum is less than the set threshold, step S709 may also be performed directly to gate one of channels A and B, which also completes the mix. However, the signal strength difference check in step S706, the delay check in S707 and the execution of S708 make the decision more accurate and further improve the quality of the multi-microphone mix.
It should be noted that the present invention does not restrict the specific method used to evaluate the signal similarity between channels or the maximum number of channels that may be open at the same time, nor does it restrict the mixing weights assigned to different channels. In Embodiment 1, for example, the signal similarity between channels is evaluated with the NCCF, the maximum number of simultaneously open channels is 2, and the mixing weights are fixed at (0.5, 0.5) in a mono system. In a stereo system, the mixing weight of each channel depends on the spatial position of its microphone; this is not analyzed in detail here.
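The weighted mix of the gated channels can be sketched as below. The function name and data layout are hypothetical. When no weights are supplied, equal weights are assumed, which reproduces the (0.5, 0.5) mono weights of Embodiment 1 for two gated channels and passes a single gated channel through unchanged; equal weights for more than two channels are an assumption, not something the text specifies.

```python
def mix(frames, gated, weights=None):
    """Return the weighted sum of the gated channels, sample by sample.

    `frames` maps channel name -> list of samples for the current period.
    `gated` lists the names of the gated channels.
    `weights` optionally maps channel name -> mixing weight.
    """
    if not gated:
        return []
    if weights is None:
        weights = {name: 1.0 / len(gated) for name in gated}
    length = min(len(frames[name]) for name in gated)
    return [
        sum(weights[name] * frames[name][i] for name in gated)
        for i in range(length)
    ]
```

For a stereo output, one such call per output channel with position-dependent weights would be one way to apply the idea mentioned above.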
本发明实施例还提供一种多话筒混音装置, 如图 8所示, 包括: 统计模块 81 , 用于统计当前时段各输入通道的信号强度, 并选出信号 强度最大的至少两个输入通道进行语音检测;  The embodiment of the present invention further provides a multi-microphone mixing device, as shown in FIG. 8, comprising: a statistical module 81, configured to count the signal strength of each input channel in the current time period, and select at least two input channels with the largest signal strength. Perform voice detection;
相似度确定模块 82, 用于将检测出的有语音的输入通道确定为语音输 入通道, 当语音输入通道为至少两个时, 确定各语音输入通道的信号之间 的信号相似度;  The similarity determination module 82 is configured to determine the detected voice input channel as a voice input channel, and determine a signal similarity between signals of each voice input channel when the voice input channel is at least two;
选通模块 83 , 用于根据各个信号相似度控制语音输入通道的选通; 混音模块 84, 用于将选通的语音输入通道的信号进行加权混音输出。 较佳地, 选通模块 83 , 还用于当语音输入通道只有一个时, 直接控制 该语音输入通道选通。 The gating module 83 is configured to control the gating of the voice input channel according to each signal similarity; the mixing module 84 is configured to perform weighted mixing output on the signal of the gated voice input channel. Preferably, the gating module 83 is further configured to directly control when there is only one voice input channel The voice input channel is strobed.
较佳地, 选通模块 83 , 具体用于对任意两个语音输入通道, 若两个语 音输入通道的信号相似度都小于等于第一阈值时, 控制该两个输入通道都 选通。  Preferably, the gating module 83 is specifically configured to control any two speech input channels. If the signal similarity of the two speech input channels is less than or equal to the first threshold, the two input channels are controlled to be strobed.
需要说明的是, 针对两个语音输入通道的信号强度差异值的操作可以 很灵活, 如: 当两个语音输入通道的信号强度差异值大于设定值时, 控制 该两个语音输入通道中的一个选通; 当两个语音输入通道的信号强度差异 值小于设定值时, 确定两个语音输入通道的信号相似度, 并取相似度函数 最值所对应的两个信号的相对延时, 再进行后续涉及所述相对延时的相关 操作。  It should be noted that the operation of the signal strength difference value of the two voice input channels can be flexible, for example: when the signal strength difference values of the two voice input channels are greater than the set value, the two voice input channels are controlled. a strobe; when the signal strength difference value of the two voice input channels is less than the set value, determining the signal similarity of the two voice input channels, and taking the relative delay of the two signals corresponding to the maximum value of the similarity function, Subsequent operations related to the relative delay are performed.
并且, 当两个语音输入通道的信号强度差异值等于设定值时, 后续的 具体操作既可以与所述信号强度差异值大于设定值时的后续操作相同 (即 控制所述两个语音输入通道中的一个选通), 也可以与所述信号强度差异值 小于设定值时的后续操作相同 (即确定两个语音输入通道的信号相似度, 并取相似度函数最值所对应的两个信号的相对延时, 再进行后续涉及所述 相对延时的相关操作)。  Moreover, when the signal strength difference value of the two voice input channels is equal to the set value, the subsequent specific operation may be the same as the subsequent operation when the signal strength difference value is greater than the set value (ie, controlling the two voice inputs A strobe in the channel may also be the same as the subsequent operation when the signal strength difference value is less than the set value (ie, determining the signal similarity of the two voice input channels, and taking the two values corresponding to the maximum value of the similarity function) The relative delay of the signals, followed by subsequent operations involving the relative delays).
In addition, the subsequent operations involving the relative delay can also be flexible. For example: if the relative delay between the two signals is greater than a set duration, one of the two voice input channels is gated; if the relative delay between the two signals is less than the set duration, both voice input channels are gated.
Moreover, when the relative delay between the two signals is equal to the set duration, the subsequent operation may be the same as when the relative delay is greater than the set duration (that is, one of the two voice input channels is gated), or the same as when the relative delay is less than the set duration (that is, both voice input channels are gated).
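The decision rules for a pair of similar channels can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `Channel`, `gate_pair`, `strength_limit`, `delay_limit` and the two tie flags are hypothetical, and the units of strength and delay are left open. Gating the stronger channel when only one is gated follows claim 6; the tie flags model the text's statement that the "equal" cases may follow either branch.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    strength: float  # signal strength for the current period (unit is an assumption)


def gate_pair(a, b, delay, strength_limit, delay_limit,
              tie_strength_gates_one=True, tie_delay_gates_one=True):
    """Return the channels to gate for one pair of similar voice input channels."""
    # When only one channel is gated, pick the stronger one (as in claim 6).
    stronger = a if a.strength >= b.strength else b

    # Step 1: a large strength difference gates a single channel.
    diff = abs(a.strength - b.strength)
    if diff > strength_limit or (diff == strength_limit and tie_strength_gates_one):
        return [stronger]

    # Step 2: comparable strengths, so decide on the relative delay.
    d = abs(delay)
    if d > delay_limit or (d == delay_limit and tie_delay_gates_one):
        return [stronger]
    return [a, b]
```

For instance, with `strength_limit=3.0` and `delay_limit=0.005`, two channels of strength 10.0 and 9.0 separated by a delay of 0.001 would both be gated, while the same pair with a delay of 0.01 would leave only the stronger channel gated.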
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Industrial Applicability
The present invention relates to the field of audio information processing and discloses a multi-microphone mixing method and apparatus. The signal strength of each input channel in the current period is measured; on that basis, at least two input channels with high signal strength are selected for voice detection, and the input channels in which voice is detected are determined to be voice input channels. When there are at least two voice input channels, the signal similarity between the signals of the voice input channels is determined, the gating of the voice input channels is controlled accordingly, and the signals of the gated voice input channels are weighted and mixed for output. The method and apparatus of the present invention can reduce the misjudgment rate of input channel gating and improve the audio quality after mixing.
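The first and last stages of this pipeline (ranking channels by signal strength and weighted mixing of the gated channels) can be sketched as below. This is a minimal sketch under stated assumptions: the text does not fix a strength measure or a weighting scheme, so mean absolute amplitude and normalized per-channel weights are used purely for illustration, and the function names are hypothetical. Voice detection and similarity-based gating are deliberately left out.

```python
def signal_strength(frame):
    """Mean absolute amplitude of one frame (one possible strength measure)."""
    return sum(abs(x) for x in frame) / len(frame) if frame else 0.0


def select_strongest(frames, count=2):
    """Return the names of the `count` strongest channels, strongest first."""
    ranked = sorted(frames, key=lambda name: signal_strength(frames[name]),
                    reverse=True)
    return ranked[:count]


def weighted_mix(frames, gated, weights=None):
    """Mix the gated channels sample by sample using normalized weights."""
    if not gated:
        return []
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = {name: 1.0 for name in gated}
    total = sum(weights[name] for name in gated)
    length = min(len(frames[name]) for name in gated)
    return [sum(weights[name] * frames[name][i] for name in gated) / total
            for i in range(length)]
```

Here `frames` is assumed to map a channel name to the list of samples captured in the current period. Normalizing by the weight sum keeps the mixed output within the amplitude range of the inputs.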

Claims

1. A multi-microphone mixing method, comprising:
measuring the signal strength of each input channel in the current period, selecting on that basis at least two input channels with high signal strength for voice detection, and determining the input channels in which voice is detected to be voice input channels;
when there are at least two voice input channels, determining the signal similarity between the signals of the voice input channels, controlling the gating of the voice input channels accordingly, and performing weighted mixing on the signals of the gated voice input channels for output.
2. The method according to claim 1, further comprising: when there is only one voice input channel, directly gating that voice input channel.
3. The method according to claim 1, wherein controlling the gating of the voice input channels according to the signal similarities comprises:
for any two voice input channels, if the signal similarity of the two voice input channels is less than or equal to a preset first threshold, gating both input channels.
4. The method according to claim 1 or 3, further comprising:
if there are two voice input channels whose signal similarity is greater than and/or equal to the preset first threshold, controlling the gating of the two voice input channels according to the signal strengths of the two voice input channels and the delay between the two signals corresponding to the signal similarity.
5. The method according to claim 4, wherein controlling the gating of the two voice input channels according to the signal strengths of the two voice input channels and the delay between the two signals corresponding to the signal similarity comprises:
when the signal strength difference between the two voice input channels is greater than a set value, gating one of the two voice input channels;
when the signal strength difference between the two voice input channels is less than the set value, determining the signal similarity of the two voice input channels, taking the relative delay between the two signals corresponding to the extreme value of the similarity function, and gating one or both of the two voice input channels accordingly;
when the signal strength difference between the two voice input channels is equal to the set value, gating one of the two voice input channels; or determining the signal similarity of the two voice input channels, taking the relative delay between the two signals corresponding to the extreme value of the similarity function, and gating one or both of the two voice input channels accordingly;
wherein gating the voice input channels according to the relative delay comprises: if the relative delay between the two signals is greater than a set duration, gating one of the two voice input channels; if the relative delay between the two signals is less than the set duration, gating both voice input channels; if the relative delay between the two signals is equal to the set duration, gating either one or both of the two voice input channels.
6. The method according to claim 5, wherein gating one of the two voice input channels comprises:
gating whichever of the two voice input channels has the greater signal strength.
7. The method according to claim 1, wherein determining the signal similarity between the signals of the voice input channels comprises:
preprocessing the signal of each voice input channel by band-pass filtering;
among all the preprocessed channels, determining the signal similarity for every pair of signals using a normalized cross-correlation function or an average magnitude difference function.
8. A multi-microphone mixing apparatus, comprising:
a statistics module, configured to measure the signal strength of each input channel in the current period and to select on that basis at least two input channels with high signal strength for voice detection;
a similarity determination module, configured to determine the input channels in which voice is detected by the statistics module to be voice input channels and, when there are at least two voice input channels, to determine the signal similarity between the signals of the voice input channels;
a gating module, configured to control the gating of the voice input channels according to the signal similarities determined by the similarity determination module;
a mixing module, configured to perform weighted mixing on the signals of the voice input channels gated by the gating module and to output the result.
9. The apparatus according to claim 8, wherein the gating module is further configured to directly gate the voice input channel when there is only one voice input channel.
10. The apparatus according to claim 8, wherein the gating module is configured to: for any two voice input channels, gate both input channels if the signal similarity of the two voice input channels is less than or equal to a preset first threshold.
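The two similarity measures named in claim 7 can be illustrated with the sketch below. It is an assumption-laden illustration rather than the claimed implementation: the function names and the `max_lag` parameter are hypothetical, the band-pass preprocessing step is omitted, and the lag search stands in for "the relative delay corresponding to the extreme value of the similarity function" (a maximum for normalized cross-correlation, a minimum for the average magnitude difference function).

```python
import math


def normalized_cross_correlation(x, y, lag):
    """NCC of x[n] and y[n + lag] over their overlapping samples."""
    pairs = [(x[n], y[n + lag]) for n in range(len(x)) if 0 <= n + lag < len(y)]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den if den > 0 else 0.0


def average_magnitude_difference(x, y, lag):
    """AMDF of x[n] and y[n + lag]; smaller values mean more similar signals."""
    pairs = [(x[n], y[n + lag]) for n in range(len(x)) if 0 <= n + lag < len(y)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else float("inf")


def best_lag_ncc(x, y, max_lag):
    """Return (similarity, lag) at the NCC maximum over lags in [-max_lag, max_lag]."""
    return max((normalized_cross_correlation(x, y, lag), lag)
               for lag in range(-max_lag, max_lag + 1))
```

The lag is expressed in samples; dividing it by the sampling rate would give a relative delay in seconds that can be compared against a set duration.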
PCT/CN2011/083165 2010-12-17 2011-11-29 Method and apparatus for audio mixing of multiple microphones WO2012079459A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010594522.0A CN102056053B (en) 2010-12-17 2010-12-17 Multi-microphone audio mixing method and device
CN201010594522.0 2010-12-17

Publications (1)

Publication Number Publication Date
WO2012079459A1 true WO2012079459A1 (en) 2012-06-21

Family

ID=43959897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083165 WO2012079459A1 (en) 2010-12-17 2011-11-29 Method and apparatus for audio mixing of multiple microphones

Country Status (2)

Country Link
CN (1) CN102056053B (en)
WO (1) WO2012079459A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104064191A (en) * 2014-06-10 2014-09-24 百度在线网络技术(北京)有限公司 Audio mixing method and device
CN107333093A (en) * 2017-05-24 2017-11-07 苏州科达科技股份有限公司 A kind of sound processing method, device, terminal and computer-readable recording medium
CN111696515A (en) * 2020-06-15 2020-09-22 杭州艾力特数字科技有限公司 Audio mixing method for teaching recording and broadcasting

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102056053B (en) * 2010-12-17 2015-04-01 中兴通讯股份有限公司 Multi-microphone audio mixing method and device
CN103905942B (en) * 2012-12-26 2018-08-10 联想(北京)有限公司 The method and electronic equipment of voice data acquisition
CN103327433B (en) 2013-05-27 2014-08-27 腾讯科技(深圳)有限公司 Audio input interface detection method and system thereof
CN104219013B (en) * 2014-09-01 2017-05-24 厦门亿联网络技术股份有限公司 Method for multi-microphone sound mixing of video conference system
CN105848062B (en) * 2015-01-12 2018-01-05 芋头科技(杭州)有限公司 The digital microphone of multichannel
CN104616665B (en) * 2015-01-30 2018-04-24 深圳市云之讯网络技术有限公司 Sound mixing method based on voice similar degree
CN105049807B (en) * 2015-07-31 2018-05-18 小米科技有限责任公司 Monitored picture sound collection method and device
EP3455853A2 (en) * 2016-05-13 2019-03-20 Bose Corporation Processing speech from distributed microphones
CN107170465B (en) * 2017-06-29 2020-07-14 数据堂(北京)科技股份有限公司 Audio quality detection method and audio quality detection system
CN109327633B (en) * 2017-07-31 2020-09-22 苏州谦问万答吧教育科技有限公司 Sound mixing method, device, equipment and storage medium
CN107800902B (en) * 2017-09-15 2019-09-13 北京容联易通信息技术有限公司 The sound mixing method and system of multi-path voice
CN109994122B (en) * 2017-12-29 2023-10-31 阿里巴巴集团控股有限公司 Voice data processing method, device, equipment, medium and system
CN110060696B (en) * 2018-01-19 2021-06-15 腾讯科技(深圳)有限公司 Sound mixing method and device, terminal and readable storage medium
CN109510905B (en) * 2018-12-06 2020-10-30 中通天鸿(北京)通信科技股份有限公司 Multi-channel voice mixing method and system
CN110708432B (en) * 2019-10-12 2021-01-12 浙江大华技术股份有限公司 Method, system, device and storage medium for audio output in audio conference
CN111065019A (en) * 2019-12-09 2020-04-24 唐山师范学院 Multi-microphone sound mixing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716986A (en) * 2004-06-30 2006-01-04 宝利通公司 Stereo microphone processing for teleconferencing
CN101192411A (en) * 2007-12-27 2008-06-04 北京中星微电子有限公司 Large distance microphone array noise cancellation method and noise cancellation system
CN101894551A (en) * 2010-07-02 2010-11-24 华南理工大学 Method and device for automatically identifying cough
CN102056053A (en) * 2010-12-17 2011-05-11 中兴通讯股份有限公司 Multi-microphone audio mixing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7751572B2 (en) * 2005-04-15 2010-07-06 Dolby International Ab Adaptive residual audio coding

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104064191A (en) * 2014-06-10 2014-09-24 百度在线网络技术(北京)有限公司 Audio mixing method and device
CN104064191B (en) * 2014-06-10 2017-12-15 北京音之邦文化科技有限公司 Sound mixing method and device
CN107333093A (en) * 2017-05-24 2017-11-07 苏州科达科技股份有限公司 A kind of sound processing method, device, terminal and computer-readable recording medium
CN107333093B (en) * 2017-05-24 2019-11-08 苏州科达科技股份有限公司 A kind of sound processing method, device, terminal and computer readable storage medium
CN111696515A (en) * 2020-06-15 2020-09-22 杭州艾力特数字科技有限公司 Audio mixing method for teaching recording and broadcasting
CN111696515B (en) * 2020-06-15 2023-08-15 杭州艾力特数字科技有限公司 Audio mixing method for teaching recording and playing

Also Published As

Publication number Publication date
CN102056053A (en) 2011-05-11
CN102056053B (en) 2015-04-01

Similar Documents

Publication Publication Date Title
WO2012079459A1 (en) Method and apparatus for audio mixing of multiple microphones
US10602267B2 (en) Sound signal processing apparatus and method for enhancing a sound signal
JP4744874B2 (en) Sound detection and specific system
US9269367B2 (en) Processing audio signals during a communication event
US11437021B2 (en) Processing audio signals
US20060161430A1 (en) Voice activation
CN111429939B (en) Sound signal separation method of double sound sources and pickup
EP2229678A1 (en) Systems, methods, and apparatus for multi-microphone based speech enhancement
WO2008156941A1 (en) Sound discrimination method and apparatus
JP2010112994A (en) Voice processing device, voice processing method and program
Shujau et al. Separation of speech sources using an acoustic vector sensor
US6959095B2 (en) Method and apparatus for providing multiple output channels in a microphone
US10229686B2 (en) Methods and apparatus for speech segmentation using multiple metadata
US11335331B2 (en) Multibeam keyword detection system and method
GB2566756A (en) Temporal and spatial detection of acoustic sources
Hummes et al. Robust acoustic speaker localization with distributed microphones
Araki et al. Speaker indexing and speech enhancement in real meetings/conversations
JPH10243494A (en) Method and device for recognizing direction of face
KR101073632B1 (en) A zero-crossing-based multiple source localization apparatus in reverberant environments
Maraboina et al. Multi-speaker voice activity detection using ICA and beampattern analysis
US20240221778A1 (en) System and method for optimized audio mixing
Sun et al. A Lightweight Hybrid Multi-Channel Speech Extraction System with Directional Voice Activity Detection
Li et al. Effect of the division between early and late reflections on intelligibility of ideal binary-masked speech
WO2024147968A1 (en) System and method for optimized audio mixing
WO2022188712A1 (en) Method and apparatus for switching main microphone, voice detection method and apparatus for microphone, microphone-loudspeaker integrated device, and readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 11848329; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 11848329; Country of ref document: EP; Kind code of ref document: A1)