WO2011153904A1 - Speech signal processing method and device based on microphone array - Google Patents

Info

Publication number
WO2011153904A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
microphone
sampling point
weight
speech signal
Application number
PCT/CN2011/074794
Other languages
French (fr)
Chinese (zh)
Inventor
何宏森
黄志宏
邱小军
袁浩
Original Assignee
中兴通讯股份有限公司

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming

Abstract

The present invention discloses a speech signal processing method based on a microphone array, where the microphone array is composed of two or more directional microphones. The method comprises the following steps: determining the energy values of the speech signals of the same frame received by each directional microphone; determining adjustment parameters for the speech signals of the same frame according to the energy values; determining the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal; multiplying each sampling point signal in each speech signal by its weight; accumulating the products of the corresponding sampling point signals across the speech signals; and outputting the accumulated sampling point signals in sequence. The present invention also discloses a speech signal processing device based on the microphone array. The invention is computationally simple, requires no complex calculation or circuitry, and provides good reverberation resistance and directional pickup.

Description

Speech signal processing method and device based on microphone array

Technical Field

The present invention relates to speech signal processing technology, and in particular to a speech signal processing method and device based on a microphone array.

Background Art

At conference venues, interference sources and noise such as reverberation corrupt the speech signal and can sharply degrade the performance of a speech processing system, so speech enhancement is important. Multi-channel speech enhancement algorithms based on microphone arrays combine the temporal and spatial information of the signal and exploit the difference in correlation between noise and speech to suppress noise. In recent years they have become a key technology for multimedia conferencing, communication, voice control and similar systems. Sound quality and performance strongly affect the overall effect and market competitiveness of an audio conferencing system. Noise is therefore commonly suppressed with microphone array technology, which frees participants from having to hold a microphone and speak toward it, and greatly improves the practicality of the audio conferencing system. For speech signal processing, the speech entering the encoder should already be of good quality, for example with low reverberation and low noise, and the microphone array is designed to ensure this.
Chinese patent application publication No. CN101496417A, published on July 29, 2009, discloses a "voice conference system". In that system, the speech pickup signals picked up by several unidirectional microphones pointing in different directions form several speech pickup beam signals. The signal level of the beam signal corresponding to the direction of arrival of the speech then becomes higher, and the speech pickup section selects the beam signal whose signal level exceeds a set threshold and sends it to the communication section. In this solution, more than one beam signal may exceed the threshold, which increases reverberation in a small room and reduces speech clarity.
US patent application publication No. US20050195988A1, published on September 8, 2005, discloses a "System and method for beamforming using a microphone array". The essence of that solution is the design of a beamformer. The beamformer first uses parametric information describing the characteristics and geometry of the microphone array to compute a frequency-dependent weight matrix, and combines it with one or more noise models, automatically generated or computed for the environment around the microphone array, to design the optimal fixed beams for the array. Then, when frequency-domain beamforming is applied to the audio signals received by the array, the weight matrix is used to weight the output of each microphone in the frequency domain. The method must compute the weight matrix in the frequency domain from the characteristics and geometry of the array in order to form the beams, which increases system complexity, makes the system harder to develop and reduces its reliability.

Summary of the Invention

In view of this, the main object of the present invention is to provide a speech signal processing method and device based on a microphone array which, by using an array of strongly directional microphones, amplifies the speech signal picked up closest to the speaker and can thereby track the speaker dynamically.
To achieve the above object, the technical solution of the present invention is implemented as follows:
A speech signal processing method based on a microphone array, the microphone array being composed of two or more directional microphones; the method comprises:
determining the energy values of the speech signals of the same frame received by each directional microphone;
determining, according to the energy values, an adjustment parameter for each speech signal of the same frame;
determining, according to the adjustment parameter of each speech signal, the weight of each sampling point signal in that speech signal, multiplying each sampling point signal in each speech signal by its weight, accumulating the products of the corresponding sampling point signals across the speech signals, and outputting the accumulated sampling point signals in sequence.
Preferably, determining the adjustment parameter of each speech signal of the same frame according to the energy values comprises:
dividing the energy value of each speech signal of the same frame by the largest of those energy values; and applying an exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.
Preferably, applying the exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal comprises:
taking the E-th power of each quotient as the adjustment parameter of each speech signal, where E is a positive number greater than or equal to 2 and less than or equal to 10.
Preferably, the weight of each sampling point signal in the speech signal is determined according to the adjustment parameter of each speech signal by the following formula:
$$ w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C $$
where $w_i(n)$ is the weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a preset forgetting factor with $0 < \lambda < 1$, and $C$ is the adjustment parameter of the current speech signal frame.
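As a worked consequence of the recursion $w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C$, the following derivation sketch unrolls it within one frame. It assumes that $C$ stays constant over the frame; the starting index $n_0$ and the step count $m$ are labels introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
w_i(n_0+m) &= \lambda^{m}\, w_i(n_0) + (1-\lambda)\, C \sum_{j=0}^{m-1} \lambda^{j} \\
           &= \lambda^{m}\, w_i(n_0) + \left(1-\lambda^{m}\right) C .
\end{aligned}
```

Because $0 < \lambda < 1$, the term $\lambda^{m}$ decays geometrically, so the weight moves smoothly from its previous value toward $C$ instead of jumping at the frame boundary.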
Preferably, the weight of each sampling point signal in the speech signal is determined according to the adjustment parameter of each speech signal as follows:
$$ w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C $$
where $w_i(n)$ is the initial weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a preset forgetting factor with $0 < \lambda < 1$, and $C$ is the adjustment parameter of the current speech signal frame;
$w_i(n)$ is then processed by the following formula, and $\bar{w}_i(n)$ is taken as the final weight of the n-th sampling point signal in the current speech signal frame of microphone i:
$$ \bar{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)} $$
where max( ) takes the maximum value.
Preferably, the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 4 to 16.
A speech signal processing device based on a microphone array, the microphone array being composed of two or more directional microphones; the device comprises a first determining unit, a second determining unit, a calculating unit and an output unit, wherein:
the first determining unit is configured to determine the energy values of the speech signals of the same frame received by each directional microphone;
the second determining unit is configured to determine, according to the energy values, an adjustment parameter for each speech signal of the same frame;
the calculating unit is configured to determine, according to the adjustment parameter of each speech signal, the weight of each sampling point signal in that speech signal, multiply each sampling point signal in each speech signal by its weight, and accumulate the products of the corresponding sampling point signals across the speech signals; and
the output unit is configured to output the accumulated sampling point signals in sequence.
Preferably, the second determining unit further divides the energy value of each speech signal of the same frame by the largest of those energy values, and applies an exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.
Preferably, the second determining unit further takes the E-th power of each quotient as the adjustment parameter of each speech signal, where E is a positive number greater than or equal to 2 and less than or equal to 10.
Preferably, the calculating unit further calculates the weight of each sampling point signal in the speech signal by the following formula:
$$ w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C $$
where $w_i(n)$ is the weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a preset forgetting factor with $0 < \lambda < 1$, and $C$ is the adjustment parameter of the current speech signal frame.
Preferably, the calculating unit further calculates the weight of each sampling point signal in the speech signal as follows:
$$ w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C $$
where $w_i(n)$ is the initial weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a preset forgetting factor with $0 < \lambda < 1$, and $C$ is the adjustment parameter of the current speech signal frame;
$w_i(n)$ is then processed by the following formula, and $\bar{w}_i(n)$ is taken as the final weight of the n-th sampling point signal in the current speech signal frame of microphone i:
$$ \bar{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)} $$
where max( ) takes the maximum value.
Preferably, the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.
In the present invention, N strongly directional microphones form a circular array whose pickup covers all 360 degrees of azimuth. The energy value of the speech signal received by each microphone in the array is computed first. From these energy values, the adjustment parameter of the speech signal of the current speech frame received by each microphone is determined, and the adjustment parameter is used to compute the weight of each sampling point signal of the current speech frame. Each computed weight is multiplied by the corresponding sampling point signal, the products for sampling points at the same position are accumulated, and the results are output in sampling point order. The invention uses the energy values of the speech signals received by the microphones in the array to determine the adjustment parameter of each channel, and smooths the sampling point signals by means of a forgetting factor so that the output speech signal is more continuous. The invention is computationally simple, requires no complex calculation or circuitry, and provides good reverberation resistance and directional pickup.

Brief Description of the Drawings

FIG. 1 is a flowchart of the speech signal processing method based on a microphone array according to the present invention;
FIG. 2 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in a reverberation chamber speak alternately;
FIG. 3 is a schematic diagram of the average weight of each channel's speech frames in the output signal of the array when two sound sources in a reverberation chamber speak alternately;
FIG. 4 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in a reverberation chamber speak simultaneously;
FIG. 5 is a schematic diagram of the average weight of each channel's speech frames in the output signal of the array when two sound sources in a reverberation chamber speak simultaneously;
FIG. 6 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in an ordinary room speak alternately;
FIG. 7 is a schematic diagram of the average weight of each channel's speech frames in the output signal of the array when two sound sources in an ordinary room speak alternately;
FIG. 8 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in an ordinary room speak simultaneously;
FIG. 9 is a schematic diagram of the average weight of each channel's speech frames in the output signal of the array when two sound sources in an ordinary room speak simultaneously;
FIG. 10 is a schematic structural diagram of the speech signal processing device based on a microphone array according to the present invention.

Detailed Description

The basic idea of the present invention is as follows. N strongly directional microphones form a circular array so that the pickup of the array covers all 360 degrees of azimuth. The signal picked up by each microphone is divided into frames and the energy of each frame is computed. By comparing the energies, the amplitude of the speech signal in the channel with the largest energy is kept unchanged while the speech signals of the other channels are attenuated; the degree of attenuation is controlled by the adjustment parameter. In addition, so that switching between channels based on the energy comparison is smooth and natural and free of switching noise, a smoothing mechanism, the forgetting factor, is introduced, which performs the switch by combining the signal at the current sampling point with the signals at earlier sampling points.
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below by way of embodiments and with reference to the accompanying drawings.
In the method of the present invention, all microphones in the array are strongly directional microphones rather than omnidirectional microphones. A strongly directional microphone is one that picks up the speech signal along the direction in which it points, and it effectively reduces the amount of reverberation entering each microphone. The invention exploits this directional pickup: the energy of the same speech frame as picked up by each microphone is used to determine the weight of each sampling point signal in each microphone's speech signal for that frame, so that a better speech signal is output. The microphone array uses a circular or spherical layout so as to pick up speech signals from all directions. The number of strongly directional microphones in the array is generally 3 to 16, distributed uniformly on the chosen circle or sphere so that every direction is covered by a corresponding microphone. The radius of the circle or sphere is generally 3 to 20 cm, and the diaphragm of each microphone faces outward along the radial direction of the circle or sphere.
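The circular layout described above can be sketched as follows. This is a minimal illustration, assuming a planar array centered at the origin with the first microphone on the positive x-axis; the function name `circular_array_layout` and the coordinate convention are hypothetical and not taken from the original.

```python
import math

def circular_array_layout(num_mics, radius_m):
    """Place num_mics microphones uniformly on a circle of radius radius_m.

    Returns a list of (x, y, azimuth_deg) tuples. Each microphone is assumed
    to point radially outward, so its pointing direction equals its azimuth.
    """
    layout = []
    for i in range(num_mics):
        azimuth = 2.0 * math.pi * i / num_mics  # uniform angular spacing
        x = radius_m * math.cos(azimuth)
        y = radius_m * math.sin(azimuth)
        layout.append((x, y, math.degrees(azimuth)))
    return layout

# Example: 4 microphones on a circle of radius 5 cm.
layout = circular_array_layout(4, 0.05)
```

With four microphones the pointing directions are spaced 90 degrees apart, so each microphone covers one quadrant of the 360-degree azimuth.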
Based on the microphone array, the sampled signal of the k-th frame (frame length L milliseconds) received by the i-th (i = 1, 2, ..., N) microphone in the array is given by equation (1):
$$ x_i(n) = x_i\big((k-1)L + j\big), \quad j = 1, 2, \ldots, L \qquad (1) $$
FIG. 1 is a flowchart of the speech signal processing method based on a microphone array according to the present invention. As shown in FIG. 1, the method comprises the following steps:
Step 101: compute the energy of the k-th frame signal received by the i-th (i = 1, 2, ..., N) microphone. Because the microphone facing the sound source picks up a relatively stronger speech signal, the energy of the speech signal allows a preliminary estimate of the direction of the sound source. The computed energy value also serves as the basis for determining the weight applied to that microphone's speech signal; how the weight is determined is described in the following steps. The energy value $E_i(k)$ of the k-th frame signal received by the i-th (i = 1, 2, ..., N) microphone is given by equation (2):
$$ E_i(k) = \sum_{j=1}^{L} x_i^2\big((k-1)L + j\big) \qquad (2) $$
In the present invention, the speech frame length used to compute the energy of each channel may be 400 ms, and the system response time for adaptive switching between channels is taken as 400 ms. The frame length is determined by the processing speed of the processor and may take other values, such as 450 ms or 500 ms.
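The framing and per-frame energy of step 101 (sum of squared samples in each frame) can be sketched as follows. This is a minimal illustration, assuming the N channels are already available as a NumPy array and that the frame length is expressed in samples; the function name `frame_energies` and the sample values are hypothetical.

```python
import numpy as np

def frame_energies(signals, frame_len):
    """Per-channel, per-frame energy.

    signals: array of shape (N, T), one row per microphone.
    frame_len: frame length in samples (an assumption; the text gives it in ms).
    Returns an array of shape (N, K) with K = T // frame_len complete frames.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, total = signals.shape
    num_frames = total // frame_len
    # Drop the incomplete tail, then split each channel into frames.
    frames = signals[:, :num_frames * frame_len].reshape(num_mics, num_frames, frame_len)
    return np.sum(frames ** 2, axis=2)

# Two channels, four samples each, frames of two samples.
energies = frame_energies([[1, 1, 2, 2], [0, 3, 0, 0]], frame_len=2)
```

At an assumed sampling rate of 16 kHz (the source does not state one), a 400 ms frame would correspond to `frame_len = 6400`.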
Step 102: normalize the energy values determined by equation (2), taking the maximum energy of the k-th frame signal over the N channels as the reference. In this step, normalization converts the energy value of the k-th frame signal of each channel into a value between 0 and 1 to simplify subsequent processing. The normalization is given by equation (3), where $e_i(k)$ is the normalized result of $E_i(k)$:
$$ e_i(k) = \frac{E_i(k)}{\max\big(E_1(k), E_2(k), \ldots, E_N(k)\big)} \qquad (3) $$
where max( ) takes the maximum value.
Step 103: compute the adjustment parameter from the normalized energy of the k-th frame signal received by the i-th (i = 1, 2, ..., N) microphone. The purpose of the adjustment parameter is to make the speech signal in channels with large energy relatively larger and the speech signal in channels with small energy smaller, widening the gap between high-energy and low-energy speech signals. This emphasizes the signal from the direction of the sound source and suppresses signals from other directions, so that the sound is clearer and reverberation is lower. Specifically, each normalized energy value is raised to a power. The adjustment exponent used in this step is a positive number greater than or equal to 2 and less than or equal to 10. For ease of computation and in view of the differences between the speech signals, the adjustment exponent is generally 4, 5 or 6. The adjustment parameter $C_i(k)$ is given by equation (4):
$$ C_i(k) = \big[e_i(k)\big]^{\alpha} \qquad (4) $$
where $\alpha$ is called the adjustment exponent; it adjusts the share of each channel's signal in the output signal according to the relative energies of the channels' speech frames.
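The normalization of step 102 and the power operation of step 103 can be sketched together as follows. This is a minimal illustration: the function name `adjustment_parameters` is hypothetical, and returning all zeros for a completely silent frame is an assumption, since the original does not say how that case is handled.

```python
import numpy as np

def adjustment_parameters(energies, alpha=5.0):
    """Map one frame's channel energies to adjustment parameters.

    energies: length-N sequence of frame energies, one per microphone.
    alpha: adjustment exponent (the text allows 2 to 10, typically 4, 5 or 6).
    """
    energies = np.asarray(energies, dtype=float)
    peak = energies.max()
    if peak <= 0.0:
        # Assumption: a silent frame yields zero for every channel.
        return np.zeros_like(energies)
    normalized = energies / peak   # values in [0, 1], largest channel equals 1
    return normalized ** alpha     # widens the gap between channels

params = adjustment_parameters([4.0, 2.0, 1.0, 0.0], alpha=5.0)
```

With an exponent of 5, a channel at half the peak energy is reduced to 0.5 to the fifth power, about 3% of the dominant channel, which is how the exponent suppresses channels that are not facing the source.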
Step 104: compute the weight of the n-th sampling point signal picked up by the i-th (i = 1, 2, ..., N) microphone in the array output signal. The weight is updated step by step at each sampling point signal. Specifically, the weight $w_i(n)$ of the n-th sampling point signal is given by equation (5):
$$ w_i(n) = \lambda\, w_i(n-1) + (1-\lambda)\, C_i(k) \qquad (5) $$
where $\lambda$ is the forgetting factor, which smooths the speech frame volume across a switch, prevents the speech signal from fluctuating in loudness, and suppresses the switching noise caused by an excessive change in the speech frame energy of a channel at the moment of switching. $\lambda$ is a preset parameter greater than 0 and less than 1; to keep the speech signal smooth, $\lambda$ is chosen close to 1. In the present invention $\lambda$ may be set to 0.9998, or to other values such as 0.9996, 0.9992 or 0.9990. The specific value is determined by the smoothness the user wants.
Step 105: normalize the weight of each sampling point of the signal picked up by the i-th (i = 1, 2, ..., N) microphone by the maximum of these weights. The main purpose is to make the volume of the highest-energy channel in the array output equal to the volume of the signal picked up by the microphone of that channel. The normalization of the weight of each sampling point signal picked up by the i-th (i = 1, 2, ..., N) microphone is given by equation (6):
$$ \bar{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)} \qquad (6) $$
where max( ) takes the maximum value.
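The per-sample weight smoothing of step 104 and the weight normalization of step 105 can be sketched as follows. This is a minimal illustration with several assumptions that the original does not state: the weights at the start of a frame are taken to be the last weights of the previous frame, the adjustment parameter is held constant over the frame, and a column whose weights are all zero is left unscaled. The function name `smoothed_weights` is hypothetical.

```python
import numpy as np

def smoothed_weights(prev_w, c, frame_len, lam=0.9998):
    """Recursive weight update followed by per-sample normalization.

    prev_w: length-N weights carried over from the previous frame (assumption).
    c: length-N adjustment parameters of the current frame.
    Returns (raw, normalized), both of shape (N, frame_len).
    """
    w = np.asarray(prev_w, dtype=float).copy()
    c = np.asarray(c, dtype=float)
    raw = np.empty((w.size, frame_len))
    for n in range(frame_len):
        w = lam * w + (1.0 - lam) * c      # forgetting-factor update
        raw[:, n] = w
    peak = raw.max(axis=0)                 # largest weight at each sample
    normalized = raw / np.where(peak > 0.0, peak, 1.0)
    return raw, normalized

# Channel 0 was dominant, channel 1 becomes dominant in this frame.
raw, normalized = smoothed_weights([1.0, 0.0], [0.0, 1.0], frame_len=3, lam=0.5)
```

The small forgetting factor in the example is chosen only to make the hand-over visible in three samples; with a value near 1, as the text recommends, the same cross-fade is spread over thousands of samples.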
Step 106: compute the output sampling point signals of the microphone array and output them in sequence. Each output sampling point signal is given by equation (7):
$$ s(n) = \sum_{i=1}^{N} w_i(n)\, x_i(n) \qquad (7) $$
Equation (7) multiplies each sampling point of the same-frame speech signal of each microphone in the array by its determined weight, and accumulates the corresponding sampling point products over the microphones to give the output sampling point signal.
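The weighted accumulation of step 106 can be sketched as follows. This is a minimal illustration, assuming the per-sample weights and the sampled signals of one frame are given as arrays of the same shape, and assuming the weights passed in are the ones produced by the preceding steps; the function name `array_output` is hypothetical.

```python
import numpy as np

def array_output(weights, frames):
    """Weighted sum across microphones for one frame.

    weights, frames: arrays of shape (N, L), one row per microphone.
    Returns the length-L output frame.
    """
    weights = np.asarray(weights, dtype=float)
    frames = np.asarray(frames, dtype=float)
    if weights.shape != frames.shape:
        raise ValueError("weights and frames must have the same shape")
    # Multiply sample by sample, then accumulate over the microphone axis.
    return np.sum(weights * frames, axis=0)

out = array_output([[1.0, 1.0], [0.5, 0.0]], [[2.0, 4.0], [6.0, 8.0]])
```

When one channel has weight 1 and all others have weight 0, the output equals that channel's signal, which matches the stated aim of passing the dominant channel at its original volume.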
In practice, the typical front-end processing before this algorithm is as follows: the microphones convert the speech signal into an electrical signal, which is amplified and converted from analog to digital before entering a digital signal processor (DSP) for processing.
The following takes a microphone array of four microphones uniformly distributed around a circle as an example to illustrate the speech signal processing results in various application environments. The radius of the circle is 5 cm, the forgetting factor is $\lambda$ = 0.9998, and the adjustment exponent is $\alpha$ = 5.0.
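To give a feel for how slowly a forgetting factor of 0.9998 lets the weights move, the sketch below counts the samples needed for a weight to cover a given fraction of the distance to its new target. It relies on the standard property of a first-order recursion of the form w(n) = λ·w(n-1) + (1-λ)·C, namely that the remaining gap shrinks by a factor λ at every sample, and it assumes the target stays constant; the function name `samples_to_settle` is hypothetical.

```python
import math

def samples_to_settle(lam, fraction):
    """Samples needed to cover `fraction` of the gap to a constant target.

    After m samples the remaining gap is lam**m times the initial gap,
    so we need the smallest m with lam**m <= 1 - fraction.
    """
    return math.ceil(math.log(1.0 - fraction) / math.log(lam))

# About 63% of the way (one time constant) and 99% of the way.
n63 = samples_to_settle(0.9998, 1.0 - math.exp(-1.0))
n99 = samples_to_settle(0.9998, 0.99)
```

The conversion of these counts into time depends on the sampling rate, which the source does not state; at an assumed 16 kHz, one time constant would be roughly a third of a second.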
FIG. 2 is a schematic diagram of how the normalized energy of the speech frames picked up by each microphone in the array varies when two sound sources in a reverberation chamber speak alternately. The energy of the speech frames picked up by each microphone is calculated using the method of the present invention.
FIG. 3 is a schematic diagram of how the average weight of each channel's speech frames in the output signal of the microphone array varies when two sound sources in a reverberation chamber speak alternately, with the frame energies calculated using the method of the present invention. As FIG. 3 shows, the present invention switches automatically according to the energy of the speech frames picked up by each microphone, and the switching is natural and stable. After the speech signals picked up by the microphones are processed by the method of the present invention, the output speech signal of the microphone array sounds smooth and natural, and reverberation is greatly reduced.
FIG. 4 is a schematic diagram of how the normalized energy of the speech frames picked up by each microphone in the array varies when two sound sources in a reverberation chamber speak simultaneously, with the energy of the speech frames picked up by each microphone calculated using the method of the present invention.
FIG. 5 is a schematic diagram of how the average weight of each channel's speech frames in the output signal of the microphone array varies when two sound sources in a reverberation chamber speak simultaneously, with the frame energies calculated using the method of the present invention. As FIG. 5 shows, the present invention switches automatically according to the energy of the speech frames picked up by each microphone, and the switching is natural and stable. After the speech signals picked up by the microphones are processed by the method of the present invention, the output speech signal of the microphone array sounds smooth and natural.
FIG. 6 is a schematic diagram of how the normalized energy of the speech frames picked up by each microphone in the array varies when two sound sources in an ordinary room speak alternately, with the energy of the speech frames picked up by each microphone calculated using the method of the present invention.
FIG. 7 is a schematic diagram of how the average weight of each channel's speech frames in the output signal of the microphone array varies when two sound sources in an ordinary room speak alternately, with the frame energies calculated using the method of the present invention. As FIG. 7 shows, the present invention switches automatically according to the energy of the speech frames picked up by each microphone, and the switching is natural and stable. After the speech signals picked up by the microphones are processed by the method of the present invention, the output speech signal of the microphone array sounds smooth and natural, and reverberation is reduced.
FIG. 8 is a schematic diagram of how the normalized energy of the speech frames picked up by each microphone in the array varies when two sound sources in an ordinary room speak simultaneously, with the energy of the speech frames picked up by each microphone calculated using the present invention.
FIG. 9 is a schematic diagram of how the average weight of each channel's speech frames in the output signal of the microphone array varies when two sound sources in an ordinary room speak simultaneously, with the frame energies calculated using the present invention. As FIG. 9 shows, the present invention switches automatically according to the energy of the speech frames picked up by each microphone, and the switching is natural and stable. After the speech signals picked up by the microphones are processed by the method of the present invention, the output speech signal of the microphone array sounds smooth and natural.
The speech signal processed by the above steps may be output as a digital signal, or as an analog signal after digital-to-analog conversion.
FIG. 10 is a schematic structural diagram of the microphone-array-based speech signal processing apparatus of the present invention. As shown in FIG. 10, the apparatus includes a first determining unit 100, a second determining unit 101, a calculating unit 102, and an output unit 103, wherein:
the first determining unit 100 is configured to determine the energy value of the speech signal of the same frame received by each directional microphone;
the second determining unit 101 is configured to determine an adjustment parameter for each speech signal of the same frame according to the energy values;
the calculating unit 102 is configured to determine the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal, multiply each sampling point signal in each speech signal by its weight, and accumulate the products of the corresponding sampling point signals across the speech signals; and
the output unit 103 is configured to output the accumulated sampling point signals in sequence.
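The first and last stages of this unit structure can be sketched as below. This is only an illustration under stated assumptions: frame energy is taken to be the sum of squared samples (this passage does not define it), and the function names `frame_energy` and `weighted_sum` as well as the list-of-lists data layout are hypothetical.

```python
def frame_energy(frame):
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


def weighted_sum(frames, weights):
    """Multiply each channel's samples by their weights and accumulate across channels.

    frames[i][n]  : sample n of the current frame from microphone i
    weights[i][n] : weight assigned to that sample
    Returns one output sample per sampling point, in order.
    """
    num_channels = len(frames)
    num_samples = len(frames[0])
    return [
        sum(frames[i][n] * weights[i][n] for i in range(num_channels))
        for n in range(num_samples)
    ]
```

For instance, `weighted_sum([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.5], [0.0, 0.5]])` combines two channels of two samples each into a single two-sample output.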
In the present invention, the microphone array consists of two or more directional microphones.
The second determining unit 101 further divides the energy value of each speech signal of the same frame by the largest energy value, and applies exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.
The second determining unit 101 further takes the E-th power of each quotient as the adjustment parameter of each speech signal, where E is a positive number greater than or equal to 2 and less than or equal to 10.
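The adjustment-parameter step described above (quotient against the largest energy, then the E-th power) might look like the following sketch. The function name `adjustment_parameters` and the fallback for an all-silent frame are assumptions; the text does not say what happens when every energy is zero.

```python
def adjustment_parameters(energies, exponent=5.0):
    """Divide each channel's frame energy by the largest one, then raise to `exponent`.

    `exponent` plays the role of E in the text and must satisfy 2 <= E <= 10.
    """
    if not 2.0 <= exponent <= 10.0:
        raise ValueError("exponent must lie in the range [2, 10]")
    max_energy = max(energies)
    if max_energy <= 0.0:
        # Assumed fallback for a silent frame on every channel (not in the source).
        return [1.0] * len(energies)
    return [(e / max_energy) ** exponent for e in energies]
```

Because every quotient lies between 0 and 1, raising it to a power of 2 or more pushes the weaker channels toward zero while the strongest channel keeps a parameter of 1, which is consistent with the automatic switching behaviour reported for the figures.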
The calculating unit 102 further calculates the weight of each sampling point signal in the speech signal according to the following formula: $w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame.
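The recursion above can be sketched for a single microphone as follows. The function name `smooth_weights` is hypothetical, and the starting value `prev_weight` (for example, the weight carried over from the last sample of the previous frame) is an assumption, since this passage does not state how the first sample of a frame is initialised.

```python
def smooth_weights(prev_weight, adjustment, num_samples, forgetting=0.9998):
    """Apply w(n) = forgetting * w(n-1) + (1 - forgetting) * C to each sample of a frame.

    prev_weight : assumed weight preceding the first sample of this frame
    adjustment  : adjustment parameter C of the current frame for this microphone
    """
    if not 0.0 < forgetting < 1.0:
        raise ValueError("forgetting factor must satisfy 0 < forgetting < 1")
    weights = []
    w = prev_weight
    for _ in range(num_samples):
        w = forgetting * w + (1.0 - forgetting) * adjustment
        weights.append(w)
    return weights
```

With a forgetting factor close to 1, such as the 0.9998 used in the example configuration, the weight drifts slowly toward the frame's adjustment parameter rather than jumping to it.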
Alternatively, the calculating unit 102 further calculates the weight of each sampling point signal in the speech signal as follows:
$w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the initial weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the initial weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame;
$w_i(n)$ is then processed according to the following formula, and $\hat{w}_i(n)$ is taken as the final weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$:
$\hat{w}_i(n) = \dfrac{w_i(n)}{\max\left(w_1(n), w_2(n), \ldots, w_N(n)\right)}$, where $\max(\cdot)$ takes the maximum value.
The microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.
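The second step of the two-step weight calculation above, dividing each initial weight by the largest initial weight across microphones at the same sampling point, can be sketched as below. The function name `normalize_weights`, the list-of-lists layout, and the zero fallback when every initial weight at a sampling point is zero are assumptions for illustration.

```python
def normalize_weights(initial_weights):
    """Divide each channel's initial weight at sample n by the largest across channels.

    initial_weights[i][n] is the initial weight of sample n for microphone i.
    Returns final weights with the same shape.
    """
    num_channels = len(initial_weights)
    num_samples = len(initial_weights[0])
    final = [[0.0] * num_samples for _ in range(num_channels)]
    for n in range(num_samples):
        peak = max(initial_weights[i][n] for i in range(num_channels))
        for i in range(num_channels):
            # Assumed guard: if all initial weights are zero, keep the final weight at zero.
            final[i][n] = initial_weights[i][n] / peak if peak > 0.0 else 0.0
    return final
```

After this normalization the dominant channel at each sampling point has a final weight of exactly 1, and every other channel is scaled relative to it.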
Those skilled in the art will understand that the microphone-array-based speech signal processing apparatus shown in FIG. 10 is designed to implement the microphone-array-based speech signal processing method described above. The function of each processing unit in the apparatus shown in FIG. 10 can be understood by reference to the description of that method, and may be implemented either by a program running on a processor or by dedicated logic circuits.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims

1. A speech signal processing method based on a microphone array, characterized in that the microphone array consists of two or more directional microphones, the method comprising:
determining the energy value of the speech signal of the same frame received by each directional microphone;
determining an adjustment parameter for each speech signal of the same frame according to the energy values; and
determining the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal, multiplying each sampling point signal in each speech signal by its weight, accumulating the products of the corresponding sampling point signals across the speech signals, and outputting the accumulated sampling point signals in sequence.
2. The method according to claim 1, characterized in that determining the adjustment parameter of each speech signal of the same frame according to the energy values comprises:
dividing the energy value of each speech signal of the same frame by the largest energy value; and applying exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.
3. The method according to claim 2, characterized in that applying exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal comprises:
taking the E-th power of each quotient as the adjustment parameter of each speech signal, where E is a positive number greater than or equal to 2 and less than or equal to 10.
4. The method according to claim 1, characterized in that the weight of each sampling point signal in the speech signal is determined according to the adjustment parameter of each speech signal by the following formula:
$w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame.
5. The method according to claim 1, characterized in that determining the weight of each sampling point signal in the speech signal according to the adjustment parameter of each speech signal comprises:
$w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the initial weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the initial weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame;
processing $w_i(n)$ according to the following formula, and taking $\hat{w}_i(n)$ as the final weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$:
$\hat{w}_i(n) = \dfrac{w_i(n)}{\max\left(w_1(n), w_2(n), \ldots, w_N(n)\right)}$, where $\max(\cdot)$ takes the maximum value.
6. The method according to any one of claims 1 to 5, characterized in that the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.
7. A speech signal processing apparatus based on a microphone array, characterized in that the microphone array consists of two or more directional microphones, and the apparatus comprises a first determining unit, a second determining unit, a calculating unit, and an output unit, wherein:
the first determining unit is configured to determine the energy value of the speech signal of the same frame received by each directional microphone;
the second determining unit is configured to determine an adjustment parameter for each speech signal of the same frame according to the energy values;
the calculating unit is configured to determine the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal, multiply each sampling point signal in each speech signal by its weight, and accumulate the products of the corresponding sampling point signals across the speech signals; and
the output unit is configured to output the accumulated sampling point signals in sequence.
8. The apparatus according to claim 7, characterized in that the second determining unit further divides the energy value of each speech signal of the same frame by the largest energy value, and applies exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.
9. The apparatus according to claim 8, characterized in that the second determining unit further takes the E-th power of each quotient as the adjustment parameter of each speech signal, where E is a positive number greater than or equal to 2 and less than or equal to 10.
10. The apparatus according to claim 7, characterized in that the calculating unit further calculates the weight of each sampling point signal in the speech signal according to the following formula:
$w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame.
11. The apparatus according to claim 7, characterized in that the calculating unit further calculates the weight of each sampling point signal in the speech signal as follows:
$w_i(n) = \lambda w_i(n-1) + (1-\lambda)C$, where $w_i(n)$ is the initial weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$; $w_i(n-1)$ is the initial weight of the $(n-1)$-th sampling point signal in the current speech signal frame of microphone $i$; $\lambda$ is a preset forgetting factor, $0 < \lambda < 1$; and $C$ is the adjustment parameter of the current speech signal frame;
processing $w_i(n)$ according to the following formula, and taking $\hat{w}_i(n)$ as the final weight of the $n$-th sampling point signal in the current speech signal frame of microphone $i$:
$\hat{w}_i(n) = \dfrac{w_i(n)}{\max\left(w_1(n), w_2(n), \ldots, w_N(n)\right)}$, where $\max(\cdot)$ takes the maximum value.
12. The apparatus according to any one of claims 7 to 11, characterized in that the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.
PCT/CN2011/074794 2010-06-08 2011-05-27 Speech signal processing method and device based on microphone array WO2011153904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010197159.9 2010-06-08
CN201010197159.9A CN101867853B (en) 2010-06-08 2010-06-08 Speech signal processing method and device based on microphone array

Publications (1)

Publication Number Publication Date
WO2011153904A1 (en) 2011-12-15

Family

ID=42959367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/074794 WO2011153904A1 (en) 2010-06-08 2011-05-27 Speech signal processing method and device based on microphone array

Country Status (2)

Country Link
CN (1) CN101867853B (en)
WO (1) WO2011153904A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867853B (en) * 2010-06-08 2014-11-05 中兴通讯股份有限公司 Speech signal processing method and device based on microphone array
CN103124386A (en) * 2012-12-26 2013-05-29 山东共达电声股份有限公司 De-noising, echo-eliminating and acute directional microphone for long-distance speech
WO2015114674A1 (en) * 2014-01-28 2015-08-06 三菱電機株式会社 Sound collecting device, input signal correction method for sound collecting device, and mobile apparatus information system
CN105652243B (en) * 2016-03-14 2017-12-05 西南科技大学 Multichannel group sparse linear predicts delay time estimation method
CN110570874B (en) * 2018-06-05 2021-10-22 中国科学院声学研究所 System and method for monitoring sound intensity and distribution of wild birds

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009009568A2 (en) * 2007-07-09 2009-01-15 Mh Acoustics, Llc Augmented elliptical microphone array
WO2009042948A1 (en) * 2007-09-28 2009-04-02 Qualcomm Incorporated Multiple microphone voice activity detector
CN101658052A (en) * 2007-03-21 2010-02-24 弗劳恩霍夫应用研究促进协会 Method and apparatus for enhancement of audio reconstruction
CN101867853A (en) * 2010-06-08 2010-10-20 中兴通讯股份有限公司 Speech signal processing method and device based on microphone array


Also Published As

Publication number Publication date
CN101867853A (en) 2010-10-20
CN101867853B (en) 2014-11-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11791895

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11791895

Country of ref document: EP

Kind code of ref document: A1