WO2021093798A1 - Method for selecting an output beam of a microphone array - Google Patents
Method for selecting an output beam of a microphone array
- Publication number
- WO2021093798A1 (application PCT/CN2020/128274)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- current beam
- energy
- existence probability
- power spectrum
- Prior art date
Classifications
- G10L21/0232: Noise filtering characterised by the method used for estimating noise; processing in the frequency domain
- G10L21/0216: Noise filtering characterised by the method used for estimating noise
- G10L21/0208: Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
- G10L21/0224: Noise filtering characterised by the method used for estimating noise; processing in the time domain
- G10L25/21: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/78: Detection of presence or absence of voice signals
- H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
- G10L2021/02166: Microphone arrays; beamforming
- G10L2025/783: Detection of presence or absence of voice signals based on threshold decision
Definitions
- The invention relates to output beam selection for a microphone array, and in particular to a microphone array output beam selection method based on the speech existence probability.
- A microphone array can perform beamforming in multiple directions. However, owing to limitations of the output-side hardware resources or the application scenario, usually only the beam in one direction may be selected as the output signal.
- Output beam selection for a microphone array is essentially an estimation of the direction of origin of the speech signal. Correctly judging the direction of the speech signal maximizes the effect of the beamforming algorithm; conversely, choosing a non-optimal beam as the output greatly reduces the noise suppression of the beamforming algorithm. In practice, therefore, the output beam selection mechanism, as the stage following the beamforming algorithm, is of great significance to the research and development of speech signal processing systems that use microphone arrays.
- Chinese Patent No. CN103888861B discloses a microphone array directivity adjustment method in which voice information is first received, the prospective speaker is identified from that information, and the speaker's direction is determined from the result.
- This method requires the speaker's identity information to be stored in advance, and the beam direction cannot be adjusted for speakers who are not stored.
- Chinese Patent Application Publication No. CN109119092A discloses a beam direction switching method based on a microphone array. The method uses only the phase delay information between the microphones and the energy information of each beam; it cannot distinguish human-voice from non-human-voice signals and is therefore easily disturbed by loud noise.
- Chinese Patent Application Publication No. CN109473118A discloses a dual-channel speech enhancement method in which the target beam is enhanced only according to the existence probability of the sound to be enhanced in the target beam, and beam selection is based on the ratio of the speech existence probabilities between the beams.
- In practice, this method is susceptible to interference from low-volume non-stationary signals.
- Chinese Patent Application Publication No. CN108899044A discloses a speech signal processing method in which the existence probability of a wake word is used to determine the relevance of the speech signal to the content: the speech signal is first fed into a wake-up engine, the engine's confidence for the signal is obtained, and the speech existence probability and the direction of arrival of the original input signal are then calculated.
- This method relies on the wake-up engine to compute the existence probability of a specific word or sentence, which requires speech recognition technology, so it can only be applied to speech signal processing systems with a wake-up function.
- The wake-word existence probability calculation and the vector operations required by the method increase its computational complexity, which is unfavorable for implementation on resource-constrained devices such as Internet of Things microcontroller units (MCUs).
- The object of the present invention is to provide a method for selecting the output beam of a microphone array that does not rely on pre-stored speaker information, does not require wake-word recognition before the direction of arrival is identified, mitigates both loud noise interference and low-volume non-stationary signal interference, and has reduced computational complexity.
- A method for selecting an output beam of a microphone array includes the following steps: (a) receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming the plurality of sound signals to obtain a plurality of beams and corresponding beam output signals; (b) for each of the plurality of beams, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain the spectrum vector and power spectrum vector of the current beam; and, based on these, calculating the integrated speech signal energy of the current beam, where the integrated speech signal energy is the product of the integrated energy and the integrated speech existence probability of the current beam, the integrated energy indicates the energy level of the beam output signal of the current beam, the integrated speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and both the integrated speech existence probability and the integrated energy are scalars; and (c) selecting the beam with the largest integrated speech signal energy as the output beam.
- the spectrum vector is obtained by performing short-time Fourier transform or short-time discrete cosine transform on the beam output signal of the current beam.
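For illustration, the spectrum vector described above can be produced one frame at a time. The following is a minimal sketch, not from the patent: it uses a naive discrete Fourier transform, whereas a practical implementation would use an FFT with windowing and overlap.

```python
import cmath

def frame_spectrum(frame):
    # Naive F-point DFT of one frame; element k is the spectrum at frequency point k.
    F = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / F)
                for n, x in enumerate(frame))
            for k in range(F)]

Y = frame_spectrum([1.0, 1.0, 1.0, 1.0])  # a constant (DC-only) frame
```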
- In step (b), after the spectrum vector and power spectrum vector of the current beam are obtained, the power spectrum vector is updated with the spectrum vector according to the following formula:
- S_b(f,t) = α_1·S_b(f,t-1) + (1-α_1)·|Y_b(f,t)|²,
- where: t is the frame index; f is the frequency point; S_b(f,t-1) is the power spectrum corresponding to the element of the power spectrum vector of the current beam at frequency point f in frame t-1; S_b(f,t) is the power spectrum corresponding to that element in frame t; α_1 is a parameter greater than 0 and less than 1; and Y_b(f,t) is the spectrum corresponding to the element of the spectrum vector of the current beam at frequency point f in frame t.
- α_1 is greater than or equal to 0.9 and less than or equal to 0.99.
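The recursive power-spectrum update above can be sketched as follows; the function name and the parameter values are illustrative, not from the patent.

```python
def update_power_spectrum(S_prev, Y, alpha1=0.95):
    # S_b(f,t) = alpha1*S_b(f,t-1) + (1 - alpha1)*|Y_b(f,t)|^2, per frequency bin
    return [alpha1 * s + (1.0 - alpha1) * abs(y) ** 2 for s, y in zip(S_prev, Y)]

# one frame of a 4-bin complex spectrum vector, starting from a zero power spectrum
S = update_power_spectrum([0.0, 0.0, 0.0, 0.0],
                          [1 + 1j, 2 + 0j, 0 + 0j, 0.5 + 0.5j], alpha1=0.9)
```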
- In step (b), before the integrated speech signal energy of the current beam is calculated from its spectrum vector and power spectrum vector, the local energy minimum corresponding to each element of the power spectrum vector of the current beam is determined.
- Determining the local energy minimum corresponding to each element of the power spectrum vector of the current beam includes: maintaining two vectors S_b,min and S_b,tmp of the same length as the spectrum vector, both initialized to zero, and updating each of their elements according to:
- S_b,min(f,t) = min{S_b,min(f,t-1), S_b(f,t)},
- S_b,tmp(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)},
- where: t is the frame index; f is the frequency point; S_b,min(f,t) is the local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency point f in frame t; S_b,min(f,t-1) is that minimum in frame t-1; S_b(f,t) is the power spectrum corresponding to that element in frame t; S_b,tmp(f,t) is the temporary local energy minimum corresponding to that element in frame t; and S_b,tmp(f,t-1) is that temporary minimum in frame t-1;
- every L frames, S_b,min and S_b,tmp are reset as follows:
- S_b,min(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)},
- S_b,tmp(f,t) = S_b(f,t).
- L is set such that L frames contain between 200 milliseconds and 500 milliseconds of signal.
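The two-vector minimum tracking above can be sketched as follows. This is a hypothetical implementation: the trackers are initialized to infinity here so that the first frames define the minimum, whereas the claims state a zero initial value.

```python
import math

def update_local_minimum(S_min, S_tmp, S, t, L):
    # One frame of the two-vector tracking: normal frames take running minima;
    # every L-th frame folds the temporary minimum into S_min and restarts it.
    if t % L == 0:
        S_min = [min(tmp, s) for tmp, s in zip(S_tmp, S)]  # S_min <- min{S_tmp(t-1), S(t)}
        S_tmp = list(S)                                    # S_tmp <- S(t)
    else:
        S_min = [min(m, s) for m, s in zip(S_min, S)]
        S_tmp = [min(m, s) for m, s in zip(S_tmp, S)]
    return S_min, S_tmp

# single-bin power-spectrum sequence, frames t = 1..6 with L = 3
S_min, S_tmp = [math.inf], [math.inf]
for t, s in enumerate([5.0, 4.0, 6.0, 3.0, 7.0, 8.0], start=1):
    S_min, S_tmp = update_local_minimum(S_min, S_tmp, [s], t, L=3)
```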
- The integrated energy is obtained by averaging all elements of the power spectrum vector.
- Averaging all elements of the power spectrum vector to serve as the integrated energy includes: performing a weighted average over all elements of the power spectrum vector, where, for each element, a weight of 1 is assigned if its frequency point lies in the range 0 to 5 kHz, and a weight of 0 otherwise.
- The integrated speech existence probability is obtained as follows: for each element of the signal power spectrum vector of the current beam, the corresponding speech existence probability is calculated according to a speech existence probability model, to generate the speech existence probability vector of the current beam; and each element of the speech existence probability vector of the current beam is updated according to:
- p_b(f,t) = α_2·p_b(f,t-1) + (1-α_2)·I(b,f,t),
- where: t is the frame index; f is the frequency point; p_b is the speech existence probability vector of the current beam; p_b(f,t-1) is the speech existence probability corresponding to the element of that vector at frequency point f in frame t-1; p_b(f,t) is that probability in frame t; and α_2 is a parameter greater than 0 and less than 1;
- the indicator I(b,f,t) takes the value 1 when S_b(f,t) exceeds δ_1·S_b,min(f,t), and 0 otherwise, where S_b(f,t) is the power spectrum corresponding to the element of the power spectrum vector of the current beam, S_b,min(f,t) is the local energy minimum corresponding to that element, and δ_1 is a threshold used to determine whether the current frame contains a speech signal;
- all elements of the speech existence probability vector are averaged to serve as the integrated speech existence probability.
- α_2 is greater than or equal to 0.8 and less than or equal to 0.99.
- Averaging all elements of the speech existence probability vector to serve as the integrated speech existence probability includes: performing a weighted average over all elements of the speech existence probability vector, where, for each element, a weight of 1 is assigned if its frequency point lies in the range 0 to 5 kHz, and a weight of 0 otherwise.
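The probability update and the weighted average above can be sketched as follows. The exact expression for I(b,f,t) appears only as a figure in the original, so the indicator form used here, comparing the power spectrum against δ_1 times the local minimum, is an assumption in the style of minimum-statistics methods, and all names and parameter values are illustrative.

```python
def update_speech_probability(p_prev, S, S_min, alpha2=0.9, delta1=5.0):
    # p_b(f,t) = alpha2*p_b(f,t-1) + (1 - alpha2)*I(b,f,t), with the assumed
    # indicator I = 1 when S(f,t) > delta1*S_min(f,t), else 0
    I = [1.0 if s > delta1 * m else 0.0 for s, m in zip(S, S_min)]
    return [alpha2 * p + (1.0 - alpha2) * i for p, i in zip(p_prev, I)]

def integrated_probability(p, freqs, fmax=5000.0):
    # weighted average: weight 1 for bins in 0-5 kHz, weight 0 above
    w = [1.0 if f <= fmax else 0.0 for f in freqs]
    return sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

p = update_speech_probability([0.0, 0.0, 0.0], [10.0, 1.0, 10.0],
                              [1.0, 1.0, 1.0], alpha2=0.8, delta1=5.0)
q = integrated_probability(p, freqs=[1000.0, 3000.0, 8000.0])  # 8 kHz bin gets weight 0
```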
- In step (b), after the integrated speech signal energy of the current beam is calculated, it is updated according to the following operation:
- d_b(t) = α_3·d_b(t-1) + (1-α_3)·J(b,t),
- where: d_b(t-1) is the integrated speech signal energy of the current beam in frame t-1; d_b(t) is the integrated speech signal energy of the current beam in frame t;
- the function J(b,t) represents the speech signal energy of the current frame, taking the value e_b(t)·q_b(t), the product of the integrated energy and the integrated speech existence probability of the current beam, when the integrated speech existence probability exceeds the threshold δ_2, and 0 otherwise; and
- δ_2 is the threshold used to decide whether the value of the function J(b,t) is set to zero.
- α_3 is greater than or equal to 0.8 and less than or equal to 0.99.
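The recursive update of the integrated speech signal energy can be sketched as follows. The gating form of J(b,t), zero unless the integrated speech existence probability q exceeds δ_2, is an assumption, since the exact expression appears only as a figure in the original.

```python
def update_beam_energy(d_prev, e, q, alpha3=0.9, delta2=0.5):
    # d_b(t) = alpha3*d_b(t-1) + (1 - alpha3)*J(b,t), with the assumed gate
    # J = e_b*q_b when q_b exceeds delta2, else 0
    J = e * q if q > delta2 else 0.0
    return alpha3 * d_prev + (1.0 - alpha3) * J

d = update_beam_energy(0.0, e=4.0, q=0.8, alpha3=0.5, delta2=0.5)   # speech-like frame
d2 = update_beam_energy(d, e=4.0, q=0.3, alpha3=0.5, delta2=0.5)    # gated frame: energy decays
```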
- the solution of the present invention calculates the integrated voice signal energy of each beam, so as to select the output beam of the microphone array accordingly.
- The integrated speech signal energy fully takes into account both the integrated energy of the beam and the integrated speech existence probability; performing beam selection with both quantities requires neither pre-acquired speaker information nor any speech recognition before the direction of arrival is identified, and overcomes interference from non-human-voice noise.
- The integrated speech signal energy is a product of scalars, which reduces vector computation and lowers computational complexity.
- Fig. 1 is a schematic flowchart of an exemplary embodiment of a method for selecting an output beam of a microphone array according to the present invention
- Fig. 2 is a schematic flowchart of a detailed example embodiment of a method for selecting an output beam of a microphone array according to the present invention.
- Fig. 3 is a schematic flowchart of updating the local energy minimum estimate in an embodiment of the method for selecting the output beam of the microphone array according to the present invention.
- Fig. 1 is a schematic flowchart of an exemplary embodiment of a method for selecting an output beam of a microphone array according to the present invention.
- The method 100 shown in Fig. 1 includes: (a) as shown in step 102, receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming the plurality of sound signals to obtain a plurality of beams and corresponding beam output signals.
- The method 100 further includes: (b) as shown in steps 104 to 108, for each of the multiple beams, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain the spectrum vector and power spectrum vector of the current beam (step 104); and calculating, based on these, the integrated speech signal energy of the current beam (step 106), where the integrated speech signal energy is the product of the integrated energy and the integrated speech existence probability of the current beam, the integrated energy indicates the energy level of the beam output signal of the current beam, the integrated speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and both are scalars.
- the method further includes: (c) as shown in step 110, selecting the beam with the largest energy value of the integrated speech signal as the output beam.
- Fig. 2 is a schematic flowchart of a detailed example embodiment of a method for selecting an output beam of a microphone array according to the present invention.
- the method 200 starts in step 202, in which the beam output by the beamforming algorithm is transformed into the STFT domain, and the power spectrum vector of each beam is updated with the spectrum information.
- Specifically, assuming the beamforming algorithm outputs B beams, each transformed into the F-point short-time Fourier transform (STFT) domain, the output signal of the b-th beam (b = 1, 2, ..., B) can be expressed in the STFT domain as an F-dimensional spectrum vector Y_b, whose f-th element Y_b(f) represents the spectral information of the signal at frequency point f. The modulus of each frequency point of Y_b is taken and combined with the power spectrum vector S_b in a weighted sum, updating the latter according to:
- S_b(f,t) = α_1·S_b(f,t-1) + (1-α_1)·|Y_b(f,t)|².
- The independent variable t denotes time (i.e., the frame index); for example, S_b(f,t-1) and S_b(f,t) denote the values of S_b at frequency point f in frames t-1 and t respectively, and the variables S_b,min, S_b,tmp, etc. below use the same notation.
- The parameter α_1 lies between 0 and 1. The larger its value, the smaller the update of the power spectrum, which better resists transient noise but is more likely to mismatch the true instantaneous energy; the preferred value is 0.9 to 0.99.
- The squared modulus |Y_b(f)|² of the vector Y_b at frequency f represents the power spectrum of the signal in the current frame (i.e., the t-th frame, likewise below) at frequency f; by updating S_b(f) with |Y_b(f)|², the latter retains the same physical meaning (signal energy) but, being updated smoothly, better resists transient noise.
- Subsequent steps can preferably be computed with the updated power spectrum vector, making the system relatively stable.
- At step 204, the estimate of the local energy minimum S_b,min of the current beam is updated.
- For example, the local energy minimum estimate can be updated according to the method 300 shown in Fig. 3.
- Although Fig. 3 shows a specific method, the implementation of the present invention is not limited thereto.
- For example, Martin, R.: "Spectral subtraction based on minimum statistics," Proceedings of the 7th EUSIPCO, 1994, pp. 1182-1185, or a variant of this method, can be used to update the estimate of the local energy minimum S_b,min of the current beam.
- At step 304, it is determined whether there is a next element in the power spectrum vector S_b of the current beam. If so, the method proceeds to step 306; if not, every element of the power spectrum vector of the current beam has been processed, and the method proceeds to step 312 to obtain the local energy minimum corresponding to each element.
- At step 306, the current element for each frequency point is updated as follows:
- S_b,min(f,t) = min{S_b,min(f,t-1), S_b(f,t)},
- S_b,tmp(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)}.
- At step 308, it is determined whether L frames of signal have been processed, i.e., whether t is a multiple of L.
- Each time L frames have been processed, at step 310, S_b,min and S_b,tmp are reset as follows:
- S_b,min(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)},
- S_b,tmp(f,t) = S_b(f,t).
- The vector S_b,min is the local minimum (over L frames of signal). Since at any moment the signal must be either noise or the sum of noise and speech, S_b,min can be taken approximately to represent the intensity of the noise energy. This approach rests on the assumption that the speech signal is non-stationary while the noise is stationary: the smaller L, the weaker the stationarity requirement on the noise, but the smaller the separation between the noise and speech signals; the value of this parameter is also related to the configured length of each frame. In a preferred embodiment of the present invention, L frames should contain roughly between 200 and 500 milliseconds of signal.
- At step 206, the speech existence probability at each frequency point of the current beam is updated.
- The probability that a speech signal is present at each frequency point can be represented by the vector p_b and updated as follows:
- p_b(f,t) = α_2·p_b(f,t-1) + (1-α_2)·I(b,f,t),
- where the parameter α_2 lies between 0 and 1, with a recommended setting of 0.8 to 0.99; the indicator I(b,f,t) takes the value 1 when S_b(f,t) exceeds δ_1·S_b,min(f,t), and 0 otherwise; and
- the parameter δ_1 is the threshold used to determine whether the current frame contains a speech signal.
- Step 206 can be performed using Cohen, I. and Berdugo, B.: "Noise estimation by minima controlled recursive averaging for robust speech enhancement," IEEE Signal Processing Letters, 2002, 9(1): 12-15, or a variant thereof, and can also be replaced by another speech signal probability estimation algorithm. Similarly, such an algorithm is required to take the signal power spectrum S_b as input and to output a speech probability p_b between 0 and 1.
- At step 208, a weighted average of the speech existence probability vector is taken to obtain the integrated speech probability of the current beam. Specifically, a weighted average is performed on the vector p_b: a weight of 1 is assigned to frequency points in the range 0-5 kHz and a weight of 0 otherwise, yielding the integrated speech existence probability q_b of beam b.
- The scalar q_b, rather than the vector p_b, is used in subsequent calculations, which simplifies them; at the same time, since the frequency of the human voice almost never exceeds 5 kHz, discarding signals above that frequency can be considered not to affect the final result.
- At step 210, a weighted average of the power spectrum vector is taken to obtain the integrated energy of the current beam.
- The same weighted average is performed on the vector S_b to obtain the integrated energy e_b of beam b.
- Specifically, a weighted average is performed on the vector S_b: a weight of 1 is assigned to frequency points in the range 0-5 kHz and a weight of 0 otherwise.
- At step 212, the integrated speech signal energy of the current beam is calculated. d_b is defined as the speech signal energy of beam b, with initial value 0 (i.e., d_b(0) = 0), and is updated in each frame as follows:
- d_b(t) = α_3·d_b(t-1) + (1-α_3)·J(b,t),
- where the parameter α_3 lies between 0 and 1, with a recommended setting of 0.8 to 0.99;
- the function J(b,t) represents the speech signal energy of the current frame, taking the value e_b(t)·q_b(t), the product of the integrated energy and the integrated speech existence probability, when the integrated speech existence probability exceeds the threshold δ_2, and 0 otherwise; and
- the parameter δ_2 is the threshold used to decide whether the function value is set to zero.
- At step 214, it is determined whether there is a next beam. If so, the method returns to step 204 and performs steps 204-212 for the next beam; if not, the method proceeds to step 218.
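Steps 202 through 214 can be combined into a compact end-to-end sketch. All parameter values are illustrative; the indicator and gating forms of I and J are assumptions as noted above; and `beams[b][t]` holds the complex STFT spectrum vector of beam b at frame t.

```python
import math

def select_output_beam(beams, freqs, L=10, alpha1=0.95, alpha2=0.9, alpha3=0.9,
                       delta1=5.0, delta2=0.5, fmax=5000.0):
    # Return the index of the beam with the largest integrated speech signal
    # energy d_b after processing all frames. Hypothetical implementation.
    B, F = len(beams), len(freqs)
    w = [1.0 if f <= fmax else 0.0 for f in freqs]      # 0-5 kHz weights
    nw = sum(w)
    S = [[0.0] * F for _ in range(B)]                   # smoothed power spectra
    S_min = [[math.inf] * F for _ in range(B)]          # local minima trackers
    S_tmp = [[math.inf] * F for _ in range(B)]
    p = [[0.0] * F for _ in range(B)]                   # per-bin speech probabilities
    d = [0.0] * B                                       # integrated speech energies
    for t in range(1, len(beams[0]) + 1):
        for b in range(B):
            Y = beams[b][t - 1]
            # step 202: smooth the power spectrum
            S[b] = [alpha1 * s + (1 - alpha1) * abs(y) ** 2 for s, y in zip(S[b], Y)]
            # step 204: minimum tracking with a reset every L frames
            if t % L == 0:
                S_min[b] = [min(tmp, s) for tmp, s in zip(S_tmp[b], S[b])]
                S_tmp[b] = list(S[b])
            else:
                S_min[b] = [min(m, s) for m, s in zip(S_min[b], S[b])]
                S_tmp[b] = [min(m, s) for m, s in zip(S_tmp[b], S[b])]
            # step 206: per-bin speech presence probability (assumed indicator form)
            p[b] = [alpha2 * pi + (1 - alpha2) * (1.0 if s > delta1 * m else 0.0)
                    for pi, s, m in zip(p[b], S[b], S_min[b])]
            # steps 208/210: integrated probability q and energy e (0-5 kHz average)
            q = sum(wi * pi for wi, pi in zip(w, p[b])) / nw
            e = sum(wi * si for wi, si in zip(w, S[b])) / nw
            # step 212: integrated speech signal energy (assumed gate on q)
            J = e * q if q > delta2 else 0.0
            d[b] = alpha3 * d[b] + (1 - alpha3) * J
    # final selection: the beam with the largest d_b
    return max(range(B), key=lambda b: d[b])

# toy example: beam 0 carries steady low-level noise; beam 1 adds a loud
# sustained component from frame 5 onward
beams = [[[1 + 0j] for _ in range(12)],
         [[1 + 0j] for _ in range(4)] + [[5 + 0j] for _ in range(8)]]
best = select_output_beam(beams, freqs=[1000.0], L=4, alpha1=0.5, alpha2=0.5,
                          alpha3=0.5, delta1=2.0, delta2=0.1)
```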
Abstract
A method for selecting an output beam of a microphone array, comprising: (a) receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming them to obtain a plurality of beams and corresponding beam output signals (102); (b) for each beam, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain the spectrum vector and power spectrum vector of the current beam (104); and, based on these, calculating the integrated speech signal energy of the current beam, where the integrated speech signal energy is the product of the integrated energy and the integrated speech existence probability of the current beam, the integrated energy indicates the energy level of the beam output signal of the current beam, the integrated speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and both are scalars (106); and (c) selecting the beam with the largest integrated speech signal energy as the output beam (110).
Description
The present invention relates to output beam selection for a microphone array, and in particular to a microphone array output beam selection method based on the speech existence probability.
A microphone array can perform beamforming in multiple directions. However, owing to limitations of the output-side hardware resources or the application scenario, usually only the beam in one direction may be selected as the output signal. Output beam selection for a microphone array is essentially an estimation of the direction of origin of the speech signal. Correctly judging the direction of the speech signal maximizes the effect of the beamforming algorithm; conversely, choosing a non-optimal beam as the output greatly reduces the noise suppression of the beamforming algorithm. In practice, therefore, the output beam selection mechanism, as the stage following the beamforming algorithm, is of great significance to the research and development of speech signal processing systems that use microphone arrays.
The inventors have noted that, although various microphone array output beam selection methods have been proposed in the prior art, these existing methods still suffer from at least the following deficiencies:
1) they rely on pre-stored speaker information or on wake-word recognition before the direction of arrival is identified;
2) they have difficulty coping with both loud noise interference and low-volume non-stationary signal interference; and
3) they are not sufficiently optimized to reduce computational complexity for resource-constrained devices or scenarios such as Internet of Things microcontroller units (MCUs).
For example, Chinese Patent No. CN103888861B discloses a microphone array directivity adjustment method in which voice information is first received, the prospective speaker is identified from that information, and the speaker's direction is determined from the result. This method requires the speaker's identity information to be stored in advance, and the beam direction cannot be adjusted for speakers who are not stored.
As another example, Chinese Patent Application Publication No. CN109119092A discloses a beam direction switching method based on a microphone array. The method uses only the phase delay information between the microphones and the energy information of each beam; it cannot distinguish human-voice from non-human-voice signals and is therefore easily disturbed by loud noise.
As a further example, Chinese Patent Application Publication No. CN109473118A discloses a dual-channel speech enhancement method in which the target beam is enhanced only according to the existence probability of the sound to be enhanced in the target beam, and beam selection is based on the ratio of the speech existence probabilities between the beams. In practice, this method is susceptible to interference from low-volume non-stationary signals.
As yet another example, Chinese Patent Application Publication No. CN108899044A discloses a speech signal processing method in which the existence probability of a wake word is used to determine the relevance of the speech signal to the content: the speech signal is first fed into a wake-up engine, the engine's confidence for the signal is obtained, and the speech existence probability and the direction of arrival of the original input signal are then calculated. Before the direction of arrival can be judged, however, this method relies on the wake-up engine to compute the existence probability of a specific word or sentence, which requires speech recognition technology, so it can only be applied to speech signal processing systems with a wake-up function. In addition, the wake-word probability calculation and the vector operations required by the method increase its computational complexity, which is unfavorable for implementation on resource-constrained devices such as Internet of Things microcontroller units (MCUs).
In summary, the prior art needs a method for selecting the output beam of a microphone array that solves the above problems. It should be understood that the technical problems listed above are merely examples and not limitations of the present invention; the present invention is not limited to technical solutions that solve all of the above problems simultaneously, and may be implemented to solve one or more of the above or other technical problems.
Summary of the Invention
In view of the above problems, an object of the present invention is to provide a method for selecting the output beam of a microphone array that does not rely on pre-stored speaker information, does not require wake-word recognition before the direction of arrival is identified, mitigates both loud noise interference and low-volume non-stationary signal interference, and has reduced computational complexity.
In one aspect of the present invention, a method for selecting an output beam of a microphone array is provided, the method comprising the steps of: (a) receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming the plurality of sound signals to obtain a plurality of beams and corresponding beam output signals; (b) for each of the plurality of beams, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain the spectrum vector and power spectrum vector of the current beam; and, based on these, calculating the integrated speech signal energy of the current beam, where the integrated speech signal energy is the product of the integrated energy and the integrated speech existence probability of the current beam, the integrated energy indicates the energy level of the beam output signal of the current beam, the integrated speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and both the integrated speech existence probability and the integrated energy are scalars; and (c) selecting the beam with the largest integrated speech signal energy as the output beam.
Optionally, the spectrum vector is obtained by performing a short-time Fourier transform or a short-time discrete cosine transform on the beam output signal of the current beam.
Optionally, in step (b), after the spectrum vector and power spectrum vector of the current beam are obtained, the power spectrum vector is updated with the spectrum vector according to the following formula:
S_b(f,t) = α_1·S_b(f,t-1) + (1-α_1)·|Y_b(f,t)|²,
where: t is the frame index; f is the frequency point; S_b(f,t-1) is the power spectrum corresponding to the element of the power spectrum vector of the current beam at frequency point f in frame t-1; S_b(f,t) is the power spectrum corresponding to that element in frame t; α_1 is a parameter greater than 0 and less than 1; and Y_b(f,t) is the spectrum corresponding to the element of the spectrum vector of the current beam at frequency point f in frame t.
Preferably, α_1 is greater than or equal to 0.9 and less than or equal to 0.99.
Optionally, in step (b), before the integrated speech signal energy of the current beam is calculated from its spectrum vector and power spectrum vector, the local energy minimum corresponding to each element of the power spectrum vector of the current beam is determined.
Optionally, determining the local energy minimum corresponding to each element of the power spectrum vector of the current beam comprises: maintaining two vectors S_b,min and S_b,tmp of the same length as the spectrum vector, both initialized to zero; updating each element of S_b,min and S_b,tmp according to the formulas:
S_b,min(f,t) = min{S_b,min(f,t-1), S_b(f,t)},
S_b,tmp(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)},
where: t is the frame index; f is the frequency point; S_b,min(f,t) is the local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency point f in frame t; S_b,min(f,t-1) is that minimum in frame t-1; S_b(f,t) is the power spectrum corresponding to that element in frame t; S_b,tmp(f,t) is the temporary local energy minimum corresponding to that element in frame t; and S_b,tmp(f,t-1) is that temporary minimum in frame t-1; and,
every L frames of updates according to the above formulas, resetting S_b,min and S_b,tmp as follows:
S_b,min(f,t) = min{S_b,tmp(f,t-1), S_b(f,t)},
S_b,tmp(f,t) = S_b(f,t).
After each element of S_b,min and S_b,tmp has been updated, the local energy minimum corresponding to each element of the power spectrum vector of the current beam is obtained.
Preferably, L is set such that L frames contain between 200 milliseconds and 500 milliseconds of signal.
Optionally, the integrated energy is obtained by averaging all elements of the power spectrum vector.
Optionally, averaging all elements of the power spectrum vector to serve as the integrated energy comprises: performing a weighted average over all elements of the power spectrum vector, where, for each element, a weight of 1 is assigned if the frequency point corresponding to the element lies in the range 0 to 5 kHz, and a weight of 0 otherwise.
Optionally, the integrated speech existence probability is obtained as follows: for each element of the signal power spectrum vector of the current beam, the speech existence probability corresponding to that element is calculated according to a speech existence probability model, to generate the speech existence probability vector of the current beam; and each element of the speech existence probability vector of the current beam is updated according to:
p_b(f,t) = α_2·p_b(f,t-1) + (1-α_2)·I(b,f,t),
where: t is the frame index; f is the frequency point; p_b is the speech existence probability vector of the current beam; p_b(f,t-1) is the speech existence probability corresponding to the element of that vector at frequency point f in frame t-1; p_b(f,t) is that probability in frame t; and α_2 is a parameter greater than 0 and less than 1.
The indicator function I(b,f,t) takes the value 1 when S_b(f,t) exceeds δ_1·S_b,min(f,t), and 0 otherwise, where S_b(f,t) is the power spectrum corresponding to the element of the power spectrum vector of the current beam, S_b,min(f,t) is the local energy minimum corresponding to that element, and δ_1 is a threshold used to determine whether the current frame contains a speech signal.
All elements of the speech existence probability vector are averaged to serve as the integrated speech existence probability.
Preferably, α_2 is greater than or equal to 0.8 and less than or equal to 0.99.
Optionally, averaging all elements of the speech existence probability vector to serve as the integrated speech existence probability comprises: performing a weighted average over all elements of the speech existence probability vector, where, for each element, a weight of 1 is assigned if the frequency point corresponding to the element lies in the range 0 to 5 kHz, and a weight of 0 otherwise.
Preferably, in step (b), after the integrated speech signal energy of the current beam is calculated, it is updated according to the following operation:
d_b(t) = α_3·d_b(t-1) + (1-α_3)·J(b,t),
where: d_b(t-1) is the integrated speech signal energy of the current beam in frame t-1; d_b(t) is the integrated speech signal energy of the current beam in frame t; the function J(b,t) represents the speech signal energy of the current frame, taking the value e_b(t)·q_b(t), the product of the integrated energy and the integrated speech existence probability of the current beam, when the integrated speech existence probability exceeds δ_2, and 0 otherwise; and δ_2 is a threshold used to decide whether the value of J(b,t) is set to zero.
Preferably, α_3 is greater than or equal to 0.8 and less than or equal to 0.99.
The solution of the present invention calculates the integrated speech signal energy of each beam and selects the output beam of the microphone array accordingly. In particular, the integrated speech signal energy takes into account both the integrated energy of the beam and the integrated speech existence probability; performing beam selection with both quantities requires neither pre-acquired speaker information nor any speech recognition before the direction of arrival is identified, and overcomes interference from non-human-voice noise. Moreover, the integrated speech signal energy is a product of scalars, which reduces vector computation and lowers computational complexity.
It should be understood that the above descriptions of the background and of the summary are merely illustrative and not restrictive.
Fig. 1 is a schematic flowchart of an example embodiment of the method for selecting an output beam of a microphone array according to the present invention;
Fig. 2 is a schematic flowchart of a detailed example embodiment of the method for selecting an output beam of a microphone array according to the present invention; and
Fig. 3 is a schematic flowchart of updating the local energy minimum estimate in an embodiment of the method for selecting an output beam of a microphone array according to the present invention.
The present invention is described more fully below with reference to the accompanying drawings, which form a part of this disclosure and show exemplary embodiments by way of illustration. It should be understood that the embodiments shown in the drawings and described below are merely illustrative and are not limitations of the present invention.
Fig. 1 is a schematic flowchart of an example embodiment of the method for selecting an output beam of a microphone array according to the present invention.
The method 100 shown in Fig. 1 comprises: (a) as shown in step 102, receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming the plurality of sound signals to obtain a plurality of beams and corresponding beam output signals.
The method 100 further comprises: (b) as shown in steps 104 to 108, for each of the plurality of beams, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain the spectrum vector and power spectrum vector of the current beam (step 104); and calculating, based on these, the integrated speech signal energy of the current beam (step 106), where the integrated speech signal energy is the product of the integrated energy and the integrated speech existence probability of the current beam, the integrated energy indicates the energy level of the beam output signal of the current beam, the integrated speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and both are scalars.
The method further comprises: (c) as shown in step 110, selecting the beam with the largest integrated speech signal energy as the output beam.
FIG. 2 is a schematic flowchart of a detailed example embodiment of the method for selecting an output beam of a microphone array according to the present invention.

Method 200 begins at step 202, in which the beams output by the beamforming algorithm are transformed into the STFT domain and the power spectrum vector of each beam is updated with the spectral information. Specifically, assume the beamforming algorithm outputs B beams, each transformed into an F-point short-time Fourier transform (STFT) domain; the output signal of the b-th beam (b = 1, 2, …, B) can then be represented in the STFT domain as an F-dimensional spectrum vector Y_b, whose f-th element Y_b(f) represents the spectral information of that signal at frequency bin f. The modulus of each bin of the vector Y_b is taken and added, with weighting, to the power spectrum vector S_b, the latter being updated according to the following formula:

S_b(f,t) = α_1·S_b(f,t−1) + (1−α_1)·|Y_b(f,t)|²

where the argument t denotes time (i.e., the frame index); for example, S_b(f,t−1) and S_b(f,t) denote the values of S_b at frequency bin f in frames t−1 and t, respectively, and the variables S_{b,min}, S_{b,tmp}, etc. below follow the same notation. The parameter α_1 lies between 0 and 1; the larger its value, the less the power spectrum is updated, which gives better resistance to transient noise but makes a mismatch with the true instantaneous energy of the current frame more likely; the preferred value is 0.9 to 0.99. The squared modulus of the vector Y_b at frequency f, |Y_b(f)|², represents the power spectrum of the current-frame signal (i.e., frame t, likewise below) at frequency f; by updating S_b(f) with |Y_b(f)|², the former keeps the same physical meaning as the latter (signal energy), but because the update is smoothed, it resists transient noise better. The subsequent steps preferably use the updated power spectrum vector in their calculations, which keeps the system relatively stable.
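The recursive smoothing of step 202 can be sketched as follows (a minimal illustration; the function name and the default value of α_1 are chosen here for demonstration, with α_1 inside the preferred 0.9–0.99 range):

```python
import numpy as np

def update_power_spectrum(S_prev, Y, alpha1=0.95):
    """Recursively smooth the power spectrum of one beam.

    S_prev : power-spectrum vector S_b(., t-1), shape (F,)
    Y      : complex STFT spectrum vector Y_b(., t) of the current frame, shape (F,)
    alpha1 : smoothing factor in (0, 1); larger values update the spectrum
             less per frame and so resist transient noise better.
    """
    # S_b(f,t) = alpha1 * S_b(f,t-1) + (1 - alpha1) * |Y_b(f,t)|^2
    return alpha1 * S_prev + (1.0 - alpha1) * np.abs(Y) ** 2
```

The same call is repeated once per frame and per beam, carrying the returned vector forward as `S_prev` for the next frame.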
In step 204, the estimate of the local energy minimum S_{b,min} of the current beam is updated. For example, the estimate may be updated according to method 300 shown in FIG. 3. It should be understood that although FIG. 3 shows one specific method, the implementation of the present invention is not limited thereto. For example, the estimate of the local energy minimum S_{b,min} of the current beam may be updated using spectral subtraction based on minimum statistics (Martin, R.: Spectral subtraction based on minimum statistics. 1994, Proceedings of 7th EUSIPCO, 1182–1185) or a variant of that method.

In step 302, two vectors S_{b,min} and S_{b,tmp} of length F are maintained (both initialized to 0, i.e., S_{b,min}(f,0) = S_{b,tmp}(f,0) = 0 for all f).

In step 304, it is determined whether a next element exists in the power spectrum vector S_b of the current beam. If so, the method proceeds to step 306; if not, every element of the power spectrum vector of the current beam has been processed, and the method proceeds to step 312, yielding the local energy minimum corresponding to each element.

In step 306, the current element for each frequency bin is updated as follows:

S_{b,min}(f,t) = min{S_{b,min}(f,t−1), S_b(f,t)},
S_{b,tmp}(f,t) = min{S_{b,tmp}(f,t−1), S_b(f,t)}.

In step 308, it is determined whether L frames of the signal have been processed, i.e., whether t is a multiple of L. Each time L frames have been processed, S_{b,min} and S_{b,tmp} are reset in step 310 as follows:

S_{b,min}(f,t) = min{S_{b,tmp}(f,t−1), S_b(f,t)},
S_{b,tmp}(f,t) = S_b(f,t);

the vector S_{b,min} is thus the local minimum (over L frames of the signal). Since at any moment the signal is necessarily either noise alone or the sum of noise and speech, S_{b,min} can be taken approximately to represent the strength of the noise energy. This method rests essentially on the assumption that the speech signal is non-stationary while the noise is stationary: the smaller the value of L, the weaker the stationarity required of the noise, but the smaller the separation between the noise signal and the speech signal; the value of this parameter also depends on the configured length of each frame. In a preferred embodiment of the present invention, L should roughly be chosen such that L frames contain between about 200 and 500 milliseconds of signal.
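The minimum tracking of steps 302–310 can be sketched as a small stateful helper (illustrative only; the class name is an invention of this sketch, and both trackers start at zero exactly as the text specifies, so the local minimum only becomes positive after the second L-frame reset):

```python
import numpy as np

class MinimumTracker:
    """Track the local (L-frame) minimum of a beam's power spectrum."""

    def __init__(self, F, L):
        self.L = L                 # reset period in frames (step 308)
        self.t = 0                 # frame counter
        self.s_min = np.zeros(F)   # S_b,min, initialized to 0 (step 302)
        self.s_tmp = np.zeros(F)   # S_b,tmp, initialized to 0 (step 302)

    def update(self, S):
        """Feed one power-spectrum frame S_b(., t); return S_b,min(., t)."""
        self.t += 1
        if self.t % self.L == 0:
            # step 310: every L frames, restart the block minimum,
            # using S_b,tmp(f, t-1) before it is overwritten
            self.s_min = np.minimum(self.s_tmp, S)
            self.s_tmp = S.copy()
        else:
            # step 306: elementwise running minima
            self.s_min = np.minimum(self.s_min, S)
            self.s_tmp = np.minimum(self.s_tmp, S)
        return self.s_min
```

Because speech bursts are shorter than L frames, the tracked minimum follows the noise floor rather than the speech peaks.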
Returning to FIG. 2, in step 206 the speech existence probability at each frequency bin of the current beam is updated. Specifically, the probability that a speech signal exists at each bin can be represented by a vector p_b and updated as follows:

p_b(f,t) = α_2·p_b(f,t−1) + (1−α_2)·I(b,f,t)

where the parameter α_2 lies between 0 and 1, with a recommended setting of 0.8 to 0.99; the value of the function I(b,f,t) is 1 when S_b(f,t) > δ_1·S_{b,min}(f,t) and 0 otherwise, where the parameter δ_1 represents the threshold used to decide whether the current frame carries a speech signal.

It should be understood that step 206 may be performed using noise estimation by minima-controlled recursive averaging (Cohen, I. and Berdugo, B.: Noise estimation by minima controlled recursive averaging for robust speech enhancement. 2002, IEEE Signal Processing Letters, 9(1): 12–15) or a variant thereof, or replaced by another speech-probability estimation algorithm. Likewise, such an algorithm is required to take the signal power spectrum S_b as input and to output a speech probability p_b between 0 and 1.
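A minimal sketch of the per-bin update of step 206 (the threshold comparison inside `I` reflects the reconstruction above of the figure-only definition of I(b,f,t), and the default values of α_2 and δ_1 are illustrative assumptions, with α_2 inside the recommended 0.8–0.99 range):

```python
import numpy as np

def update_speech_presence(p_prev, S, S_min, alpha2=0.9, delta1=5.0):
    """One recursive update of the per-bin speech-presence probability.

    p_prev : p_b(., t-1), shape (F,)
    S      : smoothed power spectrum S_b(., t), shape (F,)
    S_min  : local energy minimum S_b,min(., t), shape (F,)
    """
    # I(b,f,t) = 1 where the power exceeds delta1 times the noise-floor
    # estimate, 0 otherwise
    I = (S > delta1 * S_min).astype(float)
    # p_b(f,t) = alpha2 * p_b(f,t-1) + (1 - alpha2) * I(b,f,t)
    return alpha2 * p_prev + (1.0 - alpha2) * I
```

Any other estimator could be substituted here, as the text notes, provided it maps S_b to probabilities in [0, 1].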
In step 208, a weighted average of the speech existence probability vector is taken to obtain the comprehensive speech probability of the current beam. Specifically, a weighted average of the vector p_b is computed: bins lying in the 0–5 kHz range are given weight 1, and the rest weight 0, yielding the comprehensive speech existence probability q_b of beam b. The subsequent steps use the scalar q_b instead of the vector p_b, which simplifies the computation; at the same time, since the frequency of the human voice almost never exceeds 5 kHz, discarding the signal above that frequency can be considered not to affect the final result.
In step 210, a weighted average of the power spectrum vector is taken to obtain the comprehensive energy of the current beam. Analogously, the same weighted average is applied to the vector S_b, yielding the comprehensive energy e_b of beam b: bins lying in the 0–5 kHz range are given weight 1, and the rest weight 0.
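Steps 208 and 210 apply the same band-limited weighting to p_b and S_b respectively; a shared helper can be sketched as follows (illustrative; the normalization by the sum of the weights is an assumption about how the weighted average is taken):

```python
import numpy as np

def band_limited_mean(v, freqs, f_hi=5000.0):
    """Weighted average of v with weight 1 for bins at or below f_hi Hz
    and weight 0 above, as in steps 208 (q_b from p_b) and 210 (e_b from S_b).

    v     : p_b or S_b vector, shape (F,)
    freqs : center frequency of each bin in Hz, shape (F,)
    """
    w = (freqs <= f_hi).astype(float)
    # weighted average: only 0-5 kHz bins contribute
    return float(np.sum(w * v) / np.sum(w))
```

Reducing each vector to a scalar here is what lets the later steps multiply energies and probabilities directly.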
In step 212, the comprehensive speech signal energy of the current beam is calculated. Define d_b as the speech signal energy of beam b, with initial value 0 (i.e., d_b(0) = 0), updated in each frame as follows:

d_b(t) = α_3·d_b(t−1) + (1−α_3)·J(b,t)

The parameter α_3 lies between 0 and 1, with a recommended setting of 0.8 to 0.99. The function J(b,t) represents the speech signal energy of the current frame; its value is the product e_b(t)·q_b(t) when q_b(t) exceeds the threshold δ_2, and 0 otherwise, where the parameter δ_2 represents the threshold used to decide whether the value of the function is set to 0.
In step 214, it is determined whether a next beam exists. If so, the method returns to step 204 and performs steps 204–212 for the next beam; if not, the method proceeds to step 218.

In step 218, the beam with the largest comprehensive speech signal energy is determined as the output beam. Specifically, the beam b corresponding to the maximum value in the set of comprehensive speech signal energies {d_b} (b = 1, 2, …, B) is taken as the output beam.
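Steps 212 and 218 together can be sketched as one per-frame routine over all beams (illustrative; the gating of J(b,t) by q_b(t) > δ_2 follows the reconstruction above of the figure-only definition, and the default δ_2 is an assumed value):

```python
import numpy as np

def select_beam(d_prev, e, q, alpha3=0.9, delta2=0.5):
    """Update d_b for every beam and pick the output beam.

    d_prev : d_b(t-1) for all B beams, shape (B,)
    e, q   : comprehensive energies e_b(t) and comprehensive speech
             existence probabilities q_b(t), each shape (B,)
    Returns the updated d_b(t) and the index of the output beam.
    """
    # J(b,t) = e_b(t) * q_b(t), zeroed when q_b(t) <= delta2
    J = np.where(q > delta2, e * q, 0.0)
    # d_b(t) = alpha3 * d_b(t-1) + (1 - alpha3) * J(b,t)
    d = alpha3 * d_prev + (1.0 - alpha3) * J
    return d, int(np.argmax(d))
```

Because d_b is smoothed across frames, the selected beam does not flip on a single noisy frame even if another beam momentarily has higher energy.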
The above embodiments give specific operational procedures by way of example, but it should be understood that the scope of protection of the present invention is not limited thereto.

Although various embodiments of aspects of the present invention have been described for the purposes of this disclosure, they should not be understood as limiting the teachings of this disclosure to those embodiments. Features disclosed in one specific embodiment are not limited to that embodiment and may be combined with features disclosed in different embodiments. Furthermore, it should be understood that the method steps described above may be performed sequentially, performed in parallel, merged into fewer steps, split into more steps, combined in a manner different from that described, and/or omitted. Those skilled in the art will appreciate that further alternative implementations and variations are possible, and that various changes and modifications may be made to the components and constructions described above without departing from the scope defined by the claims of the present invention.
Claims (14)
- A method for selecting an output beam of a microphone array, the method comprising the following steps: (a) receiving a plurality of sound signals from a microphone array comprising a plurality of microphones, and beamforming the plurality of sound signals to obtain a plurality of beams and corresponding beam output signals; (b) for each beam of the plurality of beams, performing the following operations: converting the beam output signal of the current beam from the time domain to the frequency domain to obtain a spectrum vector and a power spectrum vector of the current beam; and calculating, on the basis of the spectrum vector and the power spectrum vector of the current beam, the comprehensive speech signal energy of the current beam, wherein the comprehensive speech signal energy is the product of the comprehensive energy and the comprehensive speech existence probability of the current beam, wherein the comprehensive energy indicates the energy level of the beam output signal of the current beam, the comprehensive speech existence probability indicates the probability that speech is present in the beam output signal of the current beam, and the comprehensive speech existence probability and the comprehensive energy are scalars; and (c) selecting the beam with the largest comprehensive speech signal energy value as the output beam.
- The method according to claim 1, wherein the spectrum vector is obtained by performing a short-time Fourier transform or a short-time discrete cosine transform on the beam output signal of the current beam.
- The method according to claim 1, wherein in step (b), after the spectrum vector and the power spectrum vector of the current beam have been obtained, the power spectrum vector is updated with the spectrum vector according to the following formula: S_b(f,t) = α_1·S_b(f,t−1) + (1−α_1)·|Y_b(f,t)|², where: t denotes the frame index; f denotes the frequency bin; S_b(f,t−1) is the power spectrum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t−1; S_b(f,t) is the power spectrum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t; α_1 is a parameter greater than 0 and less than 1; and Y_b(f,t) is the spectrum corresponding to the element of the spectrum vector of the current beam at frequency bin f in frame t.
- The method according to claim 3, wherein α_1 is greater than or equal to 0.9 and less than or equal to 0.99.
- The method according to claim 1, wherein in step (b), before the comprehensive speech signal energy of the current beam is calculated on the basis of the spectrum vector and the power spectrum vector of the current beam, the local energy minimum corresponding to each element of the power spectrum vector of the current beam is determined.
- The method according to claim 5, wherein determining the local energy minimum corresponding to each element of the power spectrum vector of the current beam comprises: maintaining two vectors S_{b,min} and S_{b,tmp} of the same length as the spectrum vector and with initial values of zero; updating each element of the vectors S_{b,min} and S_{b,tmp} according to the following formulas: S_{b,min}(f,t) = min{S_{b,min}(f,t−1), S_b(f,t)}, S_{b,tmp}(f,t) = min{S_{b,tmp}(f,t−1), S_b(f,t)}, where: t denotes the frame index; f denotes the frequency bin; S_{b,min}(f,t) denotes the local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t; S_{b,min}(f,t−1) denotes the local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t−1; S_b(f,t) denotes the power spectrum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t; S_{b,tmp}(f,t) denotes the temporary local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t; S_{b,tmp}(f,t−1) denotes the temporary local energy minimum corresponding to the element of the power spectrum vector of the current beam at frequency bin f in frame t−1; and each time L elements have been updated according to the above formulas, resetting the vectors S_{b,min} and S_{b,tmp} as follows: S_{b,min}(f,t) = min{S_{b,tmp}(f,t−1), S_b(f,t)}, S_{b,tmp}(f,t) = S_b(f,t); after each element of the vectors S_{b,min} and S_{b,tmp} has been updated, obtaining the local energy minimum corresponding to each element of the power spectrum vector of the current beam.
- The method according to claim 6, wherein L is set such that L frames contain between 200 and 500 milliseconds of signal.
- The method according to claim 1, wherein the comprehensive energy is obtained through the following step: averaging all elements of the power spectrum vector to obtain the comprehensive energy.
- The method according to claim 8, wherein averaging all elements of the power spectrum vector to obtain the comprehensive energy comprises: taking a weighted average of all elements of the power spectrum vector as the comprehensive energy, wherein for each element of the power spectrum vector, the element is given a weight of 1 if its frequency bin lies in the range of 0 to 5 kHz, and a weight of 0 otherwise.
- The method according to claim 1, wherein the comprehensive speech existence probability is obtained through the following steps: for each element of the signal power spectrum vector of the current beam, calculating, according to a speech existence probability model, the speech existence probability corresponding to that element, so as to generate the speech existence probability vector of the current beam; and performing the following step to update each element of the speech existence probability vector of the current beam: p_b(f,t) = α_2·p_b(f,t−1) + (1−α_2)·I(b,f,t), where: t denotes the frame index; f denotes the frequency bin; p_b is the speech existence probability vector of the current beam; p_b(f,t−1) is the speech existence probability corresponding to the element of the speech existence probability vector of the current beam at frequency bin f in frame t−1; p_b(f,t) is the speech existence probability corresponding to the element of the speech existence probability vector of the current beam at frequency bin f in frame t; α_2 is a parameter greater than 0 and less than 1; the value of the function I(b,f,t) is 1 when S_b(f,t) > δ_1·S_{b,min}(f,t) and 0 otherwise, where S_b(f,t) is the power spectrum corresponding to the element of the power spectrum vector of the current beam, S_{b,min}(f,t) is the local energy minimum corresponding to the element of the power spectrum vector of the current beam, and δ_1 is the threshold used to decide whether the current frame carries a speech signal; and averaging all elements of the speech existence probability vector to obtain the comprehensive speech existence probability.
- The method according to claim 10, wherein α_2 is greater than or equal to 0.8 and less than or equal to 0.99.
- The method according to claim 9, wherein averaging all elements of the speech existence probability vector to obtain the comprehensive speech existence probability comprises: taking a weighted average of all elements of the speech existence probability vector as the comprehensive speech existence probability, wherein for each element of the speech existence probability vector, the element is given a weight of 1 if its frequency bin lies in the range of 0 to 5 kHz, and a weight of 0 otherwise.
- The method according to claim 13, wherein α_3 is greater than or equal to 0.8 and less than or equal to 0.99.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| US17/776,541 US20220399028A1 (en) | 2019-11-12 | 2020-11-12 | Method for selecting output wave beam of microphone array |

Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201911097476.0A CN110600051B (zh) | 2019-11-12 | 2019-11-12 | Method for selecting output wave beam of microphone array |
| CN201911097476.0 | | | |

Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| WO2021093798A1 (zh) | 2021-05-20 |

Family

ID=68852349

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2020/128274 WO2021093798A1 (zh) | Method for selecting output wave beam of microphone array | 2019-11-12 | 2020-11-12 |

Country Status (3)

| Country | Link |
| --- | --- |
| US (1) | US20220399028A1 (zh) |
| CN (1) | CN110600051B (zh) |
| WO (1) | WO2021093798A1 (zh) |
Citations (7)

| Publication number | Priority date | Publication date |
| --- | --- | --- |
| CN102739886A (zh) * | 2011-04-01 | 2012-10-17 |
| CN103871420A (zh) * | 2012-12-13 | 2014-06-18 |
| CN106251877A (zh) * | 2016-08-11 | 2016-12-21 |
| CN107976651A (zh) * | 2016-10-21 | 2018-05-01 |
| WO2018133056A1 (zh) * | 2017-01-22 | 2018-07-26 |
| US20190385635A1 (en) * | 2018-06-13 | 2019-12-19 |
| CN110600051A (zh) * | 2019-11-12 | 2019-12-20 |
Also Published As

| Publication number | Publication date |
| --- | --- |
| CN110600051A (zh) | 2019-12-20 |
| CN110600051B (zh) | 2020-03-31 |
| US20220399028A1 (en) | 2022-12-15 |
Legal Events

| Code | Title | Description |
| --- | --- | --- |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20888546; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20888546; Country of ref document: EP; Kind code of ref document: A1 |