WO2015196760A1 - Microphone array voice detection method and apparatus - Google Patents

Microphone array voice detection method and apparatus

Info

Publication number
WO2015196760A1
WO2015196760A1 PCT/CN2014/094542 CN2014094542W WO2015196760A1 WO 2015196760 A1 WO2015196760 A1 WO 2015196760A1 CN 2014094542 W CN2014094542 W CN 2014094542W WO 2015196760 A1 WO2015196760 A1 WO 2015196760A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
state
array
power spectrum
detection threshold
Prior art date
Application number
PCT/CN2014/094542
Other languages
English (en)
French (fr)
Inventor
范泛
付中华
黎家力
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015196760A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise

Definitions

  • The present invention relates to voice processing technologies, and in particular to a microphone array voice detection method and apparatus.
  • In voice communication and human-machine voice interaction, voice detection is an important step. Accurate detection of voice signals has an important impact on voice recognition, enhancement, coding and so on.
  • Traditional single-channel voice detection usually takes a particular feature as the basis for detection: it performs feature analysis on the input signal and then applies a classifier. Because of real-time requirements, both the feature analysis and the classifier are relatively simple. Commonly used features include short-time energy, zero-crossing rate and other spectral features, and the classifiers are mainly threshold decisions, linear separators and the like. The performance of these methods is extremely limited under complex noise conditions. The basic assumption of voice detection in noisy environments is that noise and voice signals have different characteristics, which leads to the following difficulty in practice: the detection threshold cannot be determined accurately enough, and in a changing noise environment it is even harder to determine.
  • Embodiments of the present invention provide a microphone array voice detection method and apparatus, which can accurately determine a detection threshold under complex noise conditions and improve the accuracy of noise detection.
  • According to one aspect of the present invention, an embodiment of the present invention provides a microphone array voice detection method, including:
  • calculating a maximum sub-band power ratio of the array voice input signal according to a fixed beam output power spectrum and an average power spectrum of the array voice input signal;
  • determining the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset determination condition;
  • when it is determined that the voice state currently entered is a preset detection threshold adjustment state, adjusting the detection threshold.
  • Optionally, the step of calculating the maximum sub-band power ratio and the detection threshold of the array voice input signal includes:
  • estimating the fixed beam output power spectrum of the array voice input signal by inter-frame regression smoothing and frequency-domain smoothing, and estimating the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
  • calculating the power ratio at each frequency point from the ratio of the fixed beam output power spectrum to the average power spectrum;
  • taking the frequency point with the largest power ratio as the center, estimating the maximum sub-band power ratio by inter-frame regression smoothing from the average power ratio within a sub-band of the set width.
  • the fixed beam output power spectrum calculation formula is:
  • k is the frequency point number
  • λ is the short-time frame number
  • the current frame beam output signal power spectrum when the short-time frame number is λ
  • a_x is the first regression coefficient
  • l_1 is the preset number of frequency points, where 0 < a_x < 1, and k, λ, b, l_1 are positive integers
  • the average power spectrum of the current frame when the short-time frame number is λ; a_y is the second regression coefficient, 0 < a_y < 1;
  • r(λ) = a_r r(λ-1) + (1-a_r) r(λ);
  • r(λ-1) is the previous calculation result of r(λ); the initial value of r(λ-1) is the average power ratio within the sub-band of the set width; a_r is the third regression coefficient, 0 < a_r < 1.
  • the detection threshold adjustment state includes the voice-present state.
  • the step of determining the current voice state from the maximum sub-band power ratio and the detection threshold according to the preset determination condition specifically includes:
  • if currently in the voice-start state, the maximum sub-band power is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than the set first frame-number threshold, determining a transition to the voice-present state;
  • if currently in the voice-end state and the maximum sub-band power is greater than the current detection threshold, determining a transition to the voice-present state.
  • the step of determining the current voice state by using the maximum sub-band power ratio and the detection threshold according to the preset determination condition further includes:
  • before the step of calculating the maximum sub-band power ratio of the array voice input signal according to the fixed beam output power spectrum and the average power spectrum of the array voice input signal, the method further includes:
  • Γ(k) is the ideal diffuse-field normalized coherence matrix of the target voice signal.
  • the matrix is an N×N matrix whose element in row n_1, column n_2 is:
  • WNG_min(k) is the white noise gain
  • d(k) is the spatial steering vector of the target sound source to the speech acquisition device, and its calculation formula is:
  • the detection threshold is adjusted according to the following formula:
  • θ′(λ) is the adjusted detection threshold
  • θ_L and θ_H are respectively the lower limit and the upper limit of the preset voice detection threshold
  • an embodiment of the present invention further provides a microphone array voice detecting apparatus, including:
  • a first calculation module configured to calculate a maximum sub-band power ratio of the array voice input signal according to a fixed beam output power spectrum and an average power spectrum of the array voice input signal;
  • a state judging module configured to determine the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset judgment condition;
  • a threshold adjustment module configured to adjust the detection threshold when it is determined that the voice state currently entered is a preset detection threshold adjustment state.
  • the first calculating module specifically includes:
  • a first calculating unit configured to estimate the fixed beam output power spectrum of the array voice input signal by inter-frame regression smoothing and frequency-domain smoothing, and to estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
  • a second calculating unit configured to calculate a power ratio of each frequency point according to a ratio of the fixed beam output power spectrum and the average power spectrum;
  • a third calculating unit configured to take the frequency point with the largest power ratio as the center and, within a sub-band of the set width, estimate the maximum sub-band power ratio by inter-frame regression smoothing from the average power ratio within that sub-band.
  • the fixed beam output power spectrum calculation formula is:
  • k is the frequency point number
  • λ is the short-time frame number
  • the current frame beam output signal power spectrum when the short-time frame number is λ
  • a_x is the first regression coefficient
  • l_1 is the preset number of frequency points, where 0 < a_x < 1, and k, λ, b, l_1 are positive integers
  • the average power spectrum of the current frame when the short-time frame number is λ; a_y is the second regression coefficient, 0 < a_y < 1;
  • r(λ) = a_r r(λ-1) + (1-a_r) r(λ);
  • r(λ-1) is the previous calculation result of r(λ); the initial value of r(λ-1) is the average power ratio within the sub-band of the set width; a_r is the third regression coefficient, 0 < a_r < 1.
  • the detection threshold adjustment state includes the voice-present state.
  • the state judging module specifically includes:
  • a first judging unit configured to determine a transition to the voice-present state when currently in the voice-start state, the maximum sub-band power is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than the set first frame-number threshold;
  • and/or a second judging unit configured to determine a transition to the voice-present state when currently in the voice-end state and the maximum sub-band power is greater than the current detection threshold.
  • the state judging module further includes:
  • a third judging unit configured to determine a transition to the voice-start state when currently in the no-voice state and the maximum sub-band power ratio is greater than the current detection threshold;
  • a fourth judging unit configured to determine a transition to the no-voice state when currently in the voice-start state and the maximum sub-band power is less than or equal to the current detection threshold;
  • a fifth judging unit configured to determine a transition to the voice-end state when currently in the voice-present state and the maximum sub-band power is less than or equal to the current detection threshold;
  • a sixth judging unit configured to determine a transition to the no-voice state when currently in the voice-present state, the maximum sub-band power is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than the set second frame-number threshold.
  • the device further includes:
  • a signal receiving module configured to receive an array voice input signal input through a voice collecting device
  • a signal conversion module configured to perform windowing and truncation on the array voice input signal, and perform short-time Fourier transform processing to obtain a time-frequency representation signal of the array voice input signal;
  • a second calculating module configured to calculate a frequency domain fixed beam output according to the time-frequency representation signal
  • a third calculating module configured to calculate an array current frame average power spectrum and a current frame beam output signal power spectrum according to the frequency domain fixed beam output;
  • a fourth calculating module configured to calculate a fixed beam output power spectrum of the array voice input signal according to the current frame average power spectrum of the array; and calculate an average power spectrum of the array voice input signal according to the current frame beam output signal power spectrum.
  • Γ(k) is the ideal diffuse-field normalized coherence matrix of the target voice signal.
  • the matrix is an N×N matrix whose element in row n_1, column n_2 is:
  • WNG_min(k) is the white noise gain
  • d(k) is the spatial steering vector of the target sound source to the speech acquisition device, and its calculation formula is:
  • the threshold adjustment module adjusts the detection threshold according to the following formula:
  • θ′(λ) is the adjusted detection threshold
  • θ_L and θ_H are respectively the lower limit and the upper limit of the preset voice detection threshold
  • the microphone array voice detection method and apparatus adjust the detection threshold when the voice state is determined according to the preset condition, which helps determine the detection threshold even in a changing noise environment.
  • the embodiment of the present invention processes the voice signal according to the preset beam parameters, enhances the directivity of the voice signal, and reduces the influence of noise or other voice signals on the voice detection device and the system.
  • FIG. 1 is a schematic flow chart of a microphone voice detection method according to an embodiment of the present invention.
  • FIG. 2 is a process of calculating a maximum sub-band power ratio and a detection threshold of an array voice input signal according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of steps included in another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a microphone voice detecting apparatus according to an embodiment of the present invention.
  • FIG. 6 is a signal flow diagram when calculating a frequency domain fixed beam output according to an embodiment of the present invention.
  • FIG. 7 is a signal flow diagram when calculating a current frame average power spectrum according to an embodiment of the present invention.
  • the embodiment of the invention provides a microphone array voice detection method, as shown in FIG. 1 , including the following steps:
  • Step 101 Calculate the maximum sub-band power ratio of the array voice input signal according to the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
  • Step 102 Determine the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset determination condition;
  • Step 103 When it is determined that the voice state currently entered is a preset detection threshold adjustment state, adjust the detection threshold.
  • the microphone array voice detection method determines the current voice state according to a preset determination condition, and adjusts the detection threshold when the voice state currently entered is a preset detection threshold adjustment state.
  • the maximum sub-band power ratio of the array speech input signal is in the set range, so that the detection threshold can be determined more accurately in a varying noise environment.
  • the step of calculating a maximum sub-band power ratio and a detection threshold of the array speech input signal includes, in particular, a process as shown in FIG. 2:
  • Step 201 Estimating the fixed beam output power spectrum of the array speech input signal by means of inter-frame regression smoothing and frequency domain smoothing, and estimating the average power spectrum of the array speech input signal by means of inter-frame smoothing and frequency domain smoothing.
  • Step 202 Calculate a power ratio of each frequency point according to a ratio of the fixed beam output power spectrum to the average power spectrum.
  • Step 203 estimating the maximum sub-band power ratio by using an inter-frame regression smoothing method according to an average power ratio in the sub-band range centering on a frequency point at which the frequency point power ratio is the largest.
  • the fixed beam output power spectrum is calculated as:
  • k is the frequency point number
  • λ is the short-time frame number
  • the current frame beam output signal power spectrum when the frequency point number is b and the short-time frame number is λ
  • a_x is the first regression coefficient
  • l_1 is the preset number of frequency points, where 0 < a_x < 1, and k, λ, b, l_1 are positive integers.
  • r(λ) = a_r r(λ-1) + (1-a_r) r(λ);
  • r(λ-1) is the previous calculation result of r(λ); the initial value of r(λ-1) is the average power ratio within the sub-band of the set width; a_r is the third regression coefficient, 0 < a_r < 1.
  • the detection threshold adjustment state includes the voice-present state.
  • the step of determining the current voice state from the maximum sub-band power ratio and the detection threshold according to the preset determination condition specifically includes:
  • if currently in the voice-start state, the maximum sub-band power is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than the set first frame-number threshold, determining a transition to the voice-present state;
  • if currently in the voice-end state and the maximum sub-band power is greater than the current detection threshold, determining a transition to the voice-present state.
  • the step of determining the current voice state from the maximum sub-band power ratio and the detection threshold according to the preset determination condition further includes:
  • the current detection threshold is θ(λ).
  • Two counters are used to record the number of consecutive frames in the voice-start state and the number of consecutive frames in the voice-end state.
  • the number of consecutive frames in the voice-start state is c_1
  • the number of consecutive frames in the voice-end state is c_2;
  • the step of determining the current voice state by using the maximum sub-band power ratio and the detection threshold includes the following process:
  • L_1 is an empirical value and takes a positive integer
  • the detection system may completely fail.
  • the master-slave microphone and microphone array can be selected as the sound pickup device.
  • the master-slave microphone uses two microphones with different directivity, so that a signal from the target direction produces a power difference between the two microphones; the power ratio of the two microphones is then used to detect the target voice.
  • the key is the master-slave microphone design and the orientation of the target voice.
  • the microphone array uses the spatial topology of its array elements to form a beam with a specific directivity, so that signals inside and outside the beam differ in power, and then uses this cue to detect the signal in the target direction.
  • the master-slave microphone pickup technology in the related art still has problems: the microphone array beam is inevitably affected by side lobes, and its low-frequency directivity is poor; therefore, many problems remain to be solved when the related-art voice detection technology is used in practice.
  • a process as shown in FIG. 3 is also included:
  • Step 301 Receive an array voice input signal input through a voice collection device.
  • Step 302 Perform windowing and truncation on the array voice input signal, and perform short-time Fourier transform processing to obtain a time-frequency representation signal of the array voice input signal.
  • Step 303 Calculate a frequency domain fixed beam output according to the time-frequency representation signal.
  • Step 304 Calculate an average power spectrum of the current voice frame of the array and a power spectrum of the current frame beam output signal according to the frequency domain fixed beam output.
  • Step 305 Calculate a fixed beam output power spectrum of the array voice input signal according to the current voice frame average power spectrum of the array; and calculate an average power spectrum of the array voice input signal according to the current voice frame beam output signal power spectrum of the array.
  • a Hanning window is used with an overlap of 3/4 of the window length; the time window length is L_wnd, and adjacent windows overlap by L_ovlp.
  • k is the frequency point number;
  • λ is the short-time frame number, and k and λ are positive integers.
  • the frequency domain fixed beam output is obtained by multiplying the time-frequency representation signal of the original voice array signal by the corresponding preset beam parameter a_i(k), that is, the frequency domain fixed beam output is:
  • N is a positive integer.
  • the signal flow diagram when calculating the frequency domain fixed beam output is as shown in FIG. 6.
  • the directivity of the beam can be enhanced, and the influence of noise interference or other speech interference on the system detection can be reduced.
  • the minimum of y_1(k, λ) and the result of multiplying the time-frequency representation signal of the original voice array signal by the corresponding preset beam parameters is taken, which effectively avoids the abnormal low-frequency amplification caused by beam robustness problems.
  • the design of the beam parameters may directly affect the power ratio of the signals inside and outside the beam.
  • the optimal frequency domain beam parameter design method is adopted, and the array white noise gain is less than 15 dB.
  • Design the optimal super-directive beam parameters in the frequency domain. If A(k) denotes the matrix whose elements are a_i(k), where i = 1…N, then the optimal super-directive beam parameters are:
  • Γ(k) is the ideal diffuse-field normalized coherence matrix of the target voice signal.
  • the matrix is an N×N matrix whose element in row n_1, column n_2 is:
  • WNG_min(k) is the white noise gain.
  • d(k) is the spatial steering vector of the target sound source to the speech acquisition device, and its calculation formula is:
  • the optimal super-directional beam parameters can be designed using third-party open source convex optimization software, such as CVX and SeDuMi.
  • the current frame beam output signal power spectrum calculation formula is:
  • the detection threshold is adjusted according to the following formula:
  • θ′(λ) is the adjusted detection threshold
  • θ_L and θ_H are respectively the lower limit and the upper limit of the preset voice detection threshold
  • the slowly regression-smoothed value of the maximum sub-band power spectrum ratio in the voice-present state; 0 < θ_L < 1, 0 < θ_H < 1.
  • the maximum sub-band power spectrum ratio is slowly regression-smoothed by the following formula.
  • a_0 is a regression smoothing coefficient
  • the detection threshold is adjusted by the min-max method according to the following formula:
  • the parameters mentioned in the above embodiments may refer to the following values:
  • the embodiment of the invention further provides a microphone array voice detecting device, as shown in FIG. 5, comprising:
  • a first calculation module configured to calculate a maximum sub-band power ratio of the array voice input signal according to a fixed beam output power spectrum and an average power spectrum of the array voice input signal;
  • a state judging module configured to determine the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset judgment condition;
  • a threshold adjustment module configured to adjust the detection threshold when it is determined that the voice state currently entered is a preset detection threshold adjustment state.
  • the first calculating module specifically includes:
  • a first calculating unit configured to estimate the fixed beam output power spectrum of the array voice input signal by inter-frame regression smoothing and frequency-domain smoothing, and to estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
  • a second calculating unit configured to calculate a power ratio of each frequency point according to a ratio of the fixed beam output power spectrum and the average power spectrum;
  • a third calculating unit configured to take the frequency point with the largest power ratio as the center and, within a sub-band of the set width, estimate the maximum sub-band power ratio by inter-frame regression smoothing from the average power ratio within that sub-band.
  • the fixed beam output power spectrum is calculated as:
  • k is the frequency point number
  • λ is the short-time frame number
  • the current frame beam output signal power spectrum when the short-time frame number is λ
  • a_x is the first regression coefficient
  • l_1 is the preset number of frequency points, where 0 < a_x < 1, and k, λ, b, l_1 are positive integers.
  • the average power spectrum of the current frame when the short-time frame number is λ; a_y is the second regression coefficient, 0 < a_y < 1;
  • r(λ) = a_r r(λ-1) + (1-a_r) r(λ);
  • r(λ-1) is the previous calculation result of r(λ); the initial value of r(λ-1) is the average power ratio within the sub-band of the set width; a_r is the third regression coefficient, 0 < a_r < 1.
  • the detection threshold adjustment state includes the voice-present state.
  • the state judging module specifically includes:
  • a first judging unit configured to determine a transition to the voice-present state when currently in the voice-start state, the maximum sub-band power is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than the set first frame-number threshold;
  • and/or a second judging unit configured to determine a transition to the voice-present state when currently in the voice-end state and the maximum sub-band power is greater than the current detection threshold.
  • the state judging module further includes:
  • a third judging unit configured to determine a transition to the voice-start state when currently in the no-voice state and the maximum sub-band power ratio is greater than the current detection threshold;
  • a fourth judging unit configured to determine a transition to the no-voice state when currently in the voice-start state and the maximum sub-band power is less than or equal to the current detection threshold;
  • a fifth judging unit configured to determine a transition to the voice-end state when currently in the voice-present state and the maximum sub-band power is less than or equal to the current detection threshold;
  • a sixth judging unit configured to determine a transition to the no-voice state when currently in the voice-present state, the maximum sub-band power is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than the set second frame-number threshold.
  • the apparatus further includes:
  • a signal receiving module configured to receive an array voice input signal input through a voice collecting device
  • a signal conversion module configured to perform windowing and truncation on the array voice input signal, and perform short-time Fourier transform processing to obtain a time-frequency representation signal of the array voice input signal;
  • a second calculating module configured to calculate a frequency domain fixed beam output according to the time-frequency representation signal
  • a third calculating module configured to calculate an array current frame average power spectrum and a current frame beam output signal power spectrum according to the frequency domain fixed beam output;
  • a fourth calculating module configured to calculate a fixed beam output power spectrum of the array voice input signal according to the current frame average power spectrum of the array; and calculate an average power spectrum of the array voice input signal according to the current frame beam output signal power spectrum.
  • Γ(k) is the ideal diffuse-field normalized coherence matrix of the target voice signal.
  • the matrix is an N×N matrix whose element in row n_1, column n_2 is:
  • WNG_min(k) is the white noise gain
  • d(k) is the spatial steering vector of the target sound source to the speech acquisition device, and its calculation formula is:
  • the microphone array voice detection method and apparatus adjust the detection threshold when the voice state is determined according to a preset condition, which helps determine the detection threshold even in a changing noise environment.
  • the embodiment of the present invention processes the voice signal according to the preset beam parameters, enhances the directivity of the voice signal, and reduces the influence of noise or other voice signals on the voice detection device and the system.
  • all or part of the steps of the above embodiments may also be implemented by using an integrated circuit. These steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • Each device/function module/functional unit in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • Each device/function module/functional unit in the above embodiments may be stored in a computer readable storage medium when implemented in the form of a software function module and sold or used as a standalone product.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • the microphone array voice detection method and apparatus provided by the embodiments of the present invention adjust the detection threshold when the voice state is determined according to a preset condition, and can assist the determination of the detection threshold even in a changed noise environment.
  • the embodiment of the present invention processes the voice signal according to the preset beam parameters, enhances the directivity of the voice signal, and reduces the influence of noise or other voice signals on the voice detection device and the system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

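A microphone array voice detection method and apparatus. The method includes the following steps: calculating a maximum sub-band power ratio of an array voice input signal according to a fixed beam output power spectrum and an average power spectrum of the array voice input signal (101); determining the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset determination condition (102); and adjusting the detection threshold when it is determined that the voice state currently entered is a preset detection threshold adjustment state (103). The microphone array voice detection method and apparatus can accurately determine the detection threshold under complex noise conditions and improve the accuracy of noise detection.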

Description

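Microphone array voice detection method and apparatus
Technical Field
The present invention relates to voice processing technologies, and in particular to a microphone array voice detection method and apparatus.
Background
In voice communication and human-machine voice interaction, voice detection is an important step, and accurate detection of voice signals has an important impact on voice recognition, enhancement, coding and so on. Traditional single-channel voice detection usually takes a particular feature as the basis for detection: it performs feature analysis on the input signal and then applies a classifier. Because of real-time requirements, both the feature analysis and the classifier are relatively simple. Commonly used features include short-time energy, zero-crossing rate and other spectral features, and the classifiers are mainly threshold decisions, linear separators and the like. The performance of these methods is extremely limited under complex noise conditions. The basic assumption of voice detection in noisy environments is that noise and voice signals have different characteristics, which leads to the following difficulty in practice: the detection threshold cannot be determined accurately enough, and in a changing noise environment it is even harder to determine.
Summary of the Invention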
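Embodiments of the present invention provide a microphone array voice detection method and apparatus, which can accurately determine a detection threshold under complex noise conditions and improve the accuracy of noise detection.
According to one aspect of the present invention, an embodiment of the present invention provides a microphone array voice detection method, including:
calculating a maximum sub-band power ratio of the array voice input signal according to a fixed beam output power spectrum and an average power spectrum of the array voice input signal;
determining the current voice state from the maximum sub-band power ratio and the current detection threshold according to a preset determination condition;
when it is determined that the voice state currently entered is a preset detection threshold adjustment state, adjusting the detection threshold.
Optionally, the step of calculating the maximum sub-band power ratio and the detection threshold of the array voice input signal includes:
estimating the fixed beam output power spectrum of the array voice input signal by inter-frame regression smoothing and frequency-domain smoothing, and estimating the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
calculating the power ratio at each frequency point from the ratio of the fixed beam output power spectrum to the average power spectrum;
taking the frequency point with the largest power ratio as the center, estimating the maximum sub-band power ratio by inter-frame regression smoothing from the average power ratio within a sub-band of the set width.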
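The per-bin ratio and the sub-band selection described in the last two steps can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `max_subband_ratio`, the `half_width` parameter (half of the set sub-band width, in bins), the `eps` guard against division by zero, and the clipping of the sub-band at the spectrum edges are all assumptions. The two input lists stand in for the already smoothed fixed beam output power spectrum and average power spectrum of one frame.

```python
def max_subband_ratio(beam_psd, avg_psd, half_width, eps=1e-12):
    """Return (peak_bin, mean power ratio over the sub-band centred on peak_bin)."""
    if len(beam_psd) != len(avg_psd) or not beam_psd:
        raise ValueError("spectra must be non-empty and of equal length")
    # Per-bin ratio of beam output power to average array power.
    ratios = [b / (a + eps) for b, a in zip(beam_psd, avg_psd)]
    # The bin with the largest ratio becomes the sub-band centre.
    peak = max(range(len(ratios)), key=ratios.__getitem__)
    # Sub-band of the set width, clipped at the spectrum edges (assumption).
    lo = max(0, peak - half_width)
    hi = min(len(ratios) - 1, peak + half_width)
    band = ratios[lo:hi + 1]
    return peak, sum(band) / len(band)
```

The returned mean is the unsmoothed sub-band average for the current frame; the inter-frame regression smoothing is applied to it afterwards.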
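Optionally, the fixed beam output power spectrum is calculated as:
Figure PCTCN2014094542-appb-000001
where k is the frequency point number; λ is the short-time frame number;
Figure PCTCN2014094542-appb-000002
is the current frame beam output signal power spectrum when the short-time frame number is λ; a_x is the first regression coefficient; l_1 is the preset number of frequency points, where 0 < a_x < 1, and k, λ, b, l_1 are positive integers;
The average power spectrum of the array voice input signal is calculated as:
Figure PCTCN2014094542-appb-000003
Figure PCTCN2014094542-appb-000004
is the average power spectrum of the current frame when the short-time frame number is λ; a_y is the second regression coefficient, 0 < a_y < 1;
The power ratio at each frequency point is calculated as:
Figure PCTCN2014094542-appb-000005
The maximum sub-band power spectrum ratio is calculated as:
r(λ) = a_r r(λ-1) + (1-a_r) r(λ);
r(λ-1) is the previous calculation result of r(λ); the initial value of r(λ-1) is the average power ratio within the sub-band of the set width; a_r is the third regression coefficient, 0 < a_r < 1.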
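The recursion for the maximum sub-band power ratio is a first-order regression (exponential) smoother across frames. The sketch below is hedged: it assumes that the r(λ) on the right-hand side of the formula denotes the current frame's sub-band average power ratio, and it uses the hypothetical name `smooth_ratio`. Seeding the recursion with the first frame's sub-band average follows the stated initial value of r(λ-1).

```python
def smooth_ratio(band_means, a_r):
    """Recursively smooth per-frame sub-band average ratios with coefficient a_r."""
    if not 0.0 < a_r < 1.0:
        raise ValueError("a_r must lie strictly between 0 and 1")
    smoothed = []
    prev = None
    for mean in band_means:
        # First frame: r(lambda-1) is initialised to the sub-band average itself.
        prev = mean if prev is None else a_r * prev + (1.0 - a_r) * mean
        smoothed.append(prev)
    return smoothed
```

A value of a_r close to 1 gives a slowly varying estimate, while a value close to 0 tracks the per-frame average almost directly.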
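Optionally, the detection threshold adjustment state includes the voice-present state.
Optionally, the step of determining the current voice state from the maximum sub-band power ratio and the detection threshold according to the preset determination condition specifically includes:
if currently in the voice-start state, the maximum sub-band power is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than the set first frame-number threshold, determining a transition to the voice-present state;
if currently in the voice-end state and the maximum sub-band power is greater than the current detection threshold, determining a transition to the voice-present state.
Optionally, the step of determining the current voice state from the maximum sub-band power ratio and the detection threshold according to the preset determination condition further includes:
if currently in the no-voice state and the maximum sub-band power ratio is greater than the current detection threshold, determining a transition to the voice-start state;
if currently in the voice-start state and the maximum sub-band power is less than or equal to the current detection threshold, determining a transition to the no-voice state;
if currently in the voice-present state and the maximum sub-band power is less than or equal to the current detection threshold, determining a transition to the voice-end state;
if currently in the voice-present state, the maximum sub-band power is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than the set second frame-number threshold, determining a transition to the no-voice state.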
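The six transition rules above form a four-state machine, which can be sketched as below. Several points are assumptions rather than statements of the patent: the compared quantity is taken to be the smoothed maximum sub-band power ratio; the last rule is read as applying while in the voice-end state (the text says voice-present, but counts frames in the voice-end state); the counting convention for consecutive frames and the names `VoiceState`, `step` and `run_detector` are hypothetical; and the threshold is held fixed here, without the adjustment step.

```python
from enum import Enum


class VoiceState(Enum):
    NO_VOICE = 0
    VOICE_START = 1
    VOICE_PRESENT = 2
    VOICE_END = 3


def step(state, count, ratio, threshold, n_start, n_end):
    """Advance the detector by one frame; return (next_state, next_count)."""
    above = ratio > threshold
    if state is VoiceState.NO_VOICE:
        return (VoiceState.VOICE_START, 1) if above else (VoiceState.NO_VOICE, 0)
    if state is VoiceState.VOICE_START:
        if not above:
            return VoiceState.NO_VOICE, 0
        count += 1  # consecutive frames spent in the voice-start state
        if count > n_start:
            return VoiceState.VOICE_PRESENT, 0
        return VoiceState.VOICE_START, count
    if state is VoiceState.VOICE_PRESENT:
        return (VoiceState.VOICE_PRESENT, 0) if above else (VoiceState.VOICE_END, 1)
    # VOICE_END: return to voice-present on a new onset, otherwise count down.
    if above:
        return VoiceState.VOICE_PRESENT, 0
    count += 1  # consecutive frames spent in the voice-end state
    if count > n_end:
        return VoiceState.NO_VOICE, 0
    return VoiceState.VOICE_END, count


def run_detector(ratios, threshold, n_start, n_end):
    """Run the state machine over a sequence of per-frame ratios."""
    state, count, history = VoiceState.NO_VOICE, 0, []
    for ratio in ratios:
        state, count = step(state, count, ratio, threshold, n_start, n_end)
        history.append(state)
    return history
```

The two intermediate states act as hangover buffers: a short burst above the threshold does not immediately count as voice, and a short dip below it does not immediately end a voice segment.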
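Optionally, before the step of calculating the maximum sub-band power ratio of the array voice input signal according to the fixed beam output power spectrum and the average power spectrum of the array voice input signal, the method further includes:
receiving an array voice input signal input through a voice acquisition device;
windowing and truncating the array voice input signal and performing a short-time Fourier transform to obtain a time-frequency representation signal of the array voice input signal;
calculating a frequency-domain fixed beam output according to the time-frequency representation signal;
calculating the array current voice frame average power spectrum and the array current voice frame beam output signal power spectrum according to the frequency-domain fixed beam output;
calculating the fixed beam output power spectrum of the array voice input signal according to the array current voice frame average power spectrum; and calculating the average power spectrum of the array voice input signal according to the array current voice frame beam output signal power spectrum.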
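Optionally, the fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters. If A(k) denotes the matrix whose elements are a_i(k), i = 1…N, the preset beam parameters are determined by the following formula:
Figure PCTCN2014094542-appb-000006
subject to the constraint A^H(k)d(k) = 1, and
Figure PCTCN2014094542-appb-000007
Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
Figure PCTCN2014094542-appb-000008
In the above formula for Γ(k),
Figure PCTCN2014094542-appb-000009
is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform;
WNG_min(k) is the white noise gain;
d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
Figure PCTCN2014094542-appb-000010
In the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.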
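Optionally, when the voice state being entered is determined to be a preset detection threshold adjustment state, the detection threshold is adjusted according to the following formula:
Figure PCTCN2014094542-appb-000011
where θ′(λ) is the adjusted detection threshold; θ_L and θ_H are the preset lower and upper limits of the voice detection threshold;
Figure PCTCN2014094542-appb-000012
is the value obtained by slow recursive smoothing of the maximum subband power ratio in the voice-present state.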
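According to another aspect of the present invention, an embodiment of the invention further provides a microphone array voice detection apparatus, including:
a first calculation module, configured to calculate the maximum subband power ratio of an array voice input signal from the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
a state determination module, configured to determine the current voice state from the maximum subband power ratio and the current detection threshold according to preset determination conditions;
a threshold adjustment module, configured to adjust the detection threshold when the voice state being entered is determined to be a preset detection threshold adjustment state.
Optionally, the first calculation module specifically includes:
a first calculation unit, configured to estimate the fixed beam output power spectrum of the array voice input signal by inter-frame recursive smoothing and frequency-domain smoothing, and to estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
a second calculation unit, configured to calculate the power ratio at each frequency bin as the ratio of the fixed beam output power spectrum to the average power spectrum;
a third calculation unit, configured to take the frequency bin with the largest power ratio as the center and, within a subband of a set width, estimate the maximum subband power ratio by inter-frame recursive smoothing of the average power ratio within that subband.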
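Optionally, the fixed beam output power spectrum is calculated as:
Figure PCTCN2014094542-appb-000013
where k is the frequency bin index; λ is the short-time frame index;
Figure PCTCN2014094542-appb-000014
is the beam output signal power spectrum of the current frame at short-time frame index λ; a_x is the first recursion coefficient; l_1 is a preset number of frequency bins; 0 < a_x < 1, and k, λ, b and l_1 are each positive integers;
The average power spectrum of the array voice input signal is calculated as:
Figure PCTCN2014094542-appb-000015
Figure PCTCN2014094542-appb-000016
is the average power spectrum of the current frame at short-time frame index λ; a_y is the second recursion coefficient, 0 < a_y < 1;
The power ratio at each frequency bin is calculated as:
Figure PCTCN2014094542-appb-000017
The maximum subband power ratio is calculated as:
r(λ) = a_r·r(λ−1) + (1−a_r)·r(λ);
where r(λ−1) is the previous result for r(λ), with its initial value set to the average power ratio within the subband of the set width; a_r is the third recursion coefficient, 0 < a_r < 1.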
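Optionally, the detection threshold adjustment state includes the voice-present state.
Optionally, the state determination module specifically includes:
a first determination unit, configured to determine a transition to the voice-present state when the current state is the voice-start state, the maximum subband power ratio is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than a set first frame-count threshold;
and/or a second determination unit, configured to determine a transition to the voice-present state when the current state is the voice-end state and the maximum subband power ratio is greater than the current detection threshold.
Optionally, the state determination module further includes:
a third determination unit, configured to determine a transition to the voice-start state when the current state is the no-voice state and the maximum subband power ratio is greater than the current detection threshold;
a fourth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-start state and the maximum subband power ratio is less than or equal to the current detection threshold;
a fifth determination unit, configured to determine a transition to the voice-end state when the current state is the voice-present state and the maximum subband power ratio is less than or equal to the current detection threshold;
a sixth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-end state, the maximum subband power ratio is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than a set second frame-count threshold.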
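Optionally, the apparatus further includes:
a signal receiving module, configured to receive the array voice input signal input through voice acquisition devices;
a signal transform module, configured to window and truncate the array voice input signal and apply a short-time Fourier transform to obtain the time-frequency representation of the array voice input signal;
a second calculation module, configured to calculate the frequency-domain fixed beam output from the time-frequency representation;
a third calculation module, configured to calculate, from the frequency-domain fixed beam output, the average power spectrum of the current array frame and the beam output signal power spectrum of the current frame;
a fourth calculation module, configured to calculate the fixed beam output power spectrum of the array voice input signal from the beam output signal power spectrum of the current frame, and to calculate the average power spectrum of the array voice input signal from the average power spectrum of the current array frame.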
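Optionally, the fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters. If A(k) denotes the matrix whose elements are a_i(k), i = 1…N, the preset beam parameters are determined by the following formula:
Figure PCTCN2014094542-appb-000018
subject to the constraint A^H(k)d(k) = 1, and
Figure PCTCN2014094542-appb-000019
Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
Figure PCTCN2014094542-appb-000020
In the above formula for Γ(k),
Figure PCTCN2014094542-appb-000021
is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform;
WNG_min(k) is the white noise gain;
d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
Figure PCTCN2014094542-appb-000022
In the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.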
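Optionally, the threshold adjustment module adjusts the detection threshold according to the following formula:
Figure PCTCN2014094542-appb-000023
where θ′(λ) is the adjusted detection threshold; θ_L and θ_H are the preset lower and upper limits of the voice detection threshold;
Figure PCTCN2014094542-appb-000024
is the value obtained by slow recursive smoothing of the maximum subband power ratio in the voice-present state.
As can be seen from the above, the microphone array voice detection method and apparatus provided by the embodiments of the invention adjust the detection threshold when the voice-present state is determined according to preset conditions, which helps determine the detection threshold even in a changing noise environment. In addition, during voice detection the embodiments process the voice signal with preset beam parameters, which enhances the directivity of the voice signal and reduces the influence of noise or other voice signals on voice detection devices and systems.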
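Brief Description of the Drawings
Fig. 1 is a flowchart of a microphone array voice detection method according to an embodiment of the invention;
Fig. 2 shows the process of the step of calculating the maximum subband power ratio of the array voice input signal and the detection threshold in an embodiment of the invention;
Fig. 3 is a schematic diagram of the steps included in another embodiment of the invention;
Fig. 4 is a state transition diagram of an embodiment of the invention;
Fig. 5 is a structural diagram of a microphone array voice detection apparatus according to an embodiment of the invention;
Fig. 6 is a signal flow diagram for calculating the frequency-domain fixed beam output in an embodiment of the invention;
Fig. 7 is a signal flow diagram for calculating the current-frame average power spectrum in an embodiment of the invention.
Preferred Embodiments of the Invention
Embodiments of the invention are described in detail below with reference to the drawings. It should be noted that, where no conflict arises, the embodiments of this application and the features within them may be combined with one another arbitrarily. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.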
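An embodiment of the invention provides a microphone array voice detection method which, as shown in Fig. 1, includes the following steps:
Step 101: calculate the maximum subband power ratio of the array voice input signal from the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
Step 102: determine the current voice state from the maximum subband power ratio and the current detection threshold according to preset determination conditions;
Step 103: adjust the detection threshold when the voice state being entered is determined to be a preset detection threshold adjustment state.
In the microphone array voice detection method provided by the embodiments of the invention, the current voice state is determined according to preset determination conditions, and the detection threshold is adjusted when the state being entered is a preset detection threshold adjustment state. In the voice state in which the threshold is adjusted, the maximum subband power ratio of the array voice input signal lies within a set range, so the detection threshold can be determined fairly accurately in a changing noise environment.
In some embodiments of the invention, the step of calculating the maximum subband power ratio of the array voice input signal and the detection threshold specifically includes the process shown in Fig. 2:
Step 201: estimate the fixed beam output power spectrum of the array voice input signal by inter-frame recursive smoothing and frequency-domain smoothing, and estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing.
Step 202: calculate the power ratio at each frequency bin as the ratio of the fixed beam output power spectrum to the average power spectrum.
Step 203: take the frequency bin with the largest power ratio as the center and, within a subband of a set width, estimate the maximum subband power ratio by inter-frame recursive smoothing of the average power ratio within that subband.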
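In some embodiments, the fixed beam output power spectrum is calculated as:
Figure PCTCN2014094542-appb-000025
where k is the frequency bin index; λ is the short-time frame index;
Figure PCTCN2014094542-appb-000026
is the beam output signal power spectrum of the current frame at frequency bin b and short-time frame index λ; a_x is the first recursion coefficient; l_1 is a preset number of frequency bins; 0 < a_x < 1, and k, λ, b and l_1 are each positive integers.
The average power spectrum of the array voice input signal is calculated as:
Figure PCTCN2014094542-appb-000027
Figure PCTCN2014094542-appb-000028
is the average power spectrum of the current frame at short-time frame index λ; a_y is the second recursion coefficient, 0 < a_y < 1;
The power ratio at each frequency bin is calculated as:
Figure PCTCN2014094542-appb-000029
The maximum subband power ratio is calculated as:
r(λ) = a_r·r(λ−1) + (1−a_r)·r(λ);
where r(λ−1) is the previous result for r(λ), with its initial value set to the average power ratio within the subband of the set width; a_r is the third recursion coefficient, 0 < a_r < 1.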
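Steps 201 to 203 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the smoothing formulas themselves are images that did not survive extraction, so the frequency smoothing is assumed here to be a plain moving average over ±`half_width` bins, and the right-hand r(λ) of the recursion is assumed to be the current-frame average ratio inside the subband. The function names and the `eps` guard are hypothetical and do not come from the patent.

```python
def smooth_spectrum(prev, current, coeff, half_width):
    """Inter-frame recursive smoothing of a frequency-smoothed spectrum.

    Assumption: frequency smoothing is a moving average over
    +/- half_width bins (the patent's exact formula is not recoverable).
    """
    n = len(current)
    out = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n - 1, k + half_width)
        band_mean = sum(current[lo:hi + 1]) / (hi - lo + 1)
        out.append(coeff * prev[k] + (1.0 - coeff) * band_mean)
    return out


def max_subband_ratio(beam_psd, avg_psd, half_width, eps=1e-12):
    """Average per-bin power ratio in a subband centred on the peak-ratio bin."""
    ratios = [b / (a + eps) for b, a in zip(beam_psd, avg_psd)]
    peak = max(range(len(ratios)), key=ratios.__getitem__)
    lo, hi = max(0, peak - half_width), min(len(ratios) - 1, peak + half_width)
    return sum(ratios[lo:hi + 1]) / (hi - lo + 1)


def update_ratio(prev_r, band_ratio, a_r=0.8):
    """r(lambda) = a_r * r(lambda - 1) + (1 - a_r) * current subband ratio."""
    return a_r * prev_r + (1.0 - a_r) * band_ratio
```

Per frame, one would smooth both spectra with `smooth_spectrum`, feed them to `max_subband_ratio`, and pass the result through `update_ratio` to obtain the value compared against the detection threshold.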
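In some embodiments, the detection threshold adjustment state includes the voice-present state.
In some embodiments, the step of determining the current voice state from the maximum subband power ratio and the detection threshold according to preset determination conditions specifically includes:
if the current state is the voice-start state, the maximum subband power ratio is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than a set first frame-count threshold, determining a transition to the voice-present state;
and/or, if the current state is the voice-end state and the maximum subband power ratio is greater than the current detection threshold, determining a transition to the voice-present state.
In some embodiments, the step of determining the current voice state from the maximum subband power ratio and the detection threshold according to preset determination conditions further includes:
if the current state is the no-voice state and the maximum subband power ratio is greater than the current detection threshold, determining a transition to the voice-start state;
if the current state is the voice-start state and the maximum subband power ratio is less than or equal to the current detection threshold, determining a transition to the no-voice state;
if the current state is the voice-present state and the maximum subband power ratio is less than or equal to the current detection threshold, determining a transition to the voice-end state;
if the current state is the voice-end state, the maximum subband power ratio is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than a set second frame-count threshold, determining a transition to the no-voice state.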
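Specifically, referring to Fig. 4, let the current detection threshold be θ(λ). Two counters record the number of consecutive frames in the voice-start state and in the voice-end state: let c1 be the number of consecutive frames in the voice-start state and c2 the number of consecutive frames in the voice-end state. In one embodiment of the invention, the step of determining the current voice state from the maximum subband power ratio and the detection threshold then includes the following process:
if the current state is the no-voice state and r(λ) > θ(λ), a transition from the no-voice state to the voice-start state is determined;
if the current state is the voice-start state and r(λ) ≤ θ(λ), a transition from the voice-start state to the no-voice state is determined;
a first frame-count threshold L1 for consecutive frames in the voice-start state is preset: if the current state is the voice-start state, r(λ) > θ(λ) and c1 > L1, a transition from the voice-start state to the voice-present state is determined, where L1 is an empirical value and a positive integer;
if the current state is the voice-present state and r(λ) ≤ θ(λ), a transition from the voice-present state to the voice-end state is determined;
if the current state is the voice-end state and r(λ) > θ(λ), a transition from the voice-end state to the voice-present state is determined;
a second frame-count threshold L2 for consecutive frames in the voice-end state is preset: if the current state is the voice-end state, r(λ) ≤ θ(λ) and c2 > L2, a transition from the voice-end state to the no-voice state is determined, where L2 is an empirical value and a positive integer.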
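The six transitions of Fig. 4 map naturally onto a small state machine. The sketch below is a minimal illustration, not the patented implementation: the text does not say exactly when the counters c1 and c2 are reset or incremented, so resetting on entry and incrementing on each further frame spent in the state are assumptions, and the class and constant names are hypothetical.

```python
NO_VOICE, VOICE_START, VOICE, VOICE_END = "no_voice", "voice_start", "voice", "voice_end"


class VoiceStateMachine:
    """Four-state detector driven by the maximum subband power ratio r."""

    def __init__(self, l1=10, l2=150):
        self.state = NO_VOICE
        self.l1 = l1  # first frame-count threshold L1
        self.l2 = l2  # second frame-count threshold L2
        self.c1 = 0   # consecutive frames spent in VOICE_START (assumed counting rule)
        self.c2 = 0   # consecutive frames spent in VOICE_END (assumed counting rule)

    def step(self, r, theta):
        """Consume one frame and return the state after the transition."""
        above = r > theta
        if self.state == NO_VOICE:
            if above:
                self.state, self.c1 = VOICE_START, 0
        elif self.state == VOICE_START:
            if not above:
                self.state = NO_VOICE
            else:
                self.c1 += 1
                if self.c1 > self.l1:
                    self.state = VOICE
        elif self.state == VOICE:
            if not above:
                self.state, self.c2 = VOICE_END, 0
        else:  # VOICE_END
            if above:
                self.state = VOICE
            else:
                self.c2 += 1
                if self.c2 > self.l2:
                    self.state = NO_VOICE
        return self.state
```

The two intermediate states act as hysteresis: a short burst above the threshold falls back to `NO_VOICE` before L1 frames accumulate, and a short dip during speech returns to `VOICE` before L2 frames accumulate.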
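In practical use of voice detection in the related art, not only is the detection threshold hard to determine, but when the noise or interference comes from other voice signals the detection system may fail completely.
To cope with complex and changing environmental noise, a primary/secondary microphone pair or a microphone array can be chosen as the sound pickup device. A primary/secondary pair uses two microphones with different directivity so that a signal from the target direction produces a power difference between the two microphones; the power ratio of the two microphones is then used for target voice detection. The key lies in the design of the primary/secondary microphones and the direction of the target voice. A microphone array uses the spatial topology of its elements to form a beam with a specific directivity, so that signals inside and outside the beam differ in power, and this cue is used to detect signals from the target direction. However, these pickup techniques in the related art still have problems: a microphone array beam is inevitably affected by sidelobes, and its directivity at low frequencies is poor. Voice detection during pickup in the related art therefore still has many problems to solve in practice.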
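In some embodiments of the invention, before the step of calculating the maximum subband power ratio of the array voice input signal from the fixed beam output power spectrum and the average power spectrum, the method further includes the process shown in Fig. 3:
Step 301: receive the array voice input signal input through voice acquisition devices;
Step 302: window and truncate the array voice input signal and apply a short-time Fourier transform to obtain the time-frequency representation of the array voice input signal;
Step 303: calculate the frequency-domain fixed beam output from the time-frequency representation;
Step 304: calculate, from the frequency-domain fixed beam output, the average power spectrum of the current array voice frame and the beam output signal power spectrum of the current frame;
Step 305: calculate the fixed beam output power spectrum of the array voice input signal from the beam output signal power spectrum of the current array voice frame, and calculate the average power spectrum of the array voice input signal from the average power spectrum of the current array voice frame.
Specifically, in one embodiment, a Hanning window with an overlap of 3/4 of the window length is used when windowing and truncating the original array voice signal; the time window length is L_wnd and the overlap between adjacent windows is L_ovlp. A short-time Fourier transform is applied to the original array voice signal to obtain its time-frequency representation y_1(k,λ) … y_N(k,λ), where k is the frequency bin index and λ is the short-time frame index; k and λ are positive integers.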
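The windowing and transform of step 302 can be illustrated for a single channel with standard-library Python. This is a sketch under assumptions: a periodic Hann window is used, a naive DFT stands in for an FFT to keep the example dependency-free, and all function names are hypothetical. A real system would run the same framing on each of the N microphone channels.

```python
import cmath
import math


def hann(n):
    """Periodic Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def windowed_frames(signal, win_len, overlap):
    """Cut the signal into Hann-windowed frames; hop = win_len - overlap."""
    hop = win_len - overlap
    w = hann(win_len)
    return [[signal[start + i] * w[i] for i in range(win_len)]
            for start in range(0, len(signal) - win_len + 1, hop)]


def dft(frame):
    """Naive DFT of a real frame, bins 0 .. n/2 (O(n^2), illustration only)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def stft(signal, win_len, overlap):
    """Time-frequency representation y(k, lambda) as a list of frames of bins."""
    return [dft(frame) for frame in windowed_frames(signal, win_len, overlap)]
```

With a 3/4 overlap, `overlap` is `3 * win_len // 4`, so each new frame advances by a quarter of the window length.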
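More specifically, the frequency-domain fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters a_i(k); that is, the frequency-domain fixed beam output is:
Figure PCTCN2014094542-appb-000030
N is a positive integer.
The signal flow diagram for calculating the frequency-domain fixed beam output is shown in Fig. 6.
Calculating the frequency-domain fixed beam output enhances the directivity of the beam and reduces the influence of noise or other voice interference on system detection. In the above formula, taking the minimum of y_1(k,λ) and the result of multiplying the time-frequency representation by the corresponding preset beam parameters effectively avoids abnormal amplification at low frequencies caused by insufficient beam robustness.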
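One possible reading of this step is sketched below. Because the formula itself is an image that was lost, several things are assumptions: the beam is taken to be the conjugate-weighted sum of the channels (suggested by the constraint A^H(k)d(k) = 1), and the "minimum" is applied to power rather than to the complex value. The function name is hypothetical.

```python
def fixed_beam_power(y, a):
    """Power of the fixed beam output at one (k, lambda), limited by microphone 1.

    y: complex STFT values y_1 .. y_N of the N channels at this bin and frame.
    a: complex beam weights a_1 .. a_N for this bin.
    Assumption: beam = sum(conj(a_i) * y_i); the result is capped by |y_1|^2
    so that a non-robust low-frequency beam cannot amplify the signal.
    """
    beam = sum(complex(w).conjugate() * complex(v) for w, v in zip(a, y))
    return min(abs(beam) ** 2, abs(complex(y[0])) ** 2)
```

The cap only changes the result when the beamformer output is stronger than the first microphone, which is the abnormal-amplification case the paragraph above describes.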
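The quality of the beam parameter design may directly affect the power ratio between signals inside and outside the beam. In a specific embodiment of the invention, an optimal frequency-domain beam parameter design method is used: the optimal frequency-domain superdirective beam parameters are designed subject to the array white noise gain being less than 15 dB. If A(k) denotes the matrix whose elements are a_i(k), i = 1…N, the optimal superdirective beam parameters are:
Figure PCTCN2014094542-appb-000031
subject to the constraint A^H(k)d(k) = 1, and
Figure PCTCN2014094542-appb-000032
Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
Figure PCTCN2014094542-appb-000033
In the above formula,
Figure PCTCN2014094542-appb-000034
is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform.
WNG_min(k) is the white noise gain.
d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
Figure PCTCN2014094542-appb-000035
In the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.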
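The two quantities that feed the beam design can be sketched as follows. The patent's own expressions are images that were not extracted, so this example substitutes the textbook forms: the diffuse-field coherence sin(x)/x with x = 2π·f_k·d/c, and a far-field steering vector for a linear array. Treating d_1…d_N as signed offsets along the array axis, the sign of the phase, and the bin frequency f_k = k·f_s/K are all assumptions; the function names are hypothetical.

```python
import cmath
import math


def bin_frequency(k, fs, K):
    """Centre frequency in Hz of STFT bin k for sampling rate fs and length K."""
    return k * fs / K


def diffuse_coherence(positions, k, fs, K, c=340.0):
    """N x N coherence matrix of an ideal diffuse field (textbook sinc model)."""
    f = bin_frequency(k, fs, K)

    def sinc(x):
        return 1.0 if x == 0.0 else math.sin(x) / x

    return [[sinc(2.0 * math.pi * f * abs(p1 - p2) / c) for p2 in positions]
            for p1 in positions]


def steering_vector(offsets, theta, k, fs, K, c=340.0):
    """Far-field steering vector for a linear array (assumed geometry).

    offsets: signed microphone offsets from the array centre in metres.
    theta: source azimuth in radians, measured from the array axis.
    """
    f = bin_frequency(k, fs, K)
    return [cmath.exp(-2j * math.pi * f * d * math.cos(theta) / c) for d in offsets]
```

The design problem itself (minimizing A^H Γ A under the distortionless and white-noise-gain constraints) is a convex program; this sketch only prepares its inputs.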
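The optimal superdirective beam parameters can be designed with third-party open-source convex optimization software such as CVX and SeDuMi.
More specifically, the beam output signal power spectrum of the current frame is calculated as:
Figure PCTCN2014094542-appb-000036
More specifically, the average power spectrum of the current frame is calculated as:
Figure PCTCN2014094542-appb-000037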
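In some embodiments, when the voice state being entered is determined to be a preset detection threshold adjustment state, the detection threshold is adjusted according to the following formula:
Figure PCTCN2014094542-appb-000038
where θ′(λ) is the adjusted detection threshold; θ_L and θ_H are the preset lower and upper limits of the voice detection threshold;
Figure PCTCN2014094542-appb-000039
is the value obtained by slow recursive smoothing of the maximum subband power ratio in the voice-present state, 0 < θ_L < 1, 0 < θ_H < 1.
Specifically, when the voice-present state is determined, the maximum subband power ratio is first slowly recursively smoothed using the following formula:
Figure PCTCN2014094542-appb-000040
where a_0 is the recursive smoothing coefficient;
Figure PCTCN2014094542-appb-000041
is the value of the maximum subband power ratio after slow recursive smoothing, with 0 < a_0 < 1.
After the slowly smoothed maximum subband power ratio has been calculated, the detection threshold is adjusted by a min-max method according to the following formula:
Figure PCTCN2014094542-appb-000042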
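The two-stage adjustment can be illustrated with a short sketch. The exact formulas are images that were not extracted, so this is an assumed reading: a first-order recursion with coefficient a_0 for the slow smoothing, followed by clamping the smoothed value between θ_L and θ_H as the "min-max" step. Any scaling between the smoothed ratio and the threshold that the original formula may contain is not modelled, and the function names are hypothetical.

```python
def slow_smooth(prev_slow, r, a0=0.99):
    """Slow recursive smoothing of the maximum subband power ratio."""
    return a0 * prev_slow + (1.0 - a0) * r


def adjust_threshold(r_slow, theta_low=0.25, theta_high=0.3):
    """Clamp the smoothed ratio to [theta_low, theta_high] (assumed min-max rule)."""
    return min(max(r_slow, theta_low), theta_high)
```

Under this reading, `adjust_threshold` would be called only for frames in which the state machine enters the voice-present state, so the threshold tracks speech-level ratios but can never leave the preset band.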
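In a specific embodiment, when the sampling rate of the voice detection apparatus is 16 kHz, the parameters mentioned in the above embodiments may take the following reference values:
N = 6; L_wnd = 32 ms; L_ovlp = 24 ms; c = 340 m/s; f_s = 16000 Hz; WNG_min(k) = 15 dB; a_0 = 0.99; a_x = 0.8; a_y = 0.8; a_r = 0.8; L_1 = 10; L_2 = 150; θ_L = 0.25; θ_H = 0.3.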
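As a quick consistency check, the reference values can be gathered into a small configuration object and converted into sample counts. The class and field names below are hypothetical; only the numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorParams:
    """Reference parameter values quoted for a 16 kHz sampling rate."""
    n_mics: int = 6
    fs: int = 16000          # sampling frequency in Hz
    win_ms: int = 32         # L_wnd
    overlap_ms: int = 24     # L_ovlp
    a0: float = 0.99
    ax: float = 0.8
    ay: float = 0.8
    ar: float = 0.8
    l1: int = 10
    l2: int = 150
    theta_low: float = 0.25
    theta_high: float = 0.3

    @property
    def win_samples(self):
        return self.fs * self.win_ms // 1000

    @property
    def overlap_samples(self):
        return self.fs * self.overlap_ms // 1000

    @property
    def hop_samples(self):
        return self.win_samples - self.overlap_samples
```

At 16 kHz a 32 ms window is 512 samples and a 24 ms overlap is 384 samples, so the hop is 128 samples (8 ms), which matches the 3/4 overlap stated for the Hanning window. With an 8 ms hop, L_1 = 10 frames corresponds to about 80 ms and L_2 = 150 frames to about 1.2 s.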
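An embodiment of the invention further provides a microphone array voice detection apparatus which, as shown in Fig. 5, includes:
a first calculation module, configured to calculate the maximum subband power ratio of an array voice input signal from the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
a state determination module, configured to determine the current voice state from the maximum subband power ratio and the current detection threshold according to preset determination conditions;
a threshold adjustment module, configured to adjust the detection threshold when the voice state being entered is determined to be a preset detection threshold adjustment state.
Still referring to Fig. 5, in some embodiments the first calculation module specifically includes:
a first calculation unit, configured to estimate the fixed beam output power spectrum of the array voice input signal by inter-frame recursive smoothing and frequency-domain smoothing, and to estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
a second calculation unit, configured to calculate the power ratio at each frequency bin as the ratio of the fixed beam output power spectrum to the average power spectrum;
a third calculation unit, configured to take the frequency bin with the largest power ratio as the center and, within a subband of a set width, estimate the maximum subband power ratio by inter-frame recursive smoothing of the average power ratio within that subband.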
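In some embodiments, the fixed beam output power spectrum is calculated as:
Figure PCTCN2014094542-appb-000043
where k is the frequency bin index; λ is the short-time frame index;
Figure PCTCN2014094542-appb-000044
is the beam output signal power spectrum of the current frame at short-time frame index λ; a_x is the first recursion coefficient; l_1 is a preset number of frequency bins; 0 < a_x < 1, and k, λ, b and l_1 are each positive integers.
The average power spectrum of the array voice input signal is calculated as:
Figure PCTCN2014094542-appb-000045
Figure PCTCN2014094542-appb-000046
is the average power spectrum of the current frame at short-time frame index λ; a_y is the second recursion coefficient, 0 < a_y < 1;
The power ratio at each frequency bin is calculated as:
Figure PCTCN2014094542-appb-000047
The maximum subband power ratio is calculated as:
r(λ) = a_r·r(λ−1) + (1−a_r)·r(λ);
where r(λ−1) is the previous result for r(λ), with its initial value set to the average power ratio within the subband of the set width; a_r is the third recursion coefficient, 0 < a_r < 1.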
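In some embodiments, the detection threshold adjustment state includes the voice-present state.
In some embodiments, the state determination module specifically includes:
a first determination unit, configured to determine a transition to the voice-present state when the current state is the voice-start state, the maximum subband power ratio is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than a set first frame-count threshold;
and/or a second determination unit, configured to determine a transition to the voice-present state when the current state is the voice-end state and the maximum subband power ratio is greater than the current detection threshold.
In some embodiments, the state determination module further includes:
a third determination unit, configured to determine a transition to the voice-start state when the current state is the no-voice state and the maximum subband power ratio is greater than the current detection threshold;
a fourth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-start state and the maximum subband power ratio is less than or equal to the current detection threshold;
a fifth determination unit, configured to determine a transition to the voice-end state when the current state is the voice-present state and the maximum subband power ratio is less than or equal to the current detection threshold;
a sixth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-end state, the maximum subband power ratio is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than a set second frame-count threshold.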
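Still referring to Fig. 5, in some embodiments the apparatus further includes:
a signal receiving module, configured to receive the array voice input signal input through voice acquisition devices;
a signal transform module, configured to window and truncate the array voice input signal and apply a short-time Fourier transform to obtain the time-frequency representation of the array voice input signal;
a second calculation module, configured to calculate the frequency-domain fixed beam output from the time-frequency representation;
a third calculation module, configured to calculate, from the frequency-domain fixed beam output, the average power spectrum of the current array frame and the beam output signal power spectrum of the current frame;
a fourth calculation module, configured to calculate the fixed beam output power spectrum of the array voice input signal from the beam output signal power spectrum of the current frame, and to calculate the average power spectrum of the array voice input signal from the average power spectrum of the current array frame.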
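In some embodiments, the fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters. If A(k) denotes the matrix whose elements are a_i(k), i = 1…N, the preset beam parameters are determined by the following formula:
Figure PCTCN2014094542-appb-000048
subject to the constraint A^H(k)d(k) = 1, and
Figure PCTCN2014094542-appb-000049
Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
Figure PCTCN2014094542-appb-000050
In the above formula for Γ(k),
Figure PCTCN2014094542-appb-000051
is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform;
WNG_min(k) is the white noise gain;
d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
Figure PCTCN2014094542-appb-000052
In the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.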
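As can be seen from the above, the microphone array voice detection method and apparatus provided by the embodiments of the invention adjust the detection threshold when the voice-present state is determined according to preset conditions, which helps determine the detection threshold even in a changing noise environment. In addition, during voice detection the embodiments process the voice signal with preset beam parameters, which enhances the directivity of the voice signal and reduces the influence of noise or other voice signals on voice detection devices and systems.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be implemented as a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, equipment, apparatus or device); when executed, it performs one of the steps of the method embodiments or a combination of them.
Optionally, all or some of the steps of the above embodiments can also be implemented with integrated circuits: the steps can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is thus not limited to any specific combination of hardware and software.
Each apparatus, functional module or functional unit in the above embodiments can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When an apparatus, functional module or functional unit in the above embodiments is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.
Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be that defined by the claims.
Industrial Applicability
The microphone array voice detection method and apparatus provided by the embodiments of the invention adjust the detection threshold when the voice-present state is determined according to preset conditions, which helps determine the detection threshold even in a changing noise environment. In addition, during voice detection the embodiments process the voice signal with preset beam parameters, which enhances the directivity of the voice signal and reduces the influence of noise or other voice signals on voice detection devices and systems.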

Claims (18)

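  1. A microphone array voice detection method, including:
    calculating the maximum subband power ratio of an array voice input signal from the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
    determining the current voice state from the maximum subband power ratio and the current detection threshold according to preset determination conditions;
    adjusting the detection threshold when the voice state being entered is determined to be a preset detection threshold adjustment state.
  2. The method according to claim 1, wherein the step of calculating the maximum subband power ratio of the array voice input signal and the detection threshold includes:
    estimating the fixed beam output power spectrum of the array voice input signal by inter-frame recursive smoothing and frequency-domain smoothing, and estimating the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
    calculating the power ratio at each frequency bin as the ratio of the fixed beam output power spectrum to the average power spectrum;
    taking the frequency bin with the largest power ratio as the center and, within a subband of a set width, estimating the maximum subband power ratio by inter-frame recursive smoothing of the average power ratio within that subband.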
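  3. The method according to claim 2, wherein the fixed beam output power spectrum is calculated as:
    Figure PCTCN2014094542-appb-100001
    where k is the frequency bin index; λ is the short-time frame index;
    Figure PCTCN2014094542-appb-100002
    is the beam output signal power spectrum of the current array voice frame at frequency bin b and short-time frame index λ; a_x is the first recursion coefficient; l_1 is a preset number of frequency bins; 0 < a_x < 1, and k, λ, b and l_1 are each positive integers;
    the average power spectrum of the array voice input signal is calculated as:
    Figure PCTCN2014094542-appb-100003
    Figure PCTCN2014094542-appb-100004
    is the average power spectrum of the current array voice frame at frequency bin b and short-time frame index λ; a_y is the second recursion coefficient, 0 < a_y < 1;
    the power ratio at each frequency bin is calculated as:
    Figure PCTCN2014094542-appb-100005
    the maximum subband power ratio is calculated as:
    r(λ) = a_r·r(λ−1) + (1−a_r)·r(λ);
    where r(λ−1) is the previous result for r(λ), with its initial value set to the average power ratio within the subband of the set width; a_r is the third recursion coefficient, 0 < a_r < 1.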
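  4. The method according to claim 1, wherein the detection threshold adjustment state includes the voice-present state.
  5. The method according to claim 4, wherein the step of determining the current voice state from the maximum subband power ratio and the detection threshold according to preset determination conditions includes:
    if the current state is the voice-start state, the maximum subband power ratio is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than a set first frame-count threshold, determining a transition to the voice-present state;
    if the current state is the voice-end state and the maximum subband power ratio is greater than the current detection threshold, determining a transition to the voice-present state.
  6. The method according to claim 4, wherein the step of determining the current voice state from the maximum subband power ratio and the detection threshold according to preset determination conditions further includes:
    if the current state is the no-voice state and the maximum subband power ratio is greater than the current detection threshold, determining a transition to the voice-start state;
    if the current state is the voice-start state and the maximum subband power ratio is less than or equal to the current detection threshold, determining a transition to the no-voice state;
    if the current state is the voice-present state and the maximum subband power ratio is less than or equal to the current detection threshold, determining a transition to the voice-end state;
    if the current state is the voice-end state, the maximum subband power ratio is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than a set second frame-count threshold, determining a transition to the no-voice state.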
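  7. The method according to claim 1, wherein, before the step of calculating the maximum subband power ratio of the array voice input signal from the fixed beam output power spectrum and the average power spectrum, the method further includes:
    receiving the array voice input signal input through voice acquisition devices;
    windowing and truncating the array voice input signal and applying a short-time Fourier transform to obtain the time-frequency representation of the array voice input signal;
    calculating the frequency-domain fixed beam output from the time-frequency representation;
    calculating, from the frequency-domain fixed beam output, the average power spectrum of the current array voice frame and the beam output signal power spectrum of the current array voice frame;
    calculating the fixed beam output power spectrum of the array voice input signal from the beam output signal power spectrum of the current array voice frame, and calculating the average power spectrum of the array voice input signal from the average power spectrum of the current array voice frame.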
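  8. The method according to claim 7, wherein the fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters;
    the preset beam parameters are determined by the following formula:
    Figure PCTCN2014094542-appb-100006
    subject to the constraint A^H(k)d(k) = 1, and
    Figure PCTCN2014094542-appb-100007
    Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
    Figure PCTCN2014094542-appb-100008
    in the above formula for Γ(k),
    Figure PCTCN2014094542-appb-100009
    is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform;
    WNG_min(k) is the white noise gain;
    d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
    Figure PCTCN2014094542-appb-100010
    in the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.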
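  9. The method according to claim 1, wherein, when the voice state being entered is determined to be a preset detection threshold adjustment state, the detection threshold is adjusted according to the following formula:
    Figure PCTCN2014094542-appb-100011
    where θ′(λ) is the adjusted detection threshold; θ_L and θ_H are the preset lower and upper limits of the voice detection threshold;
    Figure PCTCN2014094542-appb-100012
    is the value obtained by slow recursive smoothing of the maximum subband power ratio in the voice-present state, 0 < θ_L < 1, 0 < θ_H < 1.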
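  10. A microphone array voice detection apparatus, including:
    a first calculation module, configured to calculate the maximum subband power ratio of an array voice input signal from the fixed beam output power spectrum and the average power spectrum of the array voice input signal;
    a state determination module, configured to determine the current voice state from the maximum subband power ratio and the current detection threshold according to preset determination conditions;
    a threshold adjustment module, configured to adjust the detection threshold when the voice state being entered is determined to be a preset detection threshold adjustment state.
  11. The apparatus according to claim 10, wherein the first calculation module includes:
    a first calculation unit, configured to estimate the fixed beam output power spectrum of the array voice input signal by inter-frame recursive smoothing and frequency-domain smoothing, and to estimate the average power spectrum of the array voice input signal by inter-frame smoothing and frequency-domain smoothing;
    a second calculation unit, configured to calculate the power ratio at each frequency bin as the ratio of the fixed beam output power spectrum to the average power spectrum;
    a third calculation unit, configured to take the frequency bin with the largest power ratio as the center and, within a subband of a set width, estimate the maximum subband power ratio by inter-frame recursive smoothing of the average power ratio within that subband.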
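  12. The apparatus according to claim 11, wherein the fixed beam output power spectrum is calculated as:
    Figure PCTCN2014094542-appb-100013
    where k is the frequency bin index; λ is the short-time frame index;
    Figure PCTCN2014094542-appb-100014
    is the beam output signal power spectrum of the current frame at frequency bin b and short-time frame index λ; a_x is the first recursion coefficient; l_1 is a preset number of frequency bins; 0 < a_x < 1, and k, λ, b and l_1 are each positive integers;
    the average power spectrum of the array voice input signal is calculated as:
    Figure PCTCN2014094542-appb-100015
    where the current-frame term (its symbol was lost in extraction) is the average power spectrum of the current frame; a_y is the second recursion coefficient, 0 < a_y < 1;
    the power ratio at each frequency bin is calculated as:
    Figure PCTCN2014094542-appb-100016
    the maximum subband power ratio is calculated as:
    r(λ) = a_r·r(λ−1) + (1−a_r)·r(λ);
    where r(λ−1) is the previous result for r(λ), with its initial value set to the average power ratio within the subband of the set width; a_r is the third recursion coefficient, 0 < a_r < 1.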
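  13. The apparatus according to claim 10, wherein the detection threshold adjustment state includes the voice-present state.
  14. The apparatus according to claim 13, wherein the state determination module includes:
    a first determination unit, configured to determine a transition to the voice-present state when the current state is the voice-start state, the maximum subband power ratio is greater than the current detection threshold, and the number of consecutive frames in the voice-start state is greater than a set first frame-count threshold;
    a second determination unit, configured to determine a transition to the voice-present state when the current state is the voice-end state and the maximum subband power ratio is greater than the current detection threshold.
  15. The apparatus according to claim 14, wherein the state determination module further includes:
    a third determination unit, configured to determine a transition to the voice-start state when the current state is the no-voice state and the maximum subband power ratio is greater than the current detection threshold;
    a fourth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-start state and the maximum subband power ratio is less than or equal to the current detection threshold;
    a fifth determination unit, configured to determine a transition to the voice-end state when the current state is the voice-present state and the maximum subband power ratio is less than or equal to the current detection threshold;
    a sixth determination unit, configured to determine a transition to the no-voice state when the current state is the voice-end state, the maximum subband power ratio is less than or equal to the current detection threshold, and the number of consecutive frames in the voice-end state is greater than a set second frame-count threshold.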
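  16. The apparatus according to claim 10, wherein the apparatus further includes:
    a signal receiving module, configured to receive the array voice input signal input through voice acquisition devices;
    a signal transform module, configured to window and truncate the array voice input signal and apply a short-time Fourier transform to obtain the time-frequency representation of the array voice input signal;
    a second calculation module, configured to calculate the frequency-domain fixed beam output from the time-frequency representation;
    a third calculation module, configured to calculate, from the frequency-domain fixed beam output, the average power spectrum of the current array frame and the beam output signal power spectrum of the current frame;
    a fourth calculation module, configured to calculate the fixed beam output power spectrum of the array voice input signal from the beam output signal power spectrum of the current frame, and to calculate the average power spectrum of the array voice input signal from the average power spectrum of the current array frame.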
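  17. The apparatus according to claim 16, wherein the fixed beam output is obtained by multiplying the time-frequency representation of the original array voice signal by the corresponding preset beam parameters;
    the preset beam parameters are determined by the following formula:
    Figure PCTCN2014094542-appb-100017
    subject to the constraint A^H(k)d(k) = 1, and
    Figure PCTCN2014094542-appb-100018
    Γ(k) is the normalized coherence matrix of an ideal diffuse field for the target voice signal; it is an N×N matrix whose element in row n1, column n2 is:
    Figure PCTCN2014094542-appb-100019
    in the above formula for Γ(k),
    Figure PCTCN2014094542-appb-100020
    is the distance between the n1-th and n2-th microphones, c is the speed of sound, and K is the length of the short-time Fourier transform;
    WNG_min(k) is the white noise gain;
    d(k) is the spatial steering vector from the target sound source to the voice acquisition devices, calculated as:
    Figure PCTCN2014094542-appb-100021
    in the above formula, θ is the azimuth of the target sound source relative to the voice acquisition devices; d_1…d_N are the distances from the 1st to N-th digital voice acquisition devices to the center of the digital voice acquisition device array; f_s is the sampling frequency, and N is a positive integer.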
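  18. The apparatus according to claim 10, wherein the threshold adjustment module adjusts the detection threshold according to the following formula:
    Figure PCTCN2014094542-appb-100022
    where θ′(λ) is the adjusted detection threshold; θ_L and θ_H are the preset lower and upper limits of the voice detection threshold;
    Figure PCTCN2014094542-appb-100023
    is the value obtained by slow recursive smoothing of the maximum subband power ratio in the voice-present state, 0 < θ_L < 1, 0 < θ_H < 1.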
PCT/CN2014/094542 2014-06-27 2014-12-22 一种麦克风阵列语音检测方法及装置 WO2015196760A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410305486.XA CN105321528B (zh) 2014-06-27 2014-06-27 一种麦克风阵列语音检测方法及装置
CN201410305486.X 2014-06-27

Publications (1)

Publication Number Publication Date
WO2015196760A1 true WO2015196760A1 (zh) 2015-12-30

Family

ID=54936666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094542 WO2015196760A1 (zh) 2014-06-27 2014-12-22 一种麦克风阵列语音检测方法及装置

Country Status (2)

Country Link
CN (1) CN105321528B (zh)
WO (1) WO2015196760A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2554943A (en) * 2016-10-16 2018-04-18 Sentimoto Ltd Voice activity detection method and apparatus
CN112629639A (zh) * 2020-12-02 2021-04-09 西北工业大学 一种吊放声纳十二臂扩展式超指向性圆环阵
CN113488076A (zh) * 2021-06-30 2021-10-08 北京小米移动软件有限公司 音频信号处理方法及装置
CN113891228A (zh) * 2021-09-24 2022-01-04 珠海格力电器股份有限公司 麦克风故障检测方法及装置、控制设备、空调、存储介质
CN115061086A (zh) * 2022-05-12 2022-09-16 上海事凡物联网科技有限公司 一种基于微孔径麦克风阵列的运动目标检测方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10847173B2 (en) 2018-02-13 2020-11-24 Intel Corporation Selection between signal sources based upon calculated signal to noise ratio
WO2019232801A1 (en) * 2018-06-08 2019-12-12 Nokia Shanghai Bell Co., Ltd. Noise floor estimation for signal detection
CN109068012B (zh) * 2018-07-06 2021-04-27 南京时保联信息科技有限公司 一种用于音频会议系统的双端通话检测方法
CN110830643B (zh) * 2018-08-14 2021-11-16 西安中兴新软件有限责任公司 一种语音信号处理方法及装置、存储介质
TWI700004B (zh) * 2018-11-05 2020-07-21 塞席爾商元鼎音訊股份有限公司 減少干擾音影響之方法及聲音播放裝置
CN110049423A (zh) * 2019-04-22 2019-07-23 福州瑞芯微电子股份有限公司 一种利用广义互相关和能量谱检测麦克风的方法和系统
CN112133299B (zh) * 2019-06-25 2021-08-27 大众问问(北京)信息科技有限公司 一种声音信号的处理方法、装置及设备
CN111064856A (zh) * 2019-12-25 2020-04-24 欣诚信息技术有限公司 基于移动互联网的远程智能取证系统及方法
CN112562735B (zh) * 2020-11-27 2023-03-24 锐迪科微电子(上海)有限公司 语音检测方法、装置、设备和存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0127718A1 (fr) * 1983-06-07 1984-12-12 International Business Machines Corporation Procédé de détection d'activité dans un système de transmission de la voix
JPH11133997A (ja) * 1997-11-04 1999-05-21 Matsushita Electric Ind Co Ltd 有音無音判定装置
JP2008170789A (ja) * 2007-01-12 2008-07-24 Raytron:Kk 音声区間検出装置および音声区間検出方法
CN101790752A (zh) * 2007-09-28 2010-07-28 高通股份有限公司 多麦克风声音活动检测器
CN101968957A (zh) * 2010-10-28 2011-02-09 哈尔滨工程大学 一种噪声条件下的语音检测方法
CN102804261A (zh) * 2009-10-19 2012-11-28 瑞典爱立信有限公司 用于语音编码器的方法和语音活动检测器
CN103824563A (zh) * 2014-02-21 2014-05-28 深圳市微纳集成电路与系统应用研究院 一种基于模块复用的助听器去噪装置和方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0386765B1 (en) * 1989-03-10 1994-08-24 Nippon Telegraph And Telephone Corporation Method of detecting acoustic signal
EP1581026B1 (en) * 2004-03-17 2015-11-11 Nuance Communications, Inc. Method for detecting and reducing noise from a microphone array
JP4867798B2 (ja) * 2007-06-05 2012-02-01 ヤマハ株式会社 音声検出装置、音声会議システムおよび遠隔会議システム
US8898058B2 (en) * 2010-10-25 2014-11-25 Qualcomm Incorporated Systems, methods, and apparatus for voice activity detection
CN102509552B (zh) * 2011-10-21 2013-09-11 浙江大学 一种基于联合抑制的麦克风阵列语音增强方法
CN103165137B (zh) * 2011-12-19 2015-05-06 中国科学院声学研究所 一种非平稳噪声环境下传声器阵列的语音增强方法
CN103268766B (zh) * 2013-05-17 2015-07-01 泰凌微电子(上海)有限公司 双麦克风语音增强方法及装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0127718A1 (fr) * 1983-06-07 1984-12-12 International Business Machines Corporation Procédé de détection d'activité dans un système de transmission de la voix
JPH11133997A (ja) * 1997-11-04 1999-05-21 Matsushita Electric Ind Co Ltd 有音無音判定装置
JP2008170789A (ja) * 2007-01-12 2008-07-24 Raytron:Kk 音声区間検出装置および音声区間検出方法
CN101790752A (zh) * 2007-09-28 2010-07-28 高通股份有限公司 多麦克风声音活动检测器
CN102804261A (zh) * 2009-10-19 2012-11-28 瑞典爱立信有限公司 用于语音编码器的方法和语音活动检测器
CN101968957A (zh) * 2010-10-28 2011-02-09 哈尔滨工程大学 一种噪声条件下的语音检测方法
CN103824563A (zh) * 2014-02-21 2014-05-28 深圳市微纳集成电路与系统应用研究院 一种基于模块复用的助听器去噪装置和方法

Also Published As

Publication number Publication date
CN105321528A (zh) 2016-02-10
CN105321528B (zh) 2019-11-05

Similar Documents

Publication Publication Date Title
WO2015196760A1 (zh) 一种麦克风阵列语音检测方法及装置
JP7011075B2 (ja) マイク・アレイに基づく対象音声取得方法及び装置
US11395061B2 (en) Signal processing apparatus and signal processing method
EP3172906B1 (en) Method and apparatus for wind noise detection
US10602267B2 (en) Sound signal processing apparatus and method for enhancing a sound signal
CN111418010B (zh) 一种多麦克风降噪方法、装置及终端设备
US10504539B2 (en) Voice activity detection systems and methods
WO2015196729A1 (zh) 一种麦克风阵列语音增强方法及装置
WO2020108614A1 (zh) 音频识别方法、定位目标音频的方法、装置和设备
US8898058B2 (en) Systems, methods, and apparatus for voice activity detection
US8620672B2 (en) Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal
US9197177B2 (en) Method and implementation apparatus for intelligently controlling volume of electronic device
TWI398855B (zh) 多重麥克風聲音活動偵測器
US9959886B2 (en) Spectral comb voice activity detection
CN103426440A (zh) 利用能量谱熵空间信息的语音端点检测装置及其检测方法
CN104464722A (zh) 基于时域和频域的语音活性检测方法和设备
US11749294B2 (en) Directional speech separation
EP3757993A1 (en) Pre-processing for automatic speech recognition
CN110169082B (zh) 用于组合音频信号输出的方法和装置、及计算机可读介质
US11610601B2 (en) Method and apparatus for determining speech presence probability and electronic device
Moghimi et al. An analysis of binaural spectro-temporal masking as nonlinear beamforming
Zhang et al. A robust speech enhancement method based on microphone array
Kako et al. Wiener filter design by estimating sensitivities between distributed asynchronous microphones and sound sources
KR101817421B1 (ko) 두 채널 구조에 기초하는 사전 음성 부재 확률의 추정 방법
Shanmugapriya et al. A thorough investigation on speech enhancement techniques for hearing aids

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14896238

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14896238

Country of ref document: EP

Kind code of ref document: A1