WO2020078210A1 - Adaptive estimation method and device for post-reverberation power spectrum in a reverberation speech signal - Google Patents
- Publication number
- WO2020078210A1 (application PCT/CN2019/109285)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sub
- reverberation
- band
- power spectrum
- frame
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Definitions
- The present disclosure relates to the field of speech signal processing, and in particular to an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal.
- In far-field pickup, the speech signal captured by an indoor microphone is inevitably interfered with by signals reflected from the walls, ceiling and other obstacles in the room, which produces linear distortion. This distortion is usually called reverberation. It degrades the fidelity and intelligibility of speech and reduces the performance of speech communication systems and automatic speech recognition systems, and the degradation grows as the distance increases.
- Reverberation usually consists of early reverberation (i.e., pre-reverberation, which contains direct sound components) and late reverberation (i.e., post-reverberation).
- The speech dereverberation techniques in the related art suffer from high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources.
- Embodiments of the present disclosure provide an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal, to solve the problem that the speech dereverberation techniques in the related art have high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot effectively guarantee dereverberation of the speech signal.
- an embodiment of the present disclosure provides an adaptive estimation method of the post-reverberation power spectrum in a reverberated speech signal, including:
- the post-reverberation sub-band self-power spectrum estimation is obtained.
- the obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone includes:
- the acquiring of the delayed linear prediction (DLP) coefficient vector used for estimating the post-reverberation sub-band self-power spectrum in the reverberation speech signal includes:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- the obtaining the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation of the reverberation speech signal and the DLP prediction coefficient vector includes:
- the obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone includes:
- an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
- the acquiring the sub-band spectrum of the mono output signal of the reverberation speech signal picked up by the microphone array after spatial filtering includes:
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the kth subband of the tth frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- the obtaining the estimation of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal includes:
- the acquiring of the delayed linear prediction (DLP) coefficient vector used for estimating the post-reverberation sub-band self-power spectrum in the reverberation speech signal includes:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the mono output signal after spatial filtering in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- the obtaining the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation and the DLP prediction coefficient includes:
- An embodiment of the present disclosure also provides an adaptive estimation device for a post-reverberation power spectrum in a reverberation speech signal, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
- the post-reverberation sub-band self-power spectrum estimation is obtained.
- the processor implements the following steps when executing the computer program:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- the processor implements the following steps when executing the computer program:
- an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the kth subband of the tth frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the mono output signal after spatial filtering in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the above-mentioned adaptive estimation method of the post-reverberation power spectrum in the reverberation speech signal is implemented.
- An embodiment of the present disclosure also provides an adaptive estimation device for the post-reverberation power spectrum in a reverberation speech signal, including:
- the first obtaining module is used for obtaining the estimation of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone;
- the second obtaining module is configured to obtain the delayed linear prediction (DLP) coefficient vector used for estimating the post-reverberation sub-band self-power spectrum in the reverberation speech signal;
- the third obtaining module is configured to obtain the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation of the reverberation speech signal and the DLP prediction coefficient vector.
- the first acquisition module is configured to:
- the second obtaining module is used to:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- the third obtaining module is used to:
- the first obtaining module includes:
- a first acquiring unit configured to acquire the subband spectrum of the mono output signal after the spatial filtering process of the reverberation speech signal picked up by the microphone array;
- the second obtaining unit is configured to obtain an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal.
- the first obtaining unit is configured to:
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the kth subband of the tth frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- the second obtaining unit is configured to:
- the second obtaining module is used to:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the mono output signal after spatial filtering in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- the third obtaining module is used to:
- By using the delayed linear prediction (DLP) coefficient vector to obtain the post-reverberation subband self-power spectrum estimate, the above scheme ensures effective dereverberation of the speech signal, reduces the difficulty of dereverberation, and improves dereverberation efficiency.
- Figure 1 shows the principle block diagram of applying DLP to adaptively estimate the subband self-power spectrum of the reverberation signal
- FIG. 2 shows an algorithm flowchart of a method for suppressing post-reverberation components in a reverberation speech signal based on a single microphone
- Fig. 3 shows the principle block diagram of the method for suppressing the post-reverberation component in the reverberation speech signal based on the microphone array
- FIG. 4 shows an algorithm flowchart of the method for suppressing the post-reverberation component in the reverberation speech signal based on the microphone array
- FIG. 5 is a schematic flowchart of an adaptive estimation method of a post-reverb power spectrum in a reverb speech signal according to an embodiment of the present disclosure
- FIG. 6 is a schematic block diagram of an apparatus for adaptively estimating a post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure
- FIG. 7 is a schematic structural diagram of an apparatus for adaptively estimating a post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure.
- The first type uses microphone array processing. This technique first estimates the direction of arrival (DOA) of the sound source relative to the microphone array, then uses the array's directivity to enhance the direct signal component arriving from the source direction and to attenuate the reflected components arriving from other directions, thereby achieving dereverberation. To obtain a satisfactory dereverberation effect, a large number of microphones is usually required so that the array achieves sufficient directional gain.
- The second type of dereverberation technique suppresses the post-reverberation signal in the frequency domain.
- This method first estimates the reverberation time parameter (RT60) of the working environment, estimates the power spectrum of the post-reverberation signal from it, and then applies the spectral subtraction used in noise suppression to the post-reverberation signal. The technique does not involve the phase of the signal and its processing is relatively robust, but because no high-precision real-time algorithm exists for estimating the frequency-dependent RT60 of the working environment, its dereverberation performance is limited.
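As a rough illustration of how an RT60 value can be turned into a post-reverberation power estimate, the sketch below uses a common exponential-decay model that the source does not spell out: the reverberant energy is assumed to fall by 60 dB over RT60 seconds, so the power observed `delay` seconds earlier is scaled by the corresponding decay factor. The function name, the `delay` parameter and the numbers are illustrative assumptions, not part of the disclosure.

```python
import math


def late_psd_from_rt60(p_delayed, rt60, delay):
    """Scale a delayed power value by the energy decay implied by RT60.

    p_delayed : power of the reverberant signal `delay` seconds ago
    rt60      : reverberation time in seconds (60 dB energy decay)
    delay     : assumed boundary between early and late reverberation, in seconds
    """
    decay_rate = 3.0 * math.log(10.0) / rt60  # amplitude decay constant
    return math.exp(-2.0 * decay_rate * delay) * p_delayed


# After exactly RT60 seconds the energy has dropped by 60 dB (a factor of 1e-6).
example_late = late_psd_from_rt60(1.0, rt60=0.5, delay=0.5)
```

The estimate is only as good as the RT60 value fed into it, which is the limitation this passage points out.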
- The third type of dereverberation technique is based on inverse filtering. Its goal is to estimate the inverse filter of the room impulse response (RIR) that causes the reverberation and to use it to filter the reverberation speech signal.
- The inverse filter of the room transfer function (RTF) can exactly recover the source signal from the observed reverberation signal. It has been proved that, provided the number of microphones is greater than the number of active sound sources and the RTFs from each sound source to the microphones have no common zeros, such an inverse filter exists. In practical applications, however, the RTF (or its equivalent inverse filter) is time-varying and unknown and has to be estimated from the observed data. Many researchers have studied this problem and have proposed many methods.
- DLP Delayed Linear Prediction
- This method can effectively suppress post-reverberation from relatively short observation data and also suppresses some pre-reverberation, but its inherent computational complexity prevents it from being applied in practice.
- NDLP linear prediction
- WPE Weighted Prediction Error
- The performance of the first type, based on microphone array processing, is limited by the number of microphones in the array. Satisfactory dereverberation inevitably requires a large number of microphones, which raises the cost of the actual product and makes its structural design more difficult.
- The second type, which suppresses the post-reverberation signal in the frequency domain, must first estimate the reverberation time parameter (RT60) of the working environment. Because there is currently no high-precision real-time algorithm for estimating the frequency-dependent RT60 of the working environment, the dereverberation performance of this technique is limited.
- In the third type, based on inverse filtering, the WPE method that is usable in practice involves a pseudo-inverse of the correlation matrix of high-order observation data, so it usually consumes considerable computing resources when implemented on a commercial DSP.
- This disclosure extends the idea of DLP to the sub-band power spectrum domain, and proposes a low-complexity, real-time online adaptive estimation method for post-reverberation self-power spectrum.
- A Decision-Directed (DD) recursive smoothing technique is then applied to the sub-band spectrum to calculate the a priori SNR, from which the sub-band gain function for post-reverberation suppression is calculated and used to modify the sub-band spectrum of the observed signal, thereby suppressing the reverberation component.
- DD Decision-Directed
- The present disclosure addresses the problem that the speech dereverberation techniques in the related art have high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot effectively guarantee dereverberation of speech signals.
- To this end, an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal is provided.
- A method for suppressing the post-reverberation component based on a single microphone is given first and is then extended to the microphone array scenario.
- Let the room impulse response from the sound source to the microphone be h (n), the sound source signal be s (n), and the reverberation speech signal picked up by the microphone be x (n); then x (n) can be expressed by the following formula:
- R is the length of the indoor impulse response
- D c is the critical point for distinguishing between pre-reverb and post-reverb
- s early (n) is the pre-reverb signal containing the direct sound source signal
- s late (n) is the post-reverberation signal
- s early (n) and s late (n) are respectively defined by the following formulas:
- X (t, k), S (t, k), H (t, k), S early (t, k) and S late (t, k) are the subband transforms of the digital signals x (n), s (n), h (n), s early (n) and s late (n), respectively
- N is the signal frame length of the subband transformation
- t is the time index of the signal frame
- k is the subband index
- n is the sample time index of the digital signal.
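The time-domain model described above can be sketched in a few lines. The example below assumes that x (n) is the linear convolution of s (n) with h (n) and that the taps of h (n) with index below D c form the early part and the remaining taps the late part; whether tap D c itself counts as early or late is an assumption here, as are the helper names and the toy values.

```python
def convolve(h, s):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, hi in enumerate(h):
        for j, sj in enumerate(s):
            out[i + j] += hi * sj
    return out


def split_reverb(h, s, d_c):
    """Return (s_early, s_late): the source convolved with the early and late taps of h."""
    h_early = [v if i < d_c else 0.0 for i, v in enumerate(h)]
    h_late = [v if i >= d_c else 0.0 for i, v in enumerate(h)]
    return convolve(h_early, s), convolve(h_late, s)


h = [1.0, 0.5, 0.25, 0.125]  # toy room impulse response, R = 4
s = [1.0, -1.0, 2.0]         # toy source signal
x = convolve(h, s)
s_early, s_late = split_reverb(h, s, d_c=2)
# By linearity, x(n) = s_early(n) + s_late(n) for every n.
```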
- the sub-band self-power spectrum corresponding to the sub-band spectral signal X (t, k) can be expressed as:
- P X (t, k) and P S (t, k) are the sub-band self-power spectra of the sub-band signals X (t, k) and S (t, k), and the two remaining terms are the sub-band self-power spectra of S early (t, k) and S late (t, k), respectively,
- E{·} is the statistical average operator.
- formula 5 can be expressed as:
- Equation 6 shows that in the sub-band power spectrum domain, the DLP technique can be used to predict the sub-band self-power spectrum of the post-reverberation signal. The residual of the prediction is the sub-band self-power spectrum of the useful pre-reverberation signal, which is uncorrelated with the post-reverberation signal, and therefore must be non-negative.
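The formula images did not survive extraction, so the following is only a hedged reading of what the surrounding text describes, not the patent's own equations. It assumes the observed power spectrum splits additively into early and late parts, that the late part is predicted from Q delayed frames, and it introduces the coefficient notation w_k(q) and the summation range purely for illustration.

```latex
P_X(t,k) = P_{S_{\mathrm{early}}}(t,k) + P_{S_{\mathrm{late}}}(t,k), \qquad
P_{S_{\mathrm{late}}}(t,k) \approx \sum_{q=0}^{Q-1} w_k(q)\, P_X(t - D_s - q,\, k),
\qquad
E_k(t) = P_X(t,k) - \sum_{q=0}^{Q-1} w_k(q)\, P_X(t - D_s - q,\, k) \;\ge\; 0 .
```

Under this reading the residual E_k(t) plays the role of the early (useful) power spectrum, which is why it is required to be non-negative.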
- The cost function and the penalty function are, respectively:
- E k (t) is expressed as:
- To solve for the optimal DLP prediction coefficient vector, the NLMS adaptive algorithm can be expressed by Equation 15:
- where E k (t) is the prediction error defined by Equation 9.
- the estimated subband self-power spectrum of the post-reverberation signal is:
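To make the prediction and update concrete, here is a minimal single-subband sketch. The patent's exact cost and penalty functions are not recoverable from this text, so the sketch swaps in a plain NLMS update with a larger step when the residual is negative (one simple way to penalize negative residuals), keeps the coefficients non-negative, and clips the prediction to the observed power. The names `dlp_nlms_step`, `mu`, `rho` and `eps` are hypothetical.

```python
def dlp_nlms_step(w, p_hist, p_now, mu=0.5, rho=1.0, eps=1e-12):
    """One delayed-linear-prediction step in a single subband.

    w      : current coefficient vector (length Q)
    p_hist : delayed power-spectrum vector, most recent delayed frame first
    p_now  : power-spectrum estimate of the current frame
    Returns (late_estimate, error, updated_w).
    """
    pred = sum(wi * pi for wi, pi in zip(w, p_hist))
    err = p_now - pred
    # Assumed penalty: react more strongly when the residual goes negative.
    step = mu * (1.0 + rho) if err < 0.0 else mu
    norm = sum(pi * pi for pi in p_hist) + eps
    # Assumed constraint: coefficients are projected onto non-negative values.
    w_new = [max(wi + step * err * pi / norm, 0.0) for wi, pi in zip(w, p_hist)]
    # The late-reverberation estimate cannot be negative or exceed the observation.
    late = min(max(pred, 0.0), p_now)
    return late, err, w_new


# Toy run: the current power is always half of the delayed power.
w = [0.0]
for _ in range(40):
    late, err, w = dlp_nlms_step(w, p_hist=[2.0], p_now=1.0)
```

In the toy run the single coefficient settles near 0.5, the ratio between the current and the delayed power. For stability of this sketch the effective step `mu * (1 + rho)` should stay below 2.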
- We use Equation 18 and Equation 19 to define the sub-band a priori signal-to-noise ratio ⁇ (t, k) and the a posteriori signal-to-noise ratio ⁇ (t, k) as follows:
- ⁇ is the preset smoothing coefficient.
- Equation 20 can be equivalent to:
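The Decision-Directed step can be sketched as follows, treating the post-reverberation power estimate as the "noise" term. The Wiener-type gain, the gain floor and the choice to normalize the previous clean-power estimate by the current late power are assumptions made for this sketch; the patent's own gain function is not recoverable from the text. The names `dd_gain`, `beta` and `gain_floor` are hypothetical.

```python
def dd_gain(p_obs, p_late, prev_clean_power, beta=0.98, gain_floor=0.1, eps=1e-12):
    """Decision-directed a priori SNR and a Wiener-type suppression gain.

    p_obs            : observed sub-band power of the current frame
    p_late           : estimated post-reverberation power of the current frame
    prev_clean_power : power of the enhanced output in the previous frame
    """
    gamma = p_obs / (p_late + eps)  # a posteriori SNR
    xi = (beta * prev_clean_power / (p_late + eps)
          + (1.0 - beta) * max(gamma - 1.0, 0.0))  # a priori SNR
    gain = max(xi / (1.0 + xi), gain_floor)  # assumed Wiener gain with a floor
    return gain, xi, gamma


gain, xi, gamma = dd_gain(p_obs=4.0, p_late=1.0, prev_clean_power=3.0)
# The enhanced sub-band spectrum would then be gain * X(t, k).
```

With these toy values the a posteriori SNR is about 4, the a priori SNR about 3, and the gain about 0.75.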
- a subband domain method for suppressing the post-reverberation component of the reverberation speech signal based on a single microphone is first proposed.
- the specific expression is:
- a constrained NLMS adaptive algorithm is proposed, which is used to learn and update the DLP filter coefficient vector and, on this basis, to obtain the subband self-power spectrum estimate of the post-reverberation signal;
- the DD technique is used to calculate the corresponding a priori signal-to-noise ratio estimate, from which the sub-band gain function for post-reverberation suppression is obtained; this sub-band gain function is used to modify the sub-band spectrum of the microphone observation signal to obtain the sub-band spectrum of the target signal.
- the sub-band signals of the M channels defined in Formula 25 are subjected to the following spatial averaging process to obtain the sub-band signal Y (t, k) of the spatially-filtered mono output, that is:
- Formula 25 and Formula 26 are in fact a subband-domain implementation of the "delay-add" beamformer in the related art. This spatial processor has been shown to suffer from signal distortion caused by spatial correlation. To address this, the following spatial processing is applied to the sub-band signals of the M channels defined in Formula 25, giving the sub-band signal Z (t, k) of the spatially-filtered mono output:
- Its directivity pattern is equivalent to that of the "delay-add" beamformer in the related art.
- Because Formula 27 uses the spatial average of the power spectrum of the microphone signals, rather than the spatial average of the (complex) spectrum used in Formula 26, it avoids the signal distortion that spatial correlation causes in the "delay-add" beamformer.
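The difference between averaging complex spectra and averaging power spectra can be seen in a small sketch. Formula 27 itself is not recoverable here, so the sketch assumes that the magnitude of Z (t, k) is the square root of the channel-averaged power and, as a further assumption, that its phase is borrowed from the "delay-add" output. The function names are hypothetical.

```python
import cmath
import math


def delay_and_sum(xs):
    """Average of the complex sub-band spectra of all channels."""
    return sum(xs) / len(xs)


def power_average_output(xs):
    """Magnitude from the channel-averaged power, phase from the complex sum (assumed)."""
    mag = math.sqrt(sum(abs(x) ** 2 for x in xs) / len(xs))
    phase = cmath.phase(sum(xs))
    return cmath.rect(mag, phase)


# Two channels with equal magnitude but a 90-degree phase difference.
xs = [1 + 0j, 0 + 1j]
y = delay_and_sum(xs)         # magnitude shrinks because the phases disagree
z = power_average_output(xs)  # magnitude keeps the average channel power
```

Here the complex average loses about 3 dB of magnitude, while the power-averaged output keeps unit magnitude.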
- the post-reverberation sub-band self-power spectrum estimate in the sub-band signal Z (t, k) is:
- where the coefficient vector of the DLP adaptive filter on subband k is adaptively updated by the following constrained NLMS algorithm:
- the sub-band gain function calculator module for post-reverberation suppression will give G (t, k) as follows:
- 0 ⁇ ⁇ 1 is the preset smoothing coefficient
- the a posteriori SNR is estimated as:
- the estimated target subband signal with Z (t, k) modified by G (t, k) is as follows:
- the above scheme is applied to the post-processing of the microphone array, and a sub-band domain method for suppressing the post-reverb component in the reverb speech signal based on the microphone array is proposed.
- This method first defines a new beamformer in the sub-band domain as a spatial pre-processor for the sub-band spectrum of the observation signal acquired by the microphone array, thereby reducing the deviation of the sub-band spectrum. The sub-band spectral signal output by the spatial pre-processor is then post-processed using the method proposed for the single-microphone case to obtain the final target speech signal, which completes the dereverberation task. The directivity pattern of this new sub-band-domain beamformer is equivalent to that of the "delay-add" beamformer in the related art and it reduces the deviation of the subband spectral signal, but it overcomes the signal distortion that the "delay-add" beamformer in the related art suffers from because of the spatial correlation between the different microphone channels, which ensures that the method …
- The algorithm flow chart of the method for suppressing the post-reverberation component in the reverberation speech signal based on the microphone array is shown in FIG. 4, and its specific implementation process is:
- an embodiment of the present disclosure provides an adaptive estimation method of post-reverberation power spectrum in a reverberated speech signal, including:
- Step 51: Obtain an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone;
- Step 52: Obtain the delayed linear prediction (DLP) coefficient vector used for estimating the post-reverberation sub-band self-power spectrum in the reverberation speech signal;
- Step 53: Obtain the post-reverberation sub-band self-power spectrum estimate according to the sub-band self-power spectrum estimate of the reverberation speech signal and the DLP prediction coefficient vector.
- the microphone is a single microphone
- step 51 is:
- step 52 is:
- In the formula, the three vectors are, in order: the DLP prediction coefficient vector in sub-band k at frame t + 1; the DLP prediction coefficient vector in sub-band k at frame t; and the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame t − D_s. Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the indoor impulse response; N is the frame length of the sub-band transform of the speech signal; D_c is the critical point that separates pre-reverberation from post-reverberation; ⁇ and ⁇ are positive constants, and 0 ⁇ (1 + ⁇ ) ⁇ 2; E_k(t) is the prediction error, and the remaining quantity is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
- step 53 is:
- the microphone is a microphone array
- step 51 is:
- an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
- the acquiring the subband spectrum of the mono output signal of the reverberation voice signal picked up by the microphone array after spatial filtering includes:
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the kth subband of the tth frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- the obtaining the estimation of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal includes:
- the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame is updated from the corresponding estimate in the k-th sub-band of frame t−1; λ is the preset smoothing constant, with 0 < λ < 1; Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame; t is the frame time index, and k is the sub-band index.
- step 52 is:
- W(t+1, k) is the DLP prediction coefficient vector in sub-band k of frame t+1; W(t, k) is the DLP prediction coefficient vector in sub-band k of frame t; the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame t − D_s; Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary separating pre-reverberation from post-reverberation; μ and β are positive constants satisfying 0 < μ(1 + β) < 2; E_k(t) is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame; t is the frame time index, k is the sub-band index, and T is the vector transpose operator.
- step 53 is:
- compared with methods in the related art, the adaptive estimation method for the post-reverberation power spectrum in the reverberation speech signal reduces the difficulty of dereverberation and improves its efficiency; it is more robust and has lower algorithmic complexity, which makes real-time online implementation practical.
- an embodiment of the present disclosure also provides an adaptive estimation device for the post-reverberation power spectrum in a reverberated speech signal, including:
- the first obtaining module 61 is configured to obtain an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone;
- the second obtaining module 62 is configured to obtain a linear prediction DLP prediction coefficient vector used for the delay of the self-power spectrum estimation of the post-reverberation subband in the reverberation speech signal;
- the third obtaining module 63 is configured to obtain the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation of the reverberation speech signal and the DLP prediction coefficient vector.
- the first obtaining module 61 is used to:
- the second obtaining module 62 is used to:
- W(t+1, k) is the DLP prediction coefficient vector in sub-band k of frame t+1; W(t, k) is the DLP prediction coefficient vector in sub-band k of frame t; the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame t − D_s; Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary separating pre-reverberation from post-reverberation; μ and β are positive constants satisfying 0 < μ(1 + β) < 2; E_k(t) is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the frame time index, k is the sub-band index, and T is the vector transpose operator.
- the third obtaining module 63 is used to:
- the first obtaining module 61 includes:
- a first acquiring unit configured to acquire the subband spectrum of the mono output signal after the spatial filtering process of the reverberation speech signal picked up by the microphone array;
- the second obtaining unit is configured to obtain an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal.
- the first obtaining unit is configured to:
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the sub-band spectrum of the output signal of the r-th microphone in the k-th sub-band of the t-th frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- the second obtaining unit is configured to:
- the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame is updated from the corresponding estimate in the k-th sub-band of frame t−1; λ is the preset smoothing constant, with 0 < λ < 1; Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame; t is the frame time index, and k is the sub-band index.
- the second obtaining module 62 is used to:
- W(t+1, k) is the DLP prediction coefficient vector in sub-band k of frame t+1; W(t, k) is the DLP prediction coefficient vector in sub-band k of frame t; the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame t − D_s; Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary separating pre-reverberation from post-reverberation; μ and β are positive constants satisfying 0 < μ(1 + β) < 2; E_k(t) is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame; t is the frame time index, k is the sub-band index, and T is the vector transpose operator.
- the third obtaining module 63 is used to:
- the embodiment of the device is one-to-one corresponding to the above method embodiment. All the implementation methods in the above method embodiment are applicable to the embodiment of the device, and the same technical effect can also be achieved.
- an embodiment of the present disclosure also provides an apparatus for adaptively estimating the post-reverberation power spectrum in a reverberation speech signal, including a memory 71, a processor 72, and a computer program stored on the memory 71 and executable on the processor 72, the memory 71 being connected to the processor 72 through a bus interface 73; wherein the processor 72 implements the following steps when executing the computer program:
- the post-reverberation sub-band self-power spectrum estimation is obtained.
- the processor 72 implements the following steps when executing the computer program:
- processor 72 implements the following steps when executing the computer program:
- W(t+1, k) is the DLP prediction coefficient vector in sub-band k of frame t+1; W(t, k) is the DLP prediction coefficient vector in sub-band k of frame t; the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame t − D_s; Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary separating pre-reverberation from post-reverberation; μ and β are positive constants satisfying 0 < μ(1 + β) < 2; E_k(t) is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the frame time index, k is the sub-band index, and T is the vector transpose operator.
- processor 72 implements the following steps when executing the computer program:
- the processor 72 implements the following steps when executing the computer program:
- an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
- processor 72 implements the following steps when executing the computer program:
- Z (t, k) is the subband spectrum of the mono output signal after the spatial filtering process of the kth subband of the tth frame
- X_r(t, k) is the sub-band spectrum of the output signal of the r-th microphone in the k-th sub-band of the t-th frame
- M is the total number of microphones in the array
- m = 1, 2, ..., M
- t is the time index of the signal frame
- k is the subband index.
- processor 72 implements the following steps when executing the computer program:
- processor 72 implements the following steps when executing the computer program:
- W(t+1, k) is the DLP prediction coefficient vector in sub-band k of frame t+1; W(t, k) is the DLP prediction coefficient vector in sub-band k of frame t; the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame t − D_s; Q is the number of DLP coefficients, with Q = R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary separating pre-reverberation from post-reverberation; μ and β are positive constants satisfying 0 < μ(1 + β) < 2; E_k(t) is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in the k-th sub-band of the t-th frame; t is the frame time index, k is the sub-band index, and T is the vector transpose operator.
- processor 72 implements the following steps when executing the computer program:
- An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the above-mentioned adaptive estimation method of the post-reverberation power spectrum in the reverberation speech signal.
- the technical solution of the present disclosure, in essence, or the part that contributes to the related technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the foregoing storage media include various media that can store program codes, such as a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
- each component or each step can be decomposed and / or recombined.
- decompositions and / or recombinations should be regarded as equivalent solutions of the present disclosure.
- steps for performing the above-mentioned series of processing may naturally be executed in chronological order in the order described, but it does not necessarily need to be executed in chronological order, and some steps may be executed in parallel or independently of each other.
- the object of the present disclosure can also be achieved by running a program or a group of programs on any computing device.
- the computing device may be a well-known general-purpose device. Therefore, the object of the present disclosure can also be achieved only by providing a program product containing program code for implementing the method or device. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure.
- the storage medium may be any known storage medium or any storage medium developed in the future. It should also be noted that, in the device and method of the present disclosure, obviously, each component or each step can be decomposed and / or recombined.
- the embodiments described in the embodiments of the present disclosure may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
- the processing unit can be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this disclosure, or a combination thereof.
- the technology described in the embodiments of the present disclosure may be implemented through modules (eg, procedures, functions, etc.) that perform the functions described in the embodiments of the present disclosure.
- the software codes can be stored in the memory and executed by the processor.
- the memory may be implemented in the processor or external to the processor.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
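An adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal. The method includes: obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by a microphone (51); obtaining the delayed linear prediction (DLP) prediction coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberation speech signal (52); and obtaining the post-reverberation sub-band self-power spectrum estimate from the estimate of the sub-band self-power spectrum of the reverberation speech signal and the DLP prediction coefficient vector (53).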
Description
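Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201811216983.7, filed in China on October 18, 2018, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of speech signal processing, and in particular to an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal.
In far-field conditions, a speech signal picked up by an indoor microphone is inevitably disturbed by reflections from the walls, the ceiling and other obstacles, and therefore undergoes linear distortion. This distortion is usually called reverberation. It degrades the fidelity and intelligibility of speech and so lowers the performance of speech communication systems and automatic speech recognition systems; the degradation grows as the distance between the sound source and the microphone increases. Reverberation usually consists of early reverberation (pre-reverberation, which includes the direct-path component) and late reverberation (post-reverberation). It has been shown that the former actually helps speech intelligibility and the signal-to-noise ratio (SNR) in noisy environments, whereas the latter lengthens the phonemes of the source speech so that they overlap and mask the phonemes that follow, which reduces intelligibility.
Speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources.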
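Summary
Embodiments of the present disclosure provide an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal, to address the problem that speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot guarantee effective dereverberation of the speech signal.
To solve the above technical problem, an embodiment of the present disclosure provides an adaptive estimation method for the post-reverberation power spectrum in a reverberation speech signal, including:
obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by a microphone;
obtaining the delayed linear prediction (DLP) prediction coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberation speech signal;
obtaining the post-reverberation sub-band self-power spectrum estimate from the estimate of the sub-band self-power spectrum of the reverberation speech signal and the DLP prediction coefficient vector.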
可选地,当所述麦克风为单麦克风时,所述获取麦克风拾取的混响语音信号的子带自功率谱的估计,包括:
其中,
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;
为第t-1帧第k个子带的混响语音信号的子带自功率谱的估计;X(t,k)为第t帧第k个子带的混响语音信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量,包括:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述根据所述混响语音信号的子带自功率谱的估计和DLP预测系数矢量,获取后混响子带自功率谱估计,包括:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
可选地,当所述麦克风为麦克风阵列时,所述获取麦克风拾取的混响语音信号的子带自功率谱的估计,包括:
获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述获取麦克风阵列拾取的混响语音信号经空间滤波处理后 的单声道输出信号的子带谱,包括:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t为信号帧的时间索引,k为子带索引。
进一步地,所述根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计,包括:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量,包括:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的经空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述根据所述子带自功率谱的估计和DLP预测系数,获取后混响子带自功率谱估计,包括:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
本公开实施例还提供一种混响语音信号中后混响功率谱的自适应估计装 置,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序;其中,所述处理器执行所述计算机程序时实现以下步骤:
获取麦克风拾取的混响语音信号的子带自功率谱的估计;
获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量;
根据所述混响语音信号的子带自功率谱的估计和DLP预测系数矢量,获取后混响子带自功率谱估计。
可选地,当所述麦克风为单麦克风时,所述处理器执行所述计算机程序时实现以下步骤:
其中,
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;
为第t-1帧第k个子带的混响语音信号的子带自功率谱的估计;X(t,k)为第t帧第k个子带的混响语音信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
可选地,当所述麦克风为麦克风阵列时,所述处理器执行所述计算机程序时实现以下步骤:
获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t为信号帧的时间索引,k为子带索引。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的经空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述处理器执行所述计算机程序时实现以下步骤:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
本公开实施例还提供一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现上述的混响语音信号中后混响功率谱的自适应估计方法。
本公开实施例还提供一种混响语音信号中后混响功率谱的自适应估计装置,包括:
第一获取模块,用于获取麦克风拾取的混响语音信号的子带自功率谱的估计;
第二获取模块,用于获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量;
第三获取模块,用于根据所述混响语音信号的子带自功率谱的估计和DLP预测系数矢量,获取后混响子带自功率谱估计。
可选地,当所述麦克风为单麦克风时,所述第一获取模块,用于:
其中,
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;
为第t-1帧第k个子带的混响语音信号的子带自功率谱的估计;X(t,k)为第t帧第k个子带的混响语音信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取模块,用于:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间 索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述第三获取模块,用于:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
可选地,当所述麦克风为麦克风阵列时,所述第一获取模块,包括:
第一获取单元,用于获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
第二获取单元,用于根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述第一获取单元,用于:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t 为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取单元,用于:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取模块,用于:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的经空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述第三获取模块,用于:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
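The beneficial effects of the present disclosure are:
In the above solution, the post-reverberation sub-band self-power spectrum estimate is obtained using the delayed linear prediction (DLP) prediction coefficient vector, which ensures effective dereverberation of the speech signal, reduces the difficulty of dereverberation and improves its efficiency.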
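To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a block diagram of the principle of applying DLP to adaptively estimate the sub-band self-power spectrum of the post-reverberation signal;
Figure 2 is an algorithm flowchart of the single-microphone method for suppressing the post-reverberation component of a reverberation speech signal;
Figure 3 is a block diagram of the principle of the microphone-array method for suppressing the post-reverberation component of a reverberation speech signal;
Figure 4 is an algorithm flowchart of the microphone-array method for suppressing the post-reverberation component of a reverberation speech signal;
Figure 5 is a schematic flowchart of the adaptive estimation method for the post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure;
Figure 6 is a schematic module diagram of the adaptive estimation device for the post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of the adaptive estimation device for the post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure.
To make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure is described in detail below with reference to the drawings and specific embodiments.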
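In the related art, speech dereverberation techniques fall roughly into three classes.
The first class uses microphone array processing. It first estimates the direction of arrival (DOA) of the sound source relative to the array, then steers the directivity of the array to enhance the direct-path component from the source direction and to reduce or cancel reflected components arriving from other directions. To dereverberate satisfactorily, this technique usually needs a large number of microphones so that the array achieves sufficient directivity gain.
The second class suppresses the post-reverberation signal in the frequency domain. It first estimates the reverberation time (RT60) of the operating environment, uses it to estimate the power spectrum of the post-reverberation signal, and then applies spectral subtraction, as used in noise suppression, to suppress the post-reverberation signal. Because it does not involve the phase of the signal, its performance is fairly robust; however, no high-accuracy real-time algorithm currently exists for estimating the frequency-dependent RT60 of the operating environment, so its dereverberation performance is limited.
The third class is based on inverse filtering. Its goal is to estimate the inverse filter of the room impulse response (RIR) that causes the reverberation and to filter the reverberation speech signal with it to recover the source signal. When the room transfer function (RTF) from the source to the microphone is known, the inverse filter of the RTF recovers the source signal exactly from the observed reverberant signal. It has been shown that such an inverse filter exists when the number of microphones exceeds the number of active sources and the RTFs from each source to each microphone share no common zeros. In practice, however, the RTF (or its equivalent inverse filter) is time-varying and unknown, and must be estimated from the observed data. Many methods have been proposed for this; the most prominent is post-reverberation suppression based on delayed linear prediction (DLP). It suppresses the post-reverberation component effectively without noticeably damaging the short-time correlation of speech, but it requires a very high DLP filter order (typically thousands of coefficients) and hence long observation data, so its computational load is high and it is hard to run in real time on a commercial digital signal processor (DSP) chip. A method that combines a time-varying speech source model with multichannel linear prediction has also been proposed; it suppresses post-reverberation effectively from shorter observation data and also suppresses pre-reverberation to some extent, but its inherent computational complexity makes it impractical. More recently, DLP-based dereverberation has been extended to time-varying speech signals in a technique called variance-normalized delayed linear prediction (NDLP), whose frequency-domain implementation is the well-known weighted prediction error (WPE) dereverberation algorithm. Although WPE is fairly robust, it involves the pseudo-inverse of a high-order correlation matrix of the observed data, and therefore usually consumes considerable computing resources when implemented on a commercial DSP.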
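The performance of the first class, based on microphone array processing, is limited by the number of microphones in the array: satisfactory dereverberation requires many microphones, which raises product cost and complicates structural design. The second class, which suppresses the post-reverberation signal in the frequency domain, must first estimate the reverberation time (RT60) of the operating environment; because no high-accuracy real-time algorithm currently exists for estimating the frequency-dependent RT60, its dereverberation performance is limited. In the third class, based on inverse filtering, the WPE method that is usable in practice involves the pseudo-inverse of a high-order correlation matrix of the observed data, and therefore usually consumes considerable computing resources when implemented on a commercial DSP.
The present disclosure extends the idea of DLP to the sub-band power spectrum domain and proposes a low-complexity, real-time, online adaptive method for estimating the post-reverberation self-power spectrum. From this estimate and the sub-band spectrum of the observed signal, the decision-directed (DD) recursive smoothing technique is used to compute the a priori SNR, from which a sub-band gain function for suppressing the post-reverberation component is computed. This gain is applied to the sub-band spectrum of the observed signal to suppress the post-reverberation component.
To address the problem that speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot guarantee effective dereverberation, the present disclosure provides an adaptive estimation method and device for the post-reverberation power spectrum in a reverberation speech signal.
The implementation principle of the embodiments of the present disclosure is described below.
The embodiments first consider the single-channel (single-microphone) scenario and give a single-microphone method for suppressing the post-reverberation component; the method is then generalized to the microphone array scenario.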
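I. Single-microphone method for suppressing the post-reverberation component of a reverberation speech signal
Let h(n) be the room impulse response from the sound source to the microphone, s(n) the source signal, and x(n) the reverberation speech signal picked up by the microphone. Then x(n) can be expressed by Equation 1:
Equation 1: $x(n) = \sum_{l=0}^{R-1} h(l)\, s(n-l) = s_{early}(n) + s_{late}(n)$
where R is the length of the room impulse response, $D_c$ is the boundary separating pre-reverberation from post-reverberation, $s_{early}(n)$ is the pre-reverberation signal, which contains the direct-path source signal, and $s_{late}(n)$ is the post-reverberation signal. $s_{early}(n)$ and $s_{late}(n)$ are defined respectively by:
$s_{early}(n) = \sum_{l=0}^{D_c-1} h(l)\, s(n-l)$
$s_{late}(n) = \sum_{l=D_c}^{R-1} h(l)\, s(n-l)$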
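Applying an analysis filter bank (AFB) to both sides of Equation 1 to perform a sub-band transform (the short-time Fourier transform can be regarded as a special case of the sub-band transform) gives:
Equation 4,
where X(t,k), S(t,k), H(t,k), $S_{early}(t,k)$ and $S_{late}(t,k)$ are the sub-band transforms of the digital signals x(n), s(n), h(n), $s_{early}(n)$ and $s_{late}(n)$, respectively,
N is the frame length of the sub-band transform, t is the frame time index, k is the sub-band index, and n is the sample time index of the digital signal.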
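The text notes that the short-time Fourier transform is a special case of the sub-band transform. As an illustration only, the following minimal sketch computes sub-band spectra X(t,k) with a Hann-windowed DFT. The function name `stft_frames`, the window choice and the hop size are assumptions made for this example; the source does not specify the analysis filter bank it uses.

```python
import cmath
import math

def stft_frames(x, frame_len=8, hop=4):
    """Return X[t][k]: DFT bins of Hann-windowed frames of x.

    Assumed stand-in for the analysis filter bank (AFB); frame_len plays
    the role of the frame length N in the text.
    """
    # Periodic Hann window (an assumption; any analysis window could be used).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        bins = []
        # Keep only the non-negative frequency bins k = 0 .. frame_len/2.
        for k in range(frame_len // 2 + 1):
            acc = 0j
            for n, v in enumerate(frame):
                acc += v * cmath.exp(-2j * math.pi * k * n / frame_len)
            bins.append(acc)
        spectra.append(bins)
    return spectra

# Usage: a constant signal puts its energy in bins 0 and 1 of the windowed DFT.
X = stft_frames([1.0] * 16)
```

A direct O(N²) DFT is used here so the sketch needs only the standard library; a real implementation would use an FFT.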
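Assuming that the autocorrelation between sub-band signals of adjacent frames is low, the sub-band self-power spectrum corresponding to the sub-band spectrum X(t,k) can be expressed as:
Using the delayed linear prediction (DLP) formulation, Equation 5 can be written as:
Equation 6 shows that, in the sub-band power spectrum domain, DLP can predict the sub-band self-power spectrum of the post-reverberation signal. The prediction residual is the sub-band self-power spectrum of the useful pre-reverberation signal, which is uncorrelated with the post-reverberation signal, and therefore must be non-negative. To build this constraint into the solution for the DLP prediction coefficients, we define a cost function
and a penalty function
respectively as:
where $E_k(t)$ is given by Equation 9:
According to Equations 7, 8 and 13:
where μ and β are positive constants with 0 < μ(1+β) < 2, and $E_k(t)$ is the prediction error defined by Equation 9.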
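Figure 1 shows the block diagram of the principle of applying DLP to adaptively estimate the sub-band self-power spectrum of the post-reverberation signal. In a practical implementation, the estimate of the sub-band self-power spectrum of the observed signal can be computed with the time-recursive smoothing of Equation 16, namely:
$\hat{P}_X(t,k) = \lambda\, \hat{P}_X(t-1,k) + (1-\lambda)\, |X(t,k)|^2$
where 0 < λ < 1 is the preset smoothing constant. The estimate of the sub-band self-power spectrum of the post-reverberation signal is then:
$\hat{P}_{late}(t,k) = \mathbf{W}^T(t,k)\, \mathbf{P}_X(t-D_s,k)$
with $\mathbf{W}(t,k) = [W_0(t,k), W_1(t,k), \ldots, W_{Q-1}(t,k)]^T$ and $\mathbf{P}_X(t-D_s,k) = [\hat{P}_X(t-D_s,k), \hat{P}_X(t-1-D_s,k), \ldots, \hat{P}_X(t-(Q-1)-D_s,k)]^T$.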
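Since the adaptive filter yields the DLP coefficient vector, Equation 17 gives the estimate of the sub-band self-power spectrum of the post-reverberation signal, and it is then natural to suppress the post-reverberation signal by spectral subtraction. To this end, Equations 18 and 19 define the sub-band a priori signal-to-noise ratio ξ(t,k) and a posteriori signal-to-noise ratio η(t,k), respectively, as follows:
Correspondingly, according to Wiener filtering theory, the sub-band gain function G(t,k) for post-reverberation suppression is given by Equation 21:
Correcting the sub-band spectrum of the observed signal with the suppression gain computed by Equation 21 gives an effective estimate of the sub-band spectrum of the pre-reverberation signal:
Note that the first term in Equation 20 is equivalent to:
Substituting Equation 23 into Equation 20 gives:
Equation 24,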
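The above solution first proposes a sub-band-domain method for suppressing the post-reverberation component of a reverberation speech signal with a single microphone. Specifically, a constrained NLMS adaptive algorithm is proposed in the sub-band power spectrum domain to learn and update the DLP filter coefficient vector, from which the sub-band self-power spectrum estimate of the post-reverberation signal is obtained. From this estimate and the sub-band spectrum of the microphone observation, the DD technique is applied to compute the a priori SNR estimate, which in turn gives the sub-band gain function for post-reverberation suppression. The gain function is applied to the sub-band spectrum of the microphone observation to obtain the sub-band spectrum of the target signal.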
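The gain computation summarized above can be sketched for a single sub-band of a single frame as follows. This is a hedged illustration, not the source's Equations 18 to 24, which are not legible in this text: it assumes the standard decision-directed a priori SNR estimate and the standard Wiener gain ξ/(1+ξ). The function name `wiener_gain_dd`, the weighting factor `alpha`, the floor `xi_min` and the guard `eps` are all assumptions.

```python
def wiener_gain_dd(x_power, late_power, prev_clean_power,
                   alpha=0.98, xi_min=1e-3, eps=1e-12):
    """Return (gain, clean_power) for one sub-band of one frame.

    x_power          : |X(t,k)|^2 of the observed signal
    late_power       : estimated post-reverberation power in this sub-band
    prev_clean_power : enhanced power from frame t-1 (decision-directed term)
    """
    # A posteriori SNR: observed power over post-reverberation power.
    eta = x_power / (late_power + eps)
    # Decision-directed a priori SNR (standard form, assumed here).
    xi = (alpha * prev_clean_power / (late_power + eps)
          + (1.0 - alpha) * max(eta - 1.0, 0.0))
    xi = max(xi, xi_min)            # floor limits over-suppression
    gain = xi / (1.0 + xi)          # Wiener gain
    clean_power = (gain ** 2) * x_power
    return gain, clean_power

# Usage: carry clean_power forward as prev_clean_power for the next frame.
gain, clean_power = wiener_gain_dd(4.0, 1.0, 3.0, alpha=0.5)
```

The enhanced sub-band spectrum is then the observed spectrum multiplied by `gain`, which leaves the observed phase unchanged.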
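In summary, Figure 2 shows the algorithm flowchart of the single-microphone method for suppressing the post-reverberation component of a reverberation speech signal. The procedure is:
1. Initialize the parameters and variables of the algorithm and set the frame index t = 0.
2. Read the t-th frame of observed data picked up by the microphone and apply the AFB to it to obtain the corresponding sub-band spectrum X(t,k).
3. Estimate the sub-band self-power spectrum of the post-reverberation signal according to Equation 9 and Equations 15 to 17.
4. Compute the sub-band suppression gain function G(t,k) for post-reverberation suppression according to Equations 24 and 21.
5. Compute the sub-band spectrum estimate of the target signal according to Equation 22, transform it into the time-domain target speech signal with the synthesis filter bank (SFB), and output it.
6. Check whether processing has finished. If not, set t = t + 1 and repeat from step 2; otherwise end the procedure.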
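II. Microphone-array method for suppressing the post-reverberation component of a reverberation speech signal
Suppose the room contains one sound source and an array of M microphones, and denote the observed speech signal picked up by the m-th microphone as $x_m(n)$, m = 1, 2, …, M. The microphone array input signals are first pre-processed by spatial filtering; the method described above is then applied to the single-channel output of the pre-processing to suppress its post-reverberation component, which yields the enhanced sub-band spectrum.
First, the AFB is applied to the time-domain digital input signals {$x_m(n)$, m = 1, 2, …, M} of the M microphones to obtain M sub-band signals, denoted $X_m(t,k)$, m = 1, 2, …, M, where t is the frame time index and k is the sub-band index. Without loss of generality, assume the r-th microphone is the reference microphone. Taking the phase of the reference microphone's sub-band signal as the reference and synchronizing the phases of all other microphones' sub-band signals to it gives:
Applying the following spatial averaging to the M channels of sub-band signals defined by Equation 25 gives the single-channel sub-band output Y(t,k) of the spatial filter, namely: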
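Equations 25 and 26 are in fact a sub-band-domain implementation of the "delay-and-sum" beamformer of the related art. This spatial processor is known to suffer from signal distortion caused by the spatial correlation between channels. We therefore apply the following spatial processing to the M channels of sub-band signals defined by Equation 25 to obtain the single-channel sub-band output Z(t,k) of the spatial filter:
The beamformer defined in the sub-band domain by Equations 25 and 27 has the same directivity pattern as the "delay-and-sum" beamformer of the related art. However, because Equation 27 uses the spatial average of the power spectra of the microphone signals rather than the spatial average of the (complex) spectra used in Equation 26, it avoids the signal distortion that spatial correlation causes in the "delay-and-sum" beamformer.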
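The contrast between averaging complex spectra and averaging power spectra can be sketched as follows. Equations 25 to 27 are not legible in this text, so the sketch rests on two loudly stated assumptions: that "phase synchronization" replaces each channel's phase with the reference microphone's phase, and that the power-averaging output takes the square root of the mean power and carries the reference phase. The function name `spatial_filter` and the `ref` parameter are hypothetical.

```python
import cmath
import math

def spatial_filter(mic_bins, ref=0):
    """Combine M complex sub-band values X_m(t,k) for one (t, k).

    Returns (y, z): y averages the phase-aligned spectra (delay-and-sum
    style); z averages the powers instead (assumed reading of the text).
    """
    m = len(mic_bins)
    phase_ref = cmath.phase(mic_bins[ref])
    # Assumed phase alignment: every channel takes the reference phase,
    # so the spectral average reduces to the mean magnitude.
    mean_mag = sum(abs(v) for v in mic_bins) / m
    y = cmath.rect(mean_mag, phase_ref)
    # Power-spectrum spatial average, then back to an amplitude.
    rms_mag = math.sqrt(sum(abs(v) ** 2 for v in mic_bins) / m)
    z = cmath.rect(rms_mag, phase_ref)
    return y, z

# Usage: with unequal channel magnitudes, |z| is never smaller than |y|.
y, z = spatial_filter([1 + 0j, 3 + 0j])
```

Under these assumptions the two outputs share a phase and differ only in magnitude, which matches the text's statement that the directivity pattern is unchanged.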
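Applying the single-microphone post-reverberation suppression method described above to the sub-band signal Z(t,k) output by this beamformer gives the dereverberated target sub-band signal.
Applying the SFB to the target sub-band signal to perform the inverse sub-band transform then gives the time-domain target signal.
Figure 3 shows the block diagram of the principle of the microphone-array method for suppressing the post-reverberation component of a reverberation speech signal. In it, the sub-band self-power spectrum calculator estimates the self-power spectrum of the spatial filter output Z(t,k) according to Equation 28:
and the DLP-based post-reverberation sub-band self-power spectrum estimator computes the estimate of the post-reverberation sub-band self-power spectrum in Z(t,k) as:
Equation 32,
where 0 < μ(1+β) < 2.
Equation 34,
Correcting Z(t,k) with G(t,k) gives the following estimate of the target sub-band signal:
The above solution is a post-processing stage for a microphone array: a sub-band-domain method for suppressing the post-reverberation component of a reverberation speech signal with a microphone array. In the sub-band domain, the method first defines a new beamformer that spatially pre-processes the sub-band spectra of the signals observed by the array, which reduces the bias of the sub-band spectrum. The method proposed for the single-microphone case is then applied as post-processing to the sub-band spectrum output by the spatial pre-processor to obtain the final target speech signal and complete the dereverberation. The new beamformer, implemented in the sub-band domain, has the same directivity pattern as the "delay-and-sum" beamformer of the related art and reduces the bias of the sub-band spectrum, while overcoming the signal distortion that the spatial correlation between channels causes in the "delay-and-sum" beamformer. This provides a suitable operating environment for using the single-microphone method as the post-processor of the microphone array.
Figure 4 shows the algorithm flowchart of the microphone-array method for suppressing the post-reverberation component of a reverberation speech signal. The procedure is:
1. Initialize the parameters and variables of the algorithm and set the frame index t = 0.
2. Read the t-th frame of observed data picked up by the M microphones and apply the AFB to it to obtain the M corresponding sub-band spectra.
3. Apply phase synchronization and spatial filtering to the M microphone sub-band spectra according to Equations 25 and 27 to obtain the sub-band spectrum Z(t,k).
4. Compute the sub-band suppression gain function G(t,k) for post-reverberation suppression according to Equations 28 to 35.
5. Compute the sub-band spectrum estimate of the target signal according to Equation 36, transform it into the time-domain target speech signal with the SFB, and output it.
6. Check whether processing has finished. If not, set t = t + 1 and repeat from step 2; otherwise end the procedure.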
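The specific implementation of the embodiments of the present disclosure is described below.
As shown in Figure 5, an embodiment of the present disclosure provides an adaptive estimation method for the post-reverberation power spectrum in a reverberation speech signal, including:
Step 51: obtain an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone;
Step 52: obtain the delayed linear prediction (DLP) prediction coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberation speech signal;
Step 53: obtain the post-reverberation sub-band self-power spectrum estimate from the estimate of the sub-band self-power spectrum of the reverberation speech signal and the DLP prediction coefficient vector.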
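I. When the microphone is a single microphone
Specifically, step 51 is implemented as:
$\hat{P}_X(t,k) = \lambda\, \hat{P}_X(t-1,k) + (1-\lambda)\, |X(t,k)|^2$
where $\hat{P}_X(t,k)$ is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; λ is the preset smoothing constant, with 0 < λ < 1; $\hat{P}_X(t-1,k)$ is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of frame t−1; X(t,k) is the sub-band spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame; t is the frame time index and k is the sub-band index.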
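Step 51 is a first-order recursive average of the instantaneous sub-band power. A minimal sketch for one sub-band follows, assuming the usual form λ·P(t−1,k) + (1−λ)·|X(t,k)|²; the function name `smooth_power` and the default λ = 0.9 are illustrative choices, not values from the source.

```python
def smooth_power(prev_power, x_bin, lam=0.9):
    """One step of recursive smoothing for a single sub-band.

    prev_power : smoothed power estimate from frame t-1
    x_bin      : complex sub-band value X(t,k) of frame t
    lam        : smoothing constant, 0 < lam < 1
    """
    return lam * prev_power + (1.0 - lam) * abs(x_bin) ** 2

# Usage: track the power of one sub-band across frames and keep the history,
# which the DLP predictor later reads at a delay.
history = []
power = 0.0
for x_bin in [1 + 0j, 0 + 2j, 3 + 4j]:
    power = smooth_power(power, x_bin)
    history.append(power)
```

A larger λ gives a smoother but slower-reacting estimate.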
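Specifically, step 52 is implemented according to Equation 15 above, in which:
W(t+1,k) is the DLP prediction coefficient vector in sub-band k of frame t+1;
W(t,k) is the DLP prediction coefficient vector in sub-band k of frame t;
the sub-band self-power spectrum vector of the reverberation speech signal is taken in the k-th sub-band at frame $t - D_s$;
Q is the number of DLP coefficients, with $Q = R_s - D_s$;
R is the length of the room impulse response, N is the frame length of the sub-band transform, and $D_c$ is the boundary separating pre-reverberation from post-reverberation;
μ and β are positive constants with 0 < μ(1+β) < 2;
$E_k(t)$ is the prediction error, defined in terms of the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of the t-th frame;
t is the frame time index, k is the sub-band index, and T is the vector transpose operator.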
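Equation 15 itself is not legible in this text, so the following is only an assumed form of a constrained NLMS update that is consistent with the surrounding description: the prediction error is the observed power minus the predicted post-reverberation power, the residual should be non-negative, and the stability condition 0 < μ(1+β) < 2 suggests that the effective step size is μ(1+β) when the penalty is active. The function name `nlms_update` and the regularizer `delta` are hypothetical.

```python
def nlms_update(w, p_vec, p_now, mu=0.1, beta=1.0, delta=1e-10):
    """One assumed constrained-NLMS step for a single sub-band.

    w     : current DLP coefficients W(t,k), length Q
    p_vec : delayed power vector (frames t-D_s, t-D_s-1, ...), length Q
    p_now : smoothed observed power in frame t
    Returns (w_next, err).
    """
    pred = sum(wi * pi for wi, pi in zip(w, p_vec))
    err = p_now - pred                      # prediction error E_k(t)
    # Assumption: a negative residual violates the non-negativity
    # constraint, so the penalty enlarges the step to mu * (1 + beta).
    step = mu * (1.0 + beta) if err < 0.0 else mu
    norm = sum(pi * pi for pi in p_vec) + delta   # NLMS normalization
    w_next = [wi + step * err * pi / norm for wi, pi in zip(w, p_vec)]
    return w_next, err

# Usage: an over-predicting filter is pulled back towards the observation.
w_next, err = nlms_update([2.0, 2.0], [1.0, 1.0], 2.0, mu=0.5, beta=1.0)
```

With μ(1+β) = 1 in the usage line, a single step removes the negative error entirely, which illustrates why the bound 0 < μ(1+β) < 2 matters for stability.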
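Specifically, step 53 is implemented according to Equation 17 above:
$\hat{P}_{late}(t,k) = \mathbf{W}^T(t,k)\, \mathbf{P}_X(t-D_s,k)$
where $\hat{P}_{late}(t,k)$ is the post-reverberation sub-band self-power spectrum estimate;
$\mathbf{W}(t,k) = [W_0(t,k), W_1(t,k), \ldots, W_{Q-1}(t,k)]^T$ is the DLP prediction coefficient vector in sub-band k of frame t, and $W_\tau(t,k)$ is the τ-th DLP prediction coefficient in the k-th sub-band of the t-th frame, τ = 0, 1, 2, …, Q−1;
Q is the number of DLP coefficients, with $Q = R_s - D_s$;
R is the length of the room impulse response, N is the frame length of the sub-band transform, and $D_c$ is the boundary separating pre-reverberation from post-reverberation;
$\mathbf{P}_X(t-D_s,k) = [\hat{P}_X(t-D_s,k), \hat{P}_X(t-1-D_s,k), \ldots, \hat{P}_X(t-(Q-1)-D_s,k)]^T$ is the sub-band self-power spectrum vector of the reverberation speech signal in the k-th sub-band at frame $t - D_s$, and $\hat{P}_X(t-\tau-D_s,k)$ is the estimate of the sub-band self-power spectrum of the reverberation speech signal in the k-th sub-band of frame $t-\tau-D_s$;
t is the frame time index, k is the sub-band index, and T is the vector transpose operator.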
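Step 53 is an inner product between the Q prediction coefficients and the smoothed power estimates delayed by D_s frames. A minimal sketch for one sub-band follows; the function name `late_reverb_power`, the list-based history layout, the zero return during warm-up and the final clamp to zero are assumptions added for the example, not part of the source.

```python
def late_reverb_power(w, power_history, d_s):
    """Predict the post-reverberation power of the current frame.

    w             : DLP coefficients W_0 .. W_{Q-1} for this sub-band
    power_history : smoothed powers, oldest first; the last entry is frame t
    d_s           : prediction delay in frames (D_s in the text)
    """
    q = len(w)
    # Warm-up: frame t - d_s - (q - 1) must exist in the history.
    if len(power_history) < q + d_s:
        return 0.0
    total = 0.0
    for tau in range(q):
        # Index -1 is frame t, so frame t - tau - d_s is index -1 - d_s - tau.
        total += w[tau] * power_history[-1 - d_s - tau]
    return max(total, 0.0)   # a power estimate should not be negative

# Usage: with d_s = 2 the predictor reads frames t-2 and t-3.
estimate = late_reverb_power([0.5, 0.25], [1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

Because the newest D_s frames are skipped, the direct sound and early reflections in the current frame do not enter the prediction.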
二、当所述麦克风为麦克风阵列时,
具体地,所述步骤51的实现方式为:
获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱,包括:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t为信号帧的时间索引,k为子带索引。
进一步地,所述根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计,包括:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
具体地,所述步骤52的实现方式为:
根据上述的公式三十二:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的经空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
具体地,所述步骤53的实现过程为:
根据上述的公式二十九:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
需要说明的是,本公开提出的这种混响语音信号中后混响功率谱的自适 应估计方法,降低了去混响的难度,提高了去混响的效率,与相关技术中的方法相比,它具有更好的鲁棒性、更低的算法复杂度,便于在实际中实时在线实现。
如图6所示,本公开实施例还提供一种混响语音信号中后混响功率谱的自适应估计装置,包括:
第一获取模块61,用于获取麦克风拾取的混响语音信号的子带自功率谱的估计;
第二获取模块62,用于获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量;
第三获取模块63,用于根据所述混响语音信号的子带自功率谱的估计和DLP预测系数矢量,获取后混响子带自功率谱估计。
可选地,当所述麦克风为单麦克风时,所述第一获取模块61,用于:
其中,
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;
为第t-1帧第k个子带的混响语音信号的子带自功率谱的估计;X(t,k)为第t帧第k个子带的混响语音信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取模块62,用于:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述第三获取模块63,用于:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
可选地,当所述麦克风为麦克风阵列时,所述第一获取模块61,包括:
第一获取单元,用于获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
第二获取单元,用于根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述第一获取单元,用于:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取单元,用于:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述第二获取模块62,用于:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的经空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述第三获取模块63,用于:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的空间滤波处理后单声道输出信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
需要说明的是,该装置的实施例是与上述方法实施例一一对应的装置,上述方法实施例中所有实现方式均适用于该装置的实施例中,也能达到相同的技术效果。
如图7所示,本公开实施例还提供一种混响语音信号中后混响功率谱的 自适应估计装置,包括存储器71、处理器72及存储在所述存储器71上并可在所述处理器上运行的计算机程序,且所述存储器71通过总线接口73与所述处理器72连接;其中,所述处理器72执行所述计算机程序时实现以下步骤:
获取麦克风拾取的混响语音信号的子带自功率谱的估计;
获取用于所述混响语音信号中后混响子带自功率谱估计的延时的线性预测DLP预测系数矢量;
根据所述混响语音信号的子带自功率谱的估计和DLP预测系数矢量,获取后混响子带自功率谱估计。
可选地,当所述麦克风为单麦克风时,所述处理器72执行所述计算机程序时实现以下步骤:
其中,
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;
为第t-1帧第k个子带的混响语音信号的子带自功率谱的估计;X(t,k)为第t帧第k个子带的混响语音信号的子带谱;t为信号帧的时间索引,k为子带索引。
进一步地,所述处理器72执行所述计算机程序时实现以下步骤:
根据公式:
其中,
为第t+1帧子带k上的DLP预测系数矢量;
为第t帧子带k上的DLP预测系数矢量,且
为第t-D
s帧第k个子带的混 响语音信号的子带自功率谱矢量,
Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;μ和β为正常数,且0<μ(1+β)<2;E
k(t)为预测误差,且
为第t帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
进一步地,所述处理器72执行所述计算机程序时实现以下步骤:
其中,
为后混响子带自功率谱估计;
为第t帧子带k上的DLP预测系数矢量,且
W
τ(t,k)为第t帧第k个子带的DLP第τ个预测系数,τ=0,1,2,…,Q-1,Q为DLP的系数个数,且Q=R
s-D
s,
R为室内冲击响应的长度,N为子带变换的语音信号帧的长度,D
c为前混响和后混响区分的临界点;
为第t-D
s帧第k个子带的混响语音信号的子带自功率谱矢量,
为第t-τ-D
s帧第k个子带的混响语音信号的子带自功率谱的估计;t为信号帧的时间索引,k为子带索引,T为矢量的转置运算符。
可选地,当所述麦克风为麦克风阵列时,所述处理器72执行所述计算机程序时实现以下步骤:
获取麦克风阵列拾取的混响语音信号经空间滤波处理后的单声道输出信号的子带谱;
根据所述单声道输出信号的子带谱,获取混响语音信号经空间滤波处理后的单声道输出信号的子带自功率谱的估计。
进一步地,所述处理器72执行所述计算机程序时实现以下步骤:
其中,Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;X
r(t,k)为第t帧第k个子带的第r个麦克风输出信号的子带谱;M为麦克风阵列的总个数;
m=1,2,…,M;t为信号帧的时间索引,k为子带索引。
进一步地,所述处理器72执行所述计算机程序时实现以下步骤:
其中,
为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;
为第t-1帧第k个子带的经空间滤波处理后的单声道输出信号的子带自功率谱的估计;λ为预设的平滑常数,且0<λ<1;Z(t,k)为第t帧第k个子带的经空间滤波处理后的单声道输出信号的子带谱;t为信号帧的时间索引,k为子带索引。
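Further, the processor 72, when executing the computer program, implements the following step:
updating the DLP prediction coefficient vector according to the update formula that maps $\mathbf{W}(t,k)$ to $\mathbf{W}(t+1,k)$,
where
$\mathbf{W}(t+1,k)$ is the DLP prediction coefficient vector on subband k at frame t+1;
$\mathbf{W}(t,k)=[W_0(t,k),W_1(t,k),\ldots,W_{Q-1}(t,k)]^{T}$ is the DLP prediction coefficient vector on subband k at frame t;
$\mathbf{P}_{ZZ}(t-D_s,k)=[\hat{P}_{ZZ}(t-D_s,k),\hat{P}_{ZZ}(t-1-D_s,k),\ldots,\hat{P}_{ZZ}(t-Q+1-D_s,k)]^{T}$ is the subband auto-power spectrum vector of the reverberant speech signal for subband k at frame $t-D_s$;
Q is the number of DLP coefficients, with $Q=R_s-D_s$, where $R_s$ and $D_s$ are computed from R, N, and $D_c$: R is the length of the room impulse response, N is the frame length of the subband-transformed speech signal, and $D_c$ is the critical point separating early reverberation from late reverberation; μ and β are positive constants with $0<\mu(1+\beta)<2$; $E_k(t)$ is the prediction error, with

$$E_k(t)=\hat{P}_{ZZ}(t,k)-\mathbf{W}^{T}(t,k)\,\mathbf{P}_{ZZ}(t-D_s,k)$$

where $\hat{P}_{ZZ}(t,k)$ is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal for subband k of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.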
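Further, the processor 72, when executing the computer program, implements the following step: obtaining the late-reverberation subband auto-power spectrum estimate as the inner product of the DLP prediction coefficient vector and the delayed subband auto-power spectrum vector:

$$\hat{P}_{\mathrm{late}}(t,k)=\mathbf{W}^{T}(t,k)\,\mathbf{P}_{ZZ}(t-D_s,k)=\sum_{\tau=0}^{Q-1}W_\tau(t,k)\,\hat{P}_{ZZ}(t-\tau-D_s,k)$$

where $\hat{P}_{\mathrm{late}}(t,k)$ is the late-reverberation subband auto-power spectrum estimate; $\mathbf{W}(t,k)=[W_0(t,k),W_1(t,k),\ldots,W_{Q-1}(t,k)]^{T}$ is the DLP prediction coefficient vector on subband k at frame t, and $W_\tau(t,k)$ is the τ-th DLP prediction coefficient of subband k at frame t, τ = 0, 1, 2, …, Q-1; Q is the number of DLP coefficients, with $Q=R_s-D_s$, where $R_s$ and $D_s$ are computed from R, N, and $D_c$: R is the length of the room impulse response, N is the frame length of the subband-transformed speech signal, and $D_c$ is the critical point separating early reverberation from late reverberation; $\mathbf{P}_{ZZ}(t-D_s,k)=[\hat{P}_{ZZ}(t-D_s,k),\hat{P}_{ZZ}(t-1-D_s,k),\ldots,\hat{P}_{ZZ}(t-Q+1-D_s,k)]^{T}$ is the subband auto-power spectrum vector of the reverberant speech signal for subband k at frame $t-D_s$, and $\hat{P}_{ZZ}(t-\tau-D_s,k)$ is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal for subband k at frame $t-\tau-D_s$; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.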
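For a frame-by-frame implementation, the three steps can share one small delay line per subband. The sketch below chains them for a single subband of the spatially filtered output. It rests on assumptions: the smoothing weights (λ on the past estimate) and the textbook NLMS update (β omitted) stand in for formulas not reproduced in this text, and the class name `LateReverbTracker` and its parameters are hypothetical.

```python
from collections import deque

import numpy as np


class LateReverbTracker:
    """Streaming late-reverberation PSD estimator for one subband (illustrative only)."""

    def __init__(self, Q, D_s, lam=0.9, mu=0.1, eps=1e-12):
        self.Q, self.D_s = Q, D_s
        self.lam, self.mu, self.eps = lam, mu, eps
        self.w = np.zeros(Q)   # DLP coefficients W_0 .. W_{Q-1}
        self.psd = 0.0         # smoothed PSD of the current frame
        # history[i] holds the smoothed PSD of frame t - i.
        self.history = deque(maxlen=Q + D_s)

    def step(self, z):
        """Consume one subband sample Z(t, k); return the late-reverb PSD estimate."""
        # Step 1: recursive smoothing (assumed weighting).
        self.psd = self.lam * self.psd + (1.0 - self.lam) * abs(z) ** 2
        self.history.appendleft(self.psd)
        if len(self.history) < self.Q + self.D_s:
            return 0.0  # not enough delayed frames yet
        # Delayed PSD vector [P(t-D_s), ..., P(t-D_s-Q+1)].
        p_vec = np.array([self.history[self.D_s + tau] for tau in range(self.Q)])
        # Step 3: late-reverberation PSD as the inner product W^T P.
        late = float(np.dot(self.w, p_vec))
        # Step 2: textbook NLMS update driven by the prediction error (assumed form).
        err = self.psd - late
        self.w = self.w + self.mu * err * p_vec / (np.dot(p_vec, p_vec) + self.eps)
        return late


# Usage: constant-magnitude input, Q = 2 coefficients, delay of 1 frame.
tracker = LateReverbTracker(Q=2, D_s=1, lam=0.5, mu=0.5)
outputs = [tracker.step(1.0 + 0.0j) for _ in range(4)]
```

The tracker returns zero until the delay line holds Q + D_s frames; after that, each call costs O(Q) operations per subband, which is what makes this kind of estimator suitable for real-time use.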
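An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal.
The technical solution of the present disclosure, in essence, or the part of it that contributes to the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.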
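In addition, it should be pointed out that, in the apparatus and method of the present disclosure, the components or steps can obviously be decomposed and/or recombined. Such decomposition and/or recombination should be regarded as equivalent solutions of the present disclosure. Moreover, the steps of the series of processes described above may naturally be performed in chronological order as described, but need not be; some steps may be performed in parallel or independently of one another. A person of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present disclosure may be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including a processor, a storage medium, and the like) or network of computing devices, and that this can be achieved by such a person using basic programming skills after reading the description of the present disclosure.
Therefore, the object of the present disclosure can also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object of the present disclosure can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure. Obviously, the storage medium may be any well-known storage medium or any storage medium developed in the future.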
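It can be understood that the embodiments described in the present disclosure may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), DSP Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present disclosure, or a combination thereof.
For a software implementation, the techniques described in the embodiments of the present disclosure may be implemented by modules (for example, procedures and functions) that perform the functions described in the embodiments of the present disclosure. The software code may be stored in a memory and executed by a processor. The memory may be implemented inside the processor or external to it.
The above are optional implementations of the present disclosure. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles described in the present disclosure, and such improvements and refinements also fall within the scope of protection of the present disclosure.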
Claims (28)
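- A method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal, comprising: obtaining an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone; obtaining a delayed linear prediction (DLP) prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal; and obtaining a late-reverberation subband auto-power spectrum estimate according to the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
- The method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 2, wherein obtaining the delayed linear prediction (DLP) prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal comprises: according to the formula:
- The method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 2, wherein obtaining the late-reverberation subband auto-power spectrum estimate according to the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector comprises:
- The method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 1, wherein, when the microphone is a microphone array, obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone comprises: obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array; and obtaining, according to the subband spectrum of the mono output signal, an estimate of the subband auto-power spectrum of the mono output signal produced by spatially filtering the reverberant speech signal.
- The method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining the delayed linear prediction (DLP) prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal comprises: according to the formula:
- The method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining the late-reverberation subband auto-power spectrum estimate according to the estimate of the subband auto-power spectrum and the DLP prediction coefficients comprises: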
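- An apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps: obtaining an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone; obtaining a delayed linear prediction (DLP) prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal; and obtaining a late-reverberation subband auto-power spectrum estimate according to the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 11, wherein the processor, when executing the computer program, implements the following step: according to the formula:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 11, wherein the processor, when executing the computer program, implements the following step:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 10, wherein, when the microphone is a microphone array, the processor, when executing the computer program, implements the following steps: obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array; and obtaining, according to the subband spectrum of the mono output signal, an estimate of the subband auto-power spectrum of the mono output signal produced by spatially filtering the reverberant speech signal.
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor, when executing the computer program, implements the following step: according to the formula:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor, when executing the computer program, implements the following step:
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to any one of claims 1 to 9.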
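- An apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal, comprising: a first acquisition module, configured to obtain an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone; a second acquisition module, configured to obtain a delayed linear prediction (DLP) prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal; and a third acquisition module, configured to obtain a late-reverberation subband auto-power spectrum estimate according to the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 21, wherein the second acquisition module is configured to: according to the formula:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 21, wherein the third acquisition module is configured to:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 20, wherein, when the microphone is a microphone array, the first acquisition module comprises: a first acquisition unit, configured to obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array; and a second acquisition unit, configured to obtain, according to the subband spectrum of the mono output signal, an estimate of the subband auto-power spectrum of the mono output signal produced by spatially filtering the reverberant speech signal.
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the second acquisition module is configured to: according to the formula:
- The apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the third acquisition module is configured to: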
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811216983.7 | 2018-10-18 | ||
CN201811216983.7A CN109243476B (zh) | 2018-10-18 | 2018-10-18 | Method and apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020078210A1 (zh) | 2020-04-23 |
Family
ID=65052489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/109285 WO2020078210A1 (zh) | Method and apparatus for adaptive estimation of the late-reverberation power spectrum in a reverberant speech signal | 2018-10-18 | 2019-09-30 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109243476B (zh) |
WO (1) | WO2020078210A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109243476B (zh) * | 2018-10-18 | 2021-09-03 | 电信科学技术研究院有限公司 | 混响语音信号中后混响功率谱的自适应估计方法及装置 |
CN111489760B (zh) * | 2020-04-01 | 2023-05-16 | 腾讯科技(深圳)有限公司 | 语音信号去混响处理方法、装置、计算机设备和存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440869A (zh) * | 2013-09-03 | 2013-12-11 | 大连理工大学 | 一种音频混响的抑制装置及其抑制方法 |
CN104658543A (zh) * | 2013-11-20 | 2015-05-27 | 大连佑嘉软件科技有限公司 | 一种室内混响消除的方法 |
US20160210976A1 (en) * | 2013-07-23 | 2016-07-21 | Arkamys | Method for suppressing the late reverberation of an audio signal |
CN108154885A (zh) * | 2017-12-15 | 2018-06-12 | 重庆邮电大学 | 一种使用qr-rls算法对多通道语音信号去混响方法 |
CN108172231A (zh) * | 2017-12-07 | 2018-06-15 | 中国科学院声学研究所 | 一种基于卡尔曼滤波的去混响方法及系统 |
CN109243476A (zh) * | 2018-10-18 | 2019-01-18 | 电信科学技术研究院有限公司 | 混响语音信号中后混响功率谱的自适应估计方法及装置 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1212608C (zh) * | 2003-09-12 | 2005-07-27 | 中国科学院声学研究所 | 一种采用后置滤波器的多通道语音增强方法 |
JP4705893B2 (ja) * | 2006-08-10 | 2011-06-22 | Okiセミコンダクタ株式会社 | エコーキャンセラ |
CN101908341B (zh) * | 2010-08-05 | 2012-05-23 | 浙江工业大学 | 一种基于g.729算法的语音编码优化方法 |
- 2018-10-18: CN application CN201811216983.7A, patent CN109243476B (zh), status: Active
- 2019-09-30: WO application PCT/CN2019/109285, patent WO2020078210A1 (zh), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109243476B (zh) | 2021-09-03 |
CN109243476A (zh) | 2019-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108172231B (zh) | 一种基于卡尔曼滤波的去混响方法及系统 | |
CN109597022B (zh) | 声源方位角运算、定位目标音频的方法、装置和设备 | |
CN110100457B (zh) | 基于噪声时变环境的加权预测误差的在线去混响算法 | |
CN109979476B (zh) | 一种语音去混响的方法及装置 | |
CN110148420A (zh) | 一种适用于噪声环境下的语音识别方法 | |
Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 | |
CN108538306B (zh) | 提高语音设备doa估计的方法及装置 | |
JP6225245B2 (ja) | 信号処理装置、方法及びプログラム | |
US9520138B2 (en) | Adaptive modulation filtering for spectral feature enhancement | |
Wang et al. | Mask weighted STFT ratios for relative transfer function estimation and its application to robust ASR | |
CN110660406A (zh) | 近距离交谈场景下双麦克风移动电话的实时语音降噪方法 | |
Nesta et al. | A flexible spatial blind source extraction framework for robust speech recognition in noisy environments | |
WO2020078210A1 (zh) | 混响语音信号中后混响功率谱的自适应估计方法及装置 | |
CN111681665A (zh) | 一种全向降噪方法、设备及存储介质 | |
Martín-Doñas et al. | Dual-channel DNN-based speech enhancement for smartphones | |
Nesta et al. | Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction | |
Habets et al. | Dereverberation | |
Dumortier et al. | Blind RT60 estimation robust across room sizes and source distances | |
CN111312275A (zh) | 一种基于子带分解的在线声源分离增强系统 | |
US11902757B2 (en) | Techniques for unified acoustic echo suppression using a recurrent neural network | |
Kinoshita et al. | Multi-step linear prediction based speech dereverberation in noisy reverberant environment. | |
CN107393553B (zh) | 用于语音活动检测的听觉特征提取方法 | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
Nakatani et al. | Simultaneous denoising, dereverberation, and source separation using a unified convolutional beamformer | |
Ji et al. | Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19873280 Country of ref document: EP Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19873280 Country of ref document: EP Kind code of ref document: A1 |