WO2020078210A1 - Adaptive estimation method and device for the post-reverberation power spectrum in a reverberant speech signal - Google Patents

Adaptive estimation method and device for the post-reverberation power spectrum in a reverberant speech signal

Publication number
WO2020078210A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
reverberation
band
power spectrum
frame
Prior art date
Application number
PCT/CN2019/109285
Other languages
English (en)
French (fr)
Inventor
梁民
Original Assignee
电信科学技术研究院有限公司
Priority date
Filing date
Publication date
Application filed by 电信科学技术研究院有限公司
Publication of WO2020078210A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Definitions

  • The present disclosure relates to the field of speech signal processing, and in particular to an adaptive estimation method and device for the post-reverberation power spectrum in a reverberant speech signal.
  • In far-field conditions, the speech signal picked up by an indoor microphone is inevitably corrupted by reflections from the walls, ceiling and other obstacles in the room, which produces linear distortion. This distortion is usually called reverberation. It degrades the fidelity and intelligibility of speech, and therefore reduces the performance of speech communication systems and automatic speech recognition systems; the effect grows as the distance between the sound source and the microphone increases.
  • Reverberation usually consists of early reverberation (i.e., pre-reverberation, which contains direct sound components) and late reverberation (i.e., post-reverberation).
  • Speech dereverberation techniques in the related art suffer from high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources.
  • Embodiments of the present disclosure provide an adaptive estimation method and device for the post-reverberation power spectrum in a reverberant speech signal, to solve the problem that speech dereverberation techniques in the related art have high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot effectively guarantee dereverberation of the speech signal.
  • an embodiment of the present disclosure provides an adaptive estimation method of the post-reverberation power spectrum in a reverberated speech signal, including:
  • the post-reverberation sub-band self-power spectrum estimation is obtained.
  • the obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone includes:
  • Obtaining the delayed linear prediction (DLP) coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberant speech signal includes:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the reverberant speech signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • the obtaining the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation of the reverberation speech signal and the DLP prediction coefficient vector includes:
  • the obtaining an estimate of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone includes:
  • an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
  • the acquiring the sub-band spectrum of the mono output signal of the reverberation speech signal picked up by the microphone array after spatial filtering includes:
  • Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in sub-band k of frame t;
  • X_m(t, k) is the sub-band spectrum of the output signal of the m-th microphone in sub-band k of frame t;
  • M is the total number of microphones in the array;
  • m = 1, 2, ..., M;
  • t is the time index of the signal frame;
  • k is the sub-band index.
  • the obtaining the estimation of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal includes:
  • Obtaining the delayed linear prediction (DLP) coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberant speech signal includes:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • the obtaining the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation and the DLP prediction coefficient includes:
  • An embodiment of the present disclosure also provides an adaptive estimation device for the post-reverberation power spectrum in a reverberant speech signal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
  • the post-reverberation sub-band self-power spectrum estimation is obtained.
  • the processor implements the following steps when executing the computer program:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the reverberant speech signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • the processor implements the following steps when executing the computer program:
  • an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
  • Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in sub-band k of frame t;
  • X_m(t, k) is the sub-band spectrum of the output signal of the m-th microphone in sub-band k of frame t;
  • M is the total number of microphones in the array;
  • m = 1, 2, ..., M;
  • t is the time index of the signal frame;
  • k is the sub-band index.
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the above adaptive estimation method of the post-reverberation power spectrum in the reverberant speech signal is implemented.
  • An embodiment of the present disclosure also provides an adaptive estimation device for the post-reverberation power spectrum in a reverberation speech signal, including:
  • the first obtaining module is used for obtaining the estimation of the sub-band self-power spectrum of the reverberation speech signal picked up by the microphone;
  • a second obtaining module, configured to obtain the delayed linear prediction (DLP) coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberant speech signal;
  • the third obtaining module is configured to obtain the post-reverberation sub-band self-power spectrum estimation according to the sub-band self-power spectrum estimation of the reverberation speech signal and the DLP prediction coefficient vector.
  • the first acquisition module is configured to:
  • the second obtaining module is used to:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the reverberant speech signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • the third obtaining module is used to:
  • the first obtaining module includes:
  • a first acquiring unit configured to acquire the subband spectrum of the mono output signal after the spatial filtering process of the reverberation speech signal picked up by the microphone array;
  • the second obtaining unit is configured to obtain an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal.
  • the first obtaining unit is configured to:
  • Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in sub-band k of frame t;
  • X_m(t, k) is the sub-band spectrum of the output signal of the m-th microphone in sub-band k of frame t;
  • M is the total number of microphones in the array;
  • m = 1, 2, ..., M;
  • t is the time index of the signal frame;
  • k is the sub-band index.
  • the second obtaining unit is configured to:
  • the second obtaining module is used to:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • the third obtaining module is used to:
  • By using the delayed linear prediction (DLP) coefficient vector to obtain the post-reverberation sub-band self-power spectrum estimate, the above scheme can ensure effective dereverberation of the speech signal, reduce the difficulty of dereverberation, and improve dereverberation efficiency.
  • FIG. 1 is a block diagram of the principle of applying DLP to adaptively estimate the sub-band self-power spectrum of the reverberation signal;
  • FIG. 2 is an algorithm flowchart of the method for suppressing the post-reverberation component in a reverberant speech signal based on a single microphone;
  • FIG. 3 is a block diagram of the principle of the method for suppressing the post-reverberation component in a reverberant speech signal based on a microphone array;
  • FIG. 4 is an algorithm flowchart of the method for suppressing the post-reverberation component in a reverberant speech signal based on a microphone array;
  • FIG. 5 is a schematic flowchart of an adaptive estimation method of a post-reverb power spectrum in a reverb speech signal according to an embodiment of the present disclosure
  • FIG. 6 is a schematic block diagram of an apparatus for adaptively estimating a post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of an apparatus for adaptively estimating a post-reverberation power spectrum in a reverberation speech signal according to an embodiment of the present disclosure.
  • The first type uses microphone array processing. It first estimates the direction of arrival (DOA) of the sound source relative to the microphone array, then uses the directivity of the array to enhance the direct signal component arriving from the source direction and to attenuate the reflected components arriving from other directions, thereby achieving dereverberation. To obtain a satisfactory result, this technique usually needs a large number of microphones so that the array has sufficient directional gain.
  • The second type suppresses the post-reverberation signal in the frequency domain. It first estimates the reverberation time (RT60) of the working environment, uses it to estimate the power spectrum of the post-reverberation signal, and then removes the post-reverberation signal with the spectral subtraction used in noise suppression. This technique does not involve the phase of the signal and is relatively robust, but because there is no high-precision, real-time algorithm for estimating the frequency-dependent RT60 of the working environment, its dereverberation performance is limited.
  • The third type is based on inverse filtering. Its goal is to estimate the inverse filter of the room impulse response (RIR) that causes the reverberation and to filter the reverberant speech signal with it.
  • In theory, the inverse filter of the room transfer function (RTF) can exactly recover the source signal from the observed reverberant signal. It has been proved that such an inverse filter exists when the number of microphones is greater than the number of active sound sources and the RTFs from each sound source to each microphone have no common zeros. In practical applications, however, the RTF (or its equivalent inverse filter) is time-varying and unknown, and must be estimated from the observed data. Many researchers have worked on this problem and proposed many methods.
  • DLP Delayed Linear Prediction
  • This method can effectively suppress post-reverberation from relatively short observation data, and it also suppresses pre-reverberation to some extent; however, its inherent computational complexity makes it impractical.
  • NDLP linear prediction
  • WPE Weighted Prediction Error
  • The first type of dereverberation technique, based on microphone array processing, is limited by the number of microphones in the array. A satisfactory result inevitably requires a large number of microphones, which raises the cost of the actual product and makes its structural design more difficult.
  • The second type, which suppresses the post-reverberation signal in the frequency domain, must first estimate the reverberation time (RT60) of the working environment. Because no high-precision, real-time algorithm currently exists for estimating the frequency-dependent RT60 of the working environment, its dereverberation performance is limited.
  • In the third type, based on inverse filtering, the WPE method is the one that can be used in practice, but it involves a pseudo-inverse of a high-order correlation matrix of the observed data, so it usually consumes considerable computing resources when implemented on a commercial DSP.
  • This disclosure extends the idea of DLP to the sub-band power spectrum domain and proposes a low-complexity, real-time, online adaptive method for estimating the post-reverberation self-power spectrum. The Decision-Directed (DD) recursive smoothing technique is then applied to compute the a priori signal-to-noise ratio, from which the sub-band gain function for post-reverberation suppression is computed and used to modify the sub-band spectrum of the observed signal, thereby suppressing the reverberation component.
  • To address the problem that speech dereverberation techniques in the related art have high product cost, difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot effectively guarantee dereverberation of the speech signal, the present disclosure provides an adaptive estimation method and device for the post-reverberation power spectrum in a reverberant speech signal.
  • A method for suppressing the post-reverberation component based on a single microphone is given first, and is then extended to the microphone array scenario.
  • Let h(n) be the room impulse response from a sound source to the microphone, s(n) the sound source signal, and x(n) the reverberant speech signal captured by the microphone. Then x(n) can be expressed by the following formula:
  • where R is the length of the room impulse response, D_c is the boundary between pre-reverberation and post-reverberation, s_early(n) is the pre-reverberation signal, which contains the direct sound source signal, and s_late(n) is the post-reverberation signal. s_early(n) and s_late(n) are respectively defined by the following formulas:
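  • The split described above can be sketched numerically. This is a minimal illustration that assumes the usual convolution model, with taps 0 to D_c − 1 of h(n) producing s_early(n) and taps D_c to R − 1 producing s_late(n); the function name `split_reverberant_signal` and the toy decaying impulse response are hypothetical and do not come from the disclosure.

```python
import numpy as np

def split_reverberant_signal(s, h, d_c):
    """Return (x, s_early, s_late) for source s, impulse response h and split index d_c."""
    s = np.asarray(s, dtype=float)
    h = np.asarray(h, dtype=float)
    h_early = h.copy()
    h_early[d_c:] = 0.0          # keep taps 0 .. d_c-1 (direct sound + early reflections)
    h_late = h.copy()
    h_late[:d_c] = 0.0           # keep taps d_c .. R-1 (late reverberation tail)
    n = len(s)
    x = np.convolve(s, h)[:n]                # reverberant observation
    s_early = np.convolve(s, h_early)[:n]    # pre-reverberation part
    s_late = np.convolve(s, h_late)[:n]      # post-reverberation part
    return x, s_early, s_late

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
# Toy impulse response (assumption): exponentially decaying random taps, R = 200.
h = rng.standard_normal(200) * np.exp(-np.arange(200) / 40.0)
x, s_early, s_late = split_reverberant_signal(s, h, d_c=50)
```

  • Because convolution is linear, the two parts always add back up to x(n), whatever value of D_c is chosen.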
  • X(t, k), S(t, k), H(t, k), S_early(t, k) and S_late(t, k) are the sub-band transforms of the digital signals x(n), s(n), h(n), s_early(n) and s_late(n), respectively;
  • N is the signal frame length of the sub-band transform;
  • t is the time index of the signal frame;
  • k is the sub-band index;
  • n is the sample time index of the digital signal.
  • the sub-band self-power spectrum corresponding to the sub-band spectral signal X (t, k) can be expressed as:
  • where the four power spectra, the first being P_X(t, k) and the last P_S(t, k), are the sub-band self-power spectra corresponding to the sub-band signals X(t, k), S_early(t, k), S_late(t, k) and S(t, k), respectively;
  • E{·} is the statistical expectation operator.
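  • A sketch of the relation these definitions imply, under the assumption that the pre- and post-reverberation sub-band signals are uncorrelated. The subscripts on P for the early and late parts are chosen here for illustration and may differ from the notation of the original formulas.

```latex
\begin{aligned}
P_X(t,k) &= E\{|X(t,k)|^2\} = E\{|S_{early}(t,k) + S_{late}(t,k)|^2\} \\
         &= P_{S_{early}}(t,k) + P_{S_{late}}(t,k)
            \quad \text{if } E\{S_{early}(t,k)\,S_{late}^{*}(t,k)\} = 0 .
\end{aligned}
```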
  • formula 5 can be expressed as:
  • Equation 6 shows that, in the sub-band power spectrum domain, the DLP technique can be used to predict the sub-band self-power spectrum of the post-reverberation signal. The prediction residual is the sub-band self-power spectrum of the useful pre-reverberation signal, which is uncorrelated with the post-reverberation signal, and therefore must be non-negative.
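  • The idea of a delayed prediction in the power domain can be sketched as follows. This is an illustrative reading, not Equation 6 itself: the prediction is taken as a weighted sum of Q past power values that starts `delay` frames back, and the residual is clipped at zero to respect the non-negativity noted above. The function name, the clipping and the example values are assumptions.

```python
import numpy as np

def predict_late_power(p_hist, w, delay):
    """Predict late-reverberation power for the newest frame of one sub-band.

    p_hist : 1-D array of power estimates, oldest first, newest (frame t) last.
    w      : Q prediction coefficients.
    delay  : number of frames skipped before the prediction window starts.
    """
    p_hist = np.asarray(p_hist, dtype=float)
    w = np.asarray(w, dtype=float)
    t = len(p_hist) - 1
    idx = t - delay - np.arange(len(w))      # frames t-delay, t-delay-1, ...
    if idx[-1] < 0:
        raise ValueError("not enough history for this delay and order")
    regressor = p_hist[idx]
    return float(w @ regressor), regressor

p_hist = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
late_power, regressor = predict_late_power(p_hist, w=[0.5, 0.25], delay=2)
# Residual interpreted as early (useful) power; clipped so it stays non-negative.
early_power = max(p_hist[-1] - late_power, 0.0)
```

  • With the values above the window covers frames 3 and 2, so the prediction is 0.5·4 + 0.25·3 = 2.75 and the residual is 3.25.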
  • The cost function and the penalty function are, respectively:
  • where E_k(t) is expressed as:
  • To solve for the optimal DLP prediction coefficient vector, the NLMS adaptive algorithm can be expressed by Equation 15:
  • where E_k(t) is the prediction error defined by Equation 9.
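  • For orientation, a plain textbook NLMS update is sketched below. It is not Equation 15: the penalty term of the constrained algorithm is omitted, and the step size `mu`, the regularizer `eps` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nlms_step(w, regressor, target, mu=0.5, eps=1e-8):
    """One normalized LMS update; returns the new coefficients and the prediction error."""
    err = target - w @ regressor
    w_new = w + mu * err * regressor / (regressor @ regressor + eps)
    return w_new, err

# Synthetic check: recover known coefficients from noise-free, non-negative data.
rng = np.random.default_rng(1)
w_true = np.array([0.4, 0.2, 0.1, 0.05])
w = np.zeros_like(w_true)
for _ in range(2000):
    reg = rng.random(4) + 0.1            # positive values, like power spectra
    w, err = nlms_step(w, reg, w_true @ reg)
```

  • Normalizing by the regressor energy keeps the effective step size independent of the signal level, which matters here because sub-band power varies over a wide range.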
  • the estimated subband self-power spectrum of the post-reverberation signal is:
  • We use Equation 18 and Equation 19 to define the sub-band a priori signal-to-noise ratio and the sub-band a posteriori signal-to-noise ratio as follows:
  • where the smoothing coefficient is preset.
  • Equation 20 can equivalently be written as:
  • A sub-band domain method for suppressing the post-reverberation component of the reverberant speech signal based on a single microphone is proposed first. Specifically, a constrained NLMS adaptive algorithm is proposed to learn and update the DLP filter coefficient vector, and the sub-band self-power spectrum estimate of the post-reverberation signal is obtained from it.
  • The DD technique is then used to calculate the corresponding a priori signal-to-noise ratio estimate, from which the sub-band gain function for post-reverberation suppression is obtained. This sub-band gain function is used to modify the sub-band spectrum of the microphone observation signal to obtain the sub-band spectrum of the target signal.
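  • The gain stage summarized above can be sketched with the classic decision-directed recursion. The exact forms of Equations 18 to 20 are not reproduced here, so this example assumes the standard formulation, a Wiener-type gain ξ/(1 + ξ) and a gain floor `g_min`; the function name and default values are hypothetical.

```python
import numpy as np

def dd_gain(p_obs, p_late, prev_clean_power, beta=0.98, g_min=0.1, eps=1e-12):
    """Decision-directed a priori ratio and suppression gain for one frame.

    p_obs            : power of the observed sub-band signal
    p_late           : estimated post-reverberation power (treated as the interference)
    prev_clean_power : power of the previous frame's enhanced output
    """
    gamma = p_obs / (p_late + eps)                        # a posteriori ratio
    xi = (beta * prev_clean_power / (p_late + eps)
          + (1.0 - beta) * np.maximum(gamma - 1.0, 0.0))  # a priori ratio (DD smoothing)
    gain = np.maximum(xi / (1.0 + xi), g_min)             # Wiener-type gain with a floor
    return gain, xi, gamma

gain, xi, gamma = dd_gain(p_obs=4.0, p_late=1.0, prev_clean_power=3.0, beta=0.5)
# The enhanced sub-band spectrum would then be gain * X(t, k).
```

  • With these inputs the a posteriori ratio is about 4, the a priori ratio about 3, and the gain about 0.75.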
  • the sub-band signals of the M channels defined in Formula 25 are subjected to the following spatial averaging process to obtain the sub-band signal Y (t, k) of the spatially-filtered mono output, that is:
  • Formula 25 and Formula 26 are in fact a sub-band domain implementation of the "delay-and-sum" beamformer in the related art. It has been shown that this spatial processor suffers from signal distortion caused by spatial correlation. For this reason, we apply the following spatial processing to the sub-band signals of the M channels defined in Formula 25 to obtain the sub-band signal Z(t, k) of the spatially filtered mono output:
  • Its directivity pattern is equivalent to that of the "delay-and-sum" beamformer in the related art.
  • Because Formula 27 uses the spatial average of the power spectrum of the signals received by the microphones, rather than the spatial average of the (complex) spectrum used in Formula 26, it avoids the signal distortion caused by spatial correlation in the "delay-and-sum" beamformer.
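  • The difference between the two kinds of spatial averaging can be illustrated with a small example. It does not reproduce Formula 26 or Formula 27; it only compares the power of a complex average with the average of per-microphone powers, for hypothetical channel spectra assumed to be already time-aligned.

```python
import numpy as np

def coherent_average_power(x):
    """Power of the complex spatial average: |1/M * sum_m X_m|^2."""
    return np.abs(np.mean(x, axis=0)) ** 2

def power_average(x):
    """Spatial average of the per-microphone powers: 1/M * sum_m |X_m|^2."""
    return np.mean(np.abs(x) ** 2, axis=0)

# Two microphones, one sub-band: equal magnitude, opposite phase (worst case).
x = np.array([[1.0 + 0.0j],
              [-1.0 + 0.0j]])
p_coherent = coherent_average_power(x)
p_power = power_average(x)
```

  • In this worst case the complex average cancels completely while the power average keeps the full signal power; in general the first quantity can never exceed the second.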
  • the post-reverberation sub-band self-power spectrum estimate in the sub-band signal Z (t, k) is:
  • where the coefficient vector of the DLP adaptive filter on sub-band k is updated adaptively by the following constrained NLMS algorithm:
  • The sub-band gain function calculator module for post-reverberation suppression gives G(t, k) as follows:
  • where the preset smoothing coefficient lies between 0 and 1,
  • and the a posteriori signal-to-noise ratio is estimated as:
  • the estimated target subband signal with Z (t, k) modified by G (t, k) is as follows:
  • The above scheme is applied to the post-processing of a microphone array, giving a sub-band domain method for suppressing the post-reverberation component in the reverberant speech signal based on a microphone array.
  • This method first defines a new beamformer in the sub-band domain as a spatial pre-processor for the sub-band spectra of the observation signals acquired by the microphone array, which reduces the deviation of the sub-band spectrum. The sub-band spectral signal output by the spatial pre-processor is then post-processed with the method proposed for the single-microphone case to obtain the final target speech signal, which completes the dereverberation task.
  • The directivity pattern of this new beamformer, implemented in the sub-band domain, is equivalent to that of the "delay-and-sum" beamformer in the related art, and it reduces the deviation of the sub-band spectral signal; at the same time it overcomes the signal distortion that the "delay-and-sum" beamformer in the related art suffers because of the spatial correlation between different microphone channels, which ensures that the method …
  • The algorithm flowchart of the method for suppressing the post-reverberation component in the reverberant speech signal based on the microphone array is shown in FIG. 4; its specific implementation process is:
  • An embodiment of the present disclosure provides an adaptive estimation method of the post-reverberation power spectrum in a reverberant speech signal, including:
  • Step 51: obtain an estimate of the sub-band self-power spectrum of the reverberant speech signal picked up by the microphone;
  • Step 52: obtain the delayed linear prediction (DLP) coefficient vector used to estimate the post-reverberation sub-band self-power spectrum in the reverberant speech signal;
  • Step 53: obtain the post-reverberation sub-band self-power spectrum estimate according to the sub-band self-power spectrum estimate of the reverberant speech signal and the DLP prediction coefficient vector.
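  • Steps 51 to 53 can be sketched end to end for a single sub-band. This is a simplified illustration under several assumptions: first-order recursive smoothing for Step 51, a plain (unconstrained) NLMS update for Step 52, and a clipped prediction for Step 53. The function name and all parameter values are hypothetical.

```python
import numpy as np

def estimate_late_power(frames, q=4, delay=2, alpha=0.8, mu=0.5, eps=1e-8):
    """Return (late, p): late-reverberation power estimates and smoothed power per frame."""
    n_frames = len(frames)
    p = np.zeros(n_frames)       # smoothed sub-band power of the observation
    late = np.zeros(n_frames)    # estimated post-reverberation power
    w = np.zeros(q)              # DLP coefficients for this sub-band
    prev = 0.0
    for t in range(n_frames):
        # Step 51: recursive estimate of the observed sub-band power.
        prev = alpha * prev + (1.0 - alpha) * abs(frames[t]) ** 2
        p[t] = prev
        if t - delay - q + 1 < 0:
            continue             # not enough history yet
        reg = p[t - delay - np.arange(q)]
        # Step 53: predict the late power with the current coefficients,
        # clipped to [0, p[t]] so the residual stays non-negative (assumption).
        pred = float(w @ reg)
        late[t] = min(max(pred, 0.0), p[t])
        # Step 52: NLMS update of the coefficients for the next frame.
        err = p[t] - pred
        w = w + mu * err * reg / (reg @ reg + eps)
    return late, p

rng = np.random.default_rng(3)
frames = rng.standard_normal(200) + 1j * rng.standard_normal(200)
late, p = estimate_late_power(frames)
```

  • Running one such recursion per sub-band requires only a few multiply-adds per frame and no matrix inversion.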
  • the microphone is a single microphone
  • step 51 is:
  • step 52 is:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the reverberant speech signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • step 53 is:
  • the microphone is a microphone array
  • step 51 is:
  • an estimate of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process is obtained.
  • the acquiring the subband spectrum of the mono output signal of the reverberation voice signal picked up by the microphone array after spatial filtering includes:
  • Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in sub-band k of frame t;
  • X_m(t, k) is the sub-band spectrum of the output signal of the m-th microphone in sub-band k of frame t;
  • M is the total number of microphones in the array;
  • m = 1, 2, ..., M;
  • t is the time index of the signal frame;
  • k is the sub-band index.
  • the obtaining the estimation of the sub-band self-power spectrum of the mono output signal of the reverberation speech signal after the spatial filtering process according to the sub-band spectrum of the mono output signal includes:
  • where the first quantity is the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in sub-band k of frame t; the second is the corresponding estimate in sub-band k of frame t − 1; Z(t, k) is the sub-band spectrum of the spatially filtered mono output signal in sub-band k of frame t; t is the time index of the signal frame, and k is the sub-band index.
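  • The definitions above relate the estimate at frame t to the estimate at frame t − 1 and to Z(t, k). A common form for such a recursion is first-order exponential smoothing, sketched here; the smoothing constant `alpha` and the exact form are assumptions, since the original formula is not reproduced.

```python
import numpy as np

def smooth_power(prev_p, z, alpha=0.8):
    """One recursive update: alpha * previous estimate + (1 - alpha) * |Z(t, k)|^2."""
    return alpha * prev_p + (1.0 - alpha) * np.abs(z) ** 2

# One sub-band, one frame: previous estimate 1.0, new spectrum value 3 + 4j (|Z|^2 = 25).
p_new = smooth_power(1.0, 3.0 + 4.0j, alpha=0.8)
```

  • Here the update gives 0.8·1 + 0.2·25 = 5.8; a larger `alpha` tracks changes more slowly but gives a smoother estimate.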
  • step 52 is:
  • In these formulas, the symbols denote, in order: the DLP prediction coefficient vector in sub-band k of frame t + 1; the DLP prediction coefficient vector in sub-band k of frame t; and the sub-band self-power spectrum vector of the reverberant speech signal in sub-band k at frame t − D_s. Q is the number of DLP coefficients, with Q … R_s − D_s; R is the length of the room impulse response; N is the frame length of the sub-band transform; D_c is the boundary between pre-reverberation and post-reverberation; the two adaptation parameters are positive constants, with 0 … (1 + …) … 2; E_k(t) is the prediction error, defined with the estimate of the sub-band self-power spectrum of the spatially filtered mono output signal in sub-band k of frame t; t is the time index of the signal frame, k is the sub-band index, and T is the vector transpose operator.
  • step 53 is:
  • the adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal lowers the difficulty of dereverberation and improves its efficiency; compared with the methods in the related art, it is more robust and has lower algorithmic complexity, which makes real-time online implementation practical.
  • an embodiment of the present disclosure also provides a device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, including:
  • the first obtaining module 61 is configured to obtain an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone;
  • the second obtaining module 62 is configured to obtain the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal;
  • the third obtaining module 63 is configured to obtain the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
  • the first obtaining module 61 is used to:
  • the second obtaining module 62 is used to:
  • the quantities in the formula are: the DLP prediction coefficient vector in subband k at frame t + 1; the DLP prediction coefficient vector in subband k at frame t; the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t − D_s; Q, the number of DLP coefficients, with Q = R_s − D_s; R, the length of the room impulse response; N, the frame length of the subband transform; D_c, the boundary between early and late reverberation; μ and β, positive constants with 0 < μ(1 + β) < 2; E_k(t), the prediction error; the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t, the time index of the signal frame; k, the subband index; and T, the vector transpose operator
  • the third obtaining module 63 is used to:
  • the first obtaining module 61 includes:
  • a first obtaining unit configured to obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array;
  • a second obtaining unit configured to obtain, from the subband spectrum of the mono output signal, an estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
  • the first obtaining unit is configured to:
  • Z(t, k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame
  • X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of the t-th frame
  • M is the total number of microphones in the array
  • m = 1, 2, ..., M
  • t is the time index of the signal frame
  • k is the subband index.
  • the second obtaining unit is configured to:
  • the quantities in the formula are: the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; the same estimate for frame t − 1; the preset smoothing constant λ, with 0 < λ < 1; Z(t, k), the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t, the time index of the signal frame; and k, the subband index.
  • the second obtaining module 62 is used to:
  • the quantities in the formula are: the DLP prediction coefficient vector in subband k at frame t + 1; the DLP prediction coefficient vector in subband k at frame t; the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t − D_s; Q, the number of DLP coefficients, with Q = R_s − D_s; R, the length of the room impulse response; N, the frame length of the subband transform; D_c, the boundary between early and late reverberation; μ and β, positive constants with 0 < μ(1 + β) < 2; E_k(t), the prediction error; the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t, the time index of the signal frame; k, the subband index; and T, the vector transpose operator
  • the third obtaining module 63 is used to:
  • the embodiment of the device is one-to-one corresponding to the above method embodiment. All the implementation methods in the above method embodiment are applicable to the embodiment of the device, and the same technical effect can also be achieved.
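  • an embodiment of the present disclosure also provides an apparatus for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, including a memory 71, a processor 72, and a computer program stored on the memory 71 and executable on the processor 72; the memory 71 is connected to the processor 72 through a bus interface 73; wherein the processor 72 implements the following steps when executing the computer program:
  • an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone is obtained;
  • the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal is obtained;
  • from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector, the late-reverberation subband auto-power spectrum estimate is obtained.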
  • the processor 72 implements the following steps when executing the computer program:
  • processor 72 implements the following steps when executing the computer program:
  • the quantities in the formula are: the DLP prediction coefficient vector in subband k at frame t + 1; the DLP prediction coefficient vector in subband k at frame t; the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t − D_s; Q, the number of DLP coefficients, with Q = R_s − D_s; R, the length of the room impulse response; N, the frame length of the subband transform; D_c, the boundary between early and late reverberation; μ and β, positive constants with 0 < μ(1 + β) < 2; E_k(t), the prediction error; the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t, the time index of the signal frame; k, the subband index; and T, the vector transpose operator
  • processor 72 implements the following steps when executing the computer program:
  • the processor 72 implements the following steps when executing the computer program:
  • an estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal is obtained.
  • processor 72 implements the following steps when executing the computer program:
  • Z(t, k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame
  • X_r(t, k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of the t-th frame
  • M is the total number of microphones in the array
  • m = 1, 2, ..., M
  • t is the time index of the signal frame
  • k is the subband index.
  • processor 72 implements the following steps when executing the computer program:
  • processor 72 implements the following steps when executing the computer program:
  • the quantities in the formula are: the DLP prediction coefficient vector in subband k at frame t + 1; the DLP prediction coefficient vector in subband k at frame t; the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t − D_s; Q, the number of DLP coefficients, with Q = R_s − D_s; R, the length of the room impulse response; N, the frame length of the subband transform; D_c, the boundary between early and late reverberation; μ and β, positive constants with 0 < μ(1 + β) < 2; E_k(t), the prediction error; the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t, the time index of the signal frame; k, the subband index; and T, the vector transpose operator
  • processor 72 implements the following steps when executing the computer program:
  • An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above method for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the related art, or a part of the technical solution, can be embodied as a software product; the software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the foregoing storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
  • each component or each step can be decomposed and / or recombined.
  • decompositions and / or recombinations should be regarded as equivalent solutions of the present disclosure.
  • steps for performing the above-mentioned series of processing may naturally be executed in chronological order in the order described, but it does not necessarily need to be executed in chronological order, and some steps may be executed in parallel or independently of each other.
  • the object of the present disclosure can also be achieved by running a program or a group of programs on any computing device.
  • the computing device may be a well-known general-purpose device. Therefore, the object of the present disclosure can also be achieved only by providing a program product containing program code for implementing the method or device. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure.
  • the storage medium may be any known storage medium or any storage medium developed in the future. It should also be noted that, in the device and method of the present disclosure, obviously, each component or each step can be decomposed and / or recombined.
  • the embodiments described in the embodiments of the present disclosure may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
  • the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this disclosure, or a combination thereof.
  • the technology described in the embodiments of the present disclosure may be implemented through modules (eg, procedures, functions, etc.) that perform the functions described in the embodiments of the present disclosure.
  • the software codes can be stored in the memory and executed by the processor.
  • the memory may be implemented in the processor or external to the processor.


Abstract

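A method and device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal. The method includes: obtaining an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone (51); obtaining the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal (52); and obtaining the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector (53).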

Description

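Method and device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201811216983.7, filed in China on October 18, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of speech signal processing, and in particular to a method and device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal.
Background
In far-field conditions, a speech signal picked up by a microphone in a room is unavoidably disturbed by reflections from the walls, the ceiling, and other obstacles, and is therefore linearly distorted. This distortion is usually called reverberation. It degrades the fidelity and intelligibility of speech and lowers the performance of speech communication systems and automatic speech recognition systems, and the degradation grows with the distance between the source and the microphone. Reverberation usually consists of early reverberation (which contains the direct-path component) and late reverberation. It has been shown that the former actually helps speech intelligibility and the signal-to-noise ratio (SNR) in noisy environments, whereas the latter lengthens the phonemes of the source speech, which then overlap and mask the following phonemes and thereby reduce intelligibility.
Speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources.
Summary
Embodiments of the present disclosure provide a method and device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, to solve the problem that speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot guarantee effective dereverberation.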
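To solve the above technical problem, an embodiment of the present disclosure provides a method for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, including:
obtaining an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone;
obtaining the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal;
obtaining the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
Optionally, when the microphone is a single microphone, obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone includes:
According to the formula:
Figure PCTCN2019109285-appb-000001
obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000002
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; λ is a preset smoothing constant with 0 < λ < 1;
Figure PCTCN2019109285-appb-000003
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t−1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.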
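Further, obtaining the DLP prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal includes:
According to the formula:
Figure PCTCN2019109285-appb-000004
obtaining the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000005
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000006
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000007
Figure PCTCN2019109285-appb-000008
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000009
Figure PCTCN2019109285-appb-000010
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000011
Figure PCTCN2019109285-appb-000012
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000013
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.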
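Further, obtaining the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector includes:
According to the formula:
Figure PCTCN2019109285-appb-000014
Figure PCTCN2019109285-appb-000015
obtaining the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000016
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000017
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000018
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000019
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000020
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000021
Figure PCTCN2019109285-appb-000022
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.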
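Optionally, when the microphone is a microphone array, obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone includes:
obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array;
obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array includes:
According to the formula:
Figure PCTCN2019109285-appb-000023
obtaining the subband spectrum of the spatially filtered mono output signal of the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of the t-th frame; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000024
m = 1, 2, …, M; t is the time index of the signal frame, and k is the subband index.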
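Further, obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal includes:
According to the formula:
Figure PCTCN2019109285-appb-000025
obtaining the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000026
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame;
Figure PCTCN2019109285-appb-000027
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t−1; λ is a preset smoothing constant with 0 < λ < 1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.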
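Further, obtaining the DLP prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal includes:
According to the formula:
Figure PCTCN2019109285-appb-000028
obtaining the DLP prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000029
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000030
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000031
Figure PCTCN2019109285-appb-000032
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000033
Figure PCTCN2019109285-appb-000034
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000035
Figure PCTCN2019109285-appb-000036
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000037
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.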
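Further, obtaining the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum and the DLP prediction coefficients includes:
According to the formula:
Figure PCTCN2019109285-appb-000038
Figure PCTCN2019109285-appb-000039
obtaining the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000040
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000041
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000042
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000043
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000044
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000045
Figure PCTCN2019109285-appb-000046
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.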
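An embodiment of the present disclosure also provides a device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, including a memory, a processor, and a computer program stored on the memory and executable on the processor; wherein the processor implements the following steps when executing the computer program:
obtaining an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone;
obtaining the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal;
obtaining the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
Optionally, when the microphone is a single microphone, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000047
obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000048
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; λ is a preset smoothing constant with 0 < λ < 1;
Figure PCTCN2019109285-appb-000049
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t−1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.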
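Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000050
obtaining the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000051
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000052
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000053
Figure PCTCN2019109285-appb-000054
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000055
Figure PCTCN2019109285-appb-000056
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000057
Figure PCTCN2019109285-appb-000058
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000059
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.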
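Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000060
Figure PCTCN2019109285-appb-000061
obtaining the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000062
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000063
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000064
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000065
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000066
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000067
Figure PCTCN2019109285-appb-000068
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.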
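Optionally, when the microphone is a microphone array, the processor implements the following steps when executing the computer program:
obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array;
obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000069
obtaining the subband spectrum of the spatially filtered mono output signal of the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of the t-th frame; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000070
m = 1, 2, …, M; t is the time index of the signal frame, and k is the subband index.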
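Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000071
obtaining the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000072
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame;
Figure PCTCN2019109285-appb-000073
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t−1; λ is a preset smoothing constant with 0 < λ < 1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.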
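Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000074
obtaining the DLP prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000075
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000076
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000077
Figure PCTCN2019109285-appb-000078
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000079
Figure PCTCN2019109285-appb-000080
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000081
Figure PCTCN2019109285-appb-000082
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000083
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.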
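Further, the processor implements the following steps when executing the computer program:
According to the formula:
Figure PCTCN2019109285-appb-000084
Figure PCTCN2019109285-appb-000085
obtaining the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000086
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000087
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000088
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000089
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000090
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000091
Figure PCTCN2019109285-appb-000092
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.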
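An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the above method for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal.
An embodiment of the present disclosure also provides a device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal, including:
a first obtaining module, configured to obtain an estimate of the subband auto-power spectrum of the reverberant speech signal picked up by a microphone;
a second obtaining module, configured to obtain the delayed linear prediction (DLP) coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the reverberant speech signal;
a third obtaining module, configured to obtain the late-reverberation subband auto-power spectrum estimate from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.
Optionally, when the microphone is a single microphone, the first obtaining module is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000093
obtain the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000094
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; λ is a preset smoothing constant with 0 < λ < 1;
Figure PCTCN2019109285-appb-000095
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t−1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.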
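Further, the second obtaining module is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000096
obtain the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000097
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000098
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000099
Figure PCTCN2019109285-appb-000100
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000101
Figure PCTCN2019109285-appb-000102
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000103
Figure PCTCN2019109285-appb-000104
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000105
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.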
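Further, the third obtaining module is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000106
Figure PCTCN2019109285-appb-000107
obtain the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000108
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000109
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000110
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000111
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000112
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000113
Figure PCTCN2019109285-appb-000114
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.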
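Optionally, when the microphone is a microphone array, the first obtaining module includes:
a first obtaining unit, configured to obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal picked up by the microphone array;
a second obtaining unit, configured to obtain, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, the first obtaining unit is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000115
obtain the subband spectrum of the spatially filtered mono output signal of the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of the t-th frame; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000116
m = 1, 2, …, M; t is the time index of the signal frame, and k is the subband index.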
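Further, the second obtaining unit is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000117
obtain the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000118
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame;
Figure PCTCN2019109285-appb-000119
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t−1; λ is a preset smoothing constant with 0 < λ < 1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, and k is the subband index.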
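Further, the second obtaining module is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000120
obtain the DLP prediction coefficient vector used for estimating the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000121
is the DLP prediction coefficient vector in subband k at frame t+1;
Figure PCTCN2019109285-appb-000122
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000123
Figure PCTCN2019109285-appb-000124
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000125
Figure PCTCN2019109285-appb-000126
Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000127
Figure PCTCN2019109285-appb-000128
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation; μ and β are positive constants with 0 < μ(1+β) < 2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000129
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of the t-th frame; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.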
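Further, the third obtaining module is configured to:
According to the formula:
Figure PCTCN2019109285-appb-000130
Figure PCTCN2019109285-appb-000131
obtain the late-reverberation subband auto-power spectrum estimate;
where
Figure PCTCN2019109285-appb-000132
is the late-reverberation subband auto-power spectrum estimate;
Figure PCTCN2019109285-appb-000133
is the DLP prediction coefficient vector in subband k at frame t, and
Figure PCTCN2019109285-appb-000134
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q is the number of DLP coefficients, with Q = R_s − D_s,
Figure PCTCN2019109285-appb-000135
R is the length of the room impulse response, N is the frame length of the subband transform, and D_c is the boundary between early and late reverberation;
Figure PCTCN2019109285-appb-000136
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband at frame t−D_s,
Figure PCTCN2019109285-appb-000137
Figure PCTCN2019109285-appb-000138
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband at frame t−τ−D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.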
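The beneficial effects of the present disclosure are:
In the above scheme, the late-reverberation subband auto-power spectrum estimate is obtained using the delayed linear prediction (DLP) coefficient vector, which ensures effective dereverberation of the speech signal, lowers the difficulty of dereverberation, and improves its efficiency.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a block diagram of applying DLP to adaptively estimate the subband auto-power spectrum of the late-reverberation signal;
Fig. 2 is the algorithm flowchart of the single-microphone method for suppressing the late-reverberation component in a reverberant speech signal;
Fig. 3 is a block diagram of the microphone-array method for suppressing the late-reverberation component in a reverberant speech signal;
Fig. 4 is the algorithm flowchart of the microphone-array method for suppressing the late-reverberation component in a reverberant speech signal;
Fig. 5 is a schematic flowchart of the method for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal according to an embodiment of the present disclosure;
Fig. 6 is a schematic module diagram of the device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of the device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in detail below with reference to the drawings and specific embodiments.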
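In the related art, speech dereverberation techniques fall roughly into three classes. The first class uses microphone-array processing: it first estimates the direction of arrival (DOA) of the source relative to the array, then steers the directivity of the array to enhance the direct-path component from the source direction and to reduce or eliminate reflected components arriving from other directions, thereby achieving dereverberation. To obtain a satisfactory result, this technique usually needs a large number of microphones so that the array has sufficient directivity gain. The second class suppresses the late reverberation in the frequency domain: it first estimates the reverberation time (RT60) of the operating environment, uses it to estimate the power spectrum of the late reverberation, and then applies spectral subtraction, as used in noise suppression, to suppress the late reverberation. Although this technique does not involve the signal phase and is therefore fairly robust, no high-accuracy real-time estimator of the frequency-dependent reverberation time (RT60) is currently available, so its dereverberation performance is limited. The third class is based on inverse filtering: the goal is to estimate the inverse filter of the room impulse response (RIR) that causes the reverberation and to filter the reverberant speech with it to recover the source signal. When the room transfer function (RTF) from the source to the microphone is known, the inverse filter of the RTF can recover the source signal exactly from the observed reverberant signal. It has been shown that such an inverse filter exists when the number of microphones exceeds the number of active sources and the RTFs from each source to each microphone share no common zeros. In practice, however, the RTF (or its equivalent inverse filter) is time-varying and unknown and must be estimated from the available observations. Many researchers have worked on this problem and proposed numerous methods. The most notable is late-reverberation suppression based on delayed linear prediction (DLP), which effectively suppresses the late reverberation without noticeably damaging the short-time correlation of speech. It requires, however, a very high DLP filter order (typically thousands of coefficients) and hence long observation data, which leads to a heavy computational load and makes real-time implementation on commercial digital signal processor (DSP) chips difficult. Methods that combine a time-varying speech source model with multichannel linear prediction have also been proposed; they can suppress the late reverberation effectively from shorter observations and also attenuate the early reverberation, but their inherent computational complexity prevents practical use. More recently, DLP-based dereverberation has been extended to time-varying speech signals as variance-normalized delayed linear prediction (NDLP), whose frequency-domain implementation is the well-known weighted prediction error (WPE) dereverberation algorithm. Although WPE is fairly robust, it involves the pseudo-inverse of a high-order correlation matrix of the observed data and therefore usually consumes considerable computing resources on a commercial DSP.
The performance of the first class, based on microphone-array processing, is limited by the number of microphones in the array; a satisfactory result requires many microphones, which raises product cost and complicates structural design. The second class, which suppresses the late reverberation in the frequency domain, must first estimate the reverberation time (RT60) of the operating environment; because no high-accuracy real-time estimator of the frequency-dependent RT60 is currently available, its dereverberation performance is limited. In the third class, based on inverse filtering, the practically usable WPE method involves the pseudo-inverse of a high-order correlation matrix of the observed data and therefore usually consumes considerable computing resources on a commercial DSP.
The present disclosure extends the idea of DLP to the subband power-spectrum domain and proposes a low-complexity, real-time, online adaptive method for estimating the late-reverberation auto-power spectrum. From this estimate and the subband spectrum of the observed signal, the decision-directed (DD) recursive smoothing technique is applied to compute the a priori SNR, from which a subband gain function for suppressing the late-reverberation component is computed and used to modify the subband spectrum of the observed signal, thereby suppressing the late-reverberation component.
To address the problem that speech dereverberation techniques in the related art suffer from high product cost and difficult structural design, limited dereverberation performance, or heavy consumption of computing resources, and therefore cannot guarantee effective dereverberation, the present disclosure provides a method and device for adaptively estimating the late-reverberation power spectrum in a reverberant speech signal.
The principle behind the embodiments of the present disclosure is explained below.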
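In the embodiments of the present disclosure, the single-channel (single-microphone) scenario is considered first and a single-microphone method for suppressing the late-reverberation component is given; the method is then generalized to microphone arrays.
I. Single-microphone method for suppressing the late-reverberation component in a reverberant speech signal
Let h(n) be the room impulse response from the source to the microphone, s(n) the source signal, and x(n) the reverberant speech signal captured by the microphone. Then x(n) can be expressed by Formula 1:
Formula 1:
Figure PCTCN2019109285-appb-000139
where R is the length of the room impulse response, D_c is the boundary between early and late reverberation, s_early(n) is the early-reverberation signal, which contains the direct-path source signal, and s_late(n) is the late-reverberation signal; s_early(n) and s_late(n) are defined as follows:
Formula 2:
Figure PCTCN2019109285-appb-000140
Formula 3:
Figure PCTCN2019109285-appb-000141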
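The equation images for Formulas 1 to 3 did not survive extraction. Given the surrounding definitions (an impulse response h(n) of length R and a boundary D_c between early and late reverberation), they presumably take the standard convolution form sketched below. The summation limits are an assumption consistent with those definitions, not a transcription of the images.

```latex
\begin{aligned}
x(n) &= \sum_{l=0}^{R-1} h(l)\, s(n-l) \;=\; s_{\mathrm{early}}(n) + s_{\mathrm{late}}(n), \\
s_{\mathrm{early}}(n) &= \sum_{l=0}^{D_c-1} h(l)\, s(n-l), \\
s_{\mathrm{late}}(n) &= \sum_{l=D_c}^{R-1} h(l)\, s(n-l).
\end{aligned}
```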
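Applying an analysis filter bank (AFB) to both sides of Formula 1 to perform a subband transform (the short-time Fourier transform can be regarded as a special case of a subband transform) gives:
Formula 4:
Figure PCTCN2019109285-appb-000142
where X(t,k), S(t,k), H(t,k), S_early(t,k), and S_late(t,k) are the subband transforms of the digital signals x(n), s(n), h(n), s_early(n), and s_late(n), respectively,
Figure PCTCN2019109285-appb-000143
N is the frame length of the subband transform, t is the time index of the signal frame, k is the subband index, and n is the sample time index of the digital signal.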
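Assuming that the correlation between the subband signals of adjacent frames is low, the subband auto-power spectrum corresponding to the subband spectrum X(t,k) can be expressed as:
Formula 5:
Figure PCTCN2019109285-appb-000144
Figure PCTCN2019109285-appb-000145
where P_X(t,k),
Figure PCTCN2019109285-appb-000146
and P_S(t,k) are the subband auto-power spectra of the subband signals X(t,k), S_early(t,k), S_late(t,k), and S(t,k), respectively, and E{·} is the statistical expectation operator.
Using the delayed linear prediction (DLP) formulation, Formula 5 can be written as:
Formula 6:
Figure PCTCN2019109285-appb-000147
where W_τ(t,k) is the τ-th non-negative DLP coefficient in the k-th subband of the t-th frame, τ = 0, 1, 2, …, Q−1; Q = R_s − D_s is the number of DLP coefficients,
Figure PCTCN2019109285-appb-000148
Figure PCTCN2019109285-appb-000149
is the estimate of the late-reverberation subband auto-power spectrum.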
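The images for Formulas 5 and 6 are not recoverable from the extracted text. A reading consistent with the definitions given elsewhere in the description (coefficients W_τ(t,k) for τ = 0, …, Q−1 applied to power-spectrum values at frames t−τ−D_s) is sketched below. The exact notation, in particular the subscripts on the early and late power spectra, is an assumption.

```latex
\begin{aligned}
P_X(t,k) &= P_{S_{\mathrm{early}}}(t,k) + P_{S_{\mathrm{late}}}(t,k), \\
P_X(t,k) &= P_{S_{\mathrm{early}}}(t,k) + \sum_{\tau=0}^{Q-1} W_\tau(t,k)\, P_X(t-\tau-D_s,\,k), \\
\hat{P}_{S_{\mathrm{late}}}(t,k) &= \sum_{\tau=0}^{Q-1} W_\tau(t,k)\, P_X(t-\tau-D_s,\,k).
\end{aligned}
```

Under this reading, the late-reverberation power in the current frame is predicted from observed power at least D_s frames in the past, and whatever the predictor cannot explain is attributed to the early reverberation.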
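Formula 6 shows that, in the subband power-spectrum domain, DLP can predict the subband auto-power spectrum of the late-reverberation signal. The prediction residual is the subband auto-power spectrum of the useful early-reverberation signal, which is uncorrelated with the late reverberation, and it must therefore be non-negative. To build this constraint into the solution for the DLP prediction coefficients, define the cost function
Figure PCTCN2019109285-appb-000150
and the penalty function
Figure PCTCN2019109285-appb-000151
respectively as:
Formula 7:
Figure PCTCN2019109285-appb-000152
Formula 8:
Figure PCTCN2019109285-appb-000153
where E_k(t) is given by Formula 9:
Formula 9:
Figure PCTCN2019109285-appb-000154
Figure PCTCN2019109285-appb-000155
is given by Formula 10:
Formula 10:
Figure PCTCN2019109285-appb-000156
Figure PCTCN2019109285-appb-000157
Figure PCTCN2019109285-appb-000158
is given by Formula 11:
Formula 11:
Figure PCTCN2019109285-appb-000159
Then the optimal prediction coefficient vector
Figure PCTCN2019109285-appb-000160
is the solution that minimizes the following criterion function
Figure PCTCN2019109285-appb-000161
that is:
Formula 12:
Figure PCTCN2019109285-appb-000162
where
Figure PCTCN2019109285-appb-000163
is defined by Formula 13:
Formula 13:
Figure PCTCN2019109285-appb-000164
Here β is a positive constant.
From Formulas 7, 8, and 13:
Formula 14:
Figure PCTCN2019109285-appb-000165
Figure PCTCN2019109285-appb-000166
Thus the NLMS adaptive algorithm for solving for the optimal DLP prediction coefficient vector
Figure PCTCN2019109285-appb-000167
is given by Formula 15:
Formula 15:
Figure PCTCN2019109285-appb-000168
Figure PCTCN2019109285-appb-000169
where μ and β are positive constants with 0 < μ(1+β) < 2, and E_k(t) is the prediction error defined by Formula 9.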
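The update in Formula 15 is an equation image that did not survive extraction. The following is a minimal sketch of one constrained NLMS step that is consistent with the text: a penalty weighted by β discourages a negative prediction error, and the step size obeys 0 < μ(1+β) < 2. The specific penalty form, the function name `constrained_nlms_step`, and the regularizer `eps` are assumptions for illustration, not the patent's formula.

```python
def constrained_nlms_step(weights, delayed_power, current_power,
                          mu=0.5, beta=1.0, eps=1e-12):
    """One assumed constrained-NLMS update for a single subband.

    weights:        Q DLP coefficients W_tau(t, k)
    delayed_power:  Q past power estimates P_X(t - tau - D_s, k)
    current_power:  power estimate P_X(t, k)
    Returns (new_weights, prediction_error).
    """
    if not 0.0 < mu * (1.0 + beta) < 2.0:
        raise ValueError("step size must satisfy 0 < mu * (1 + beta) < 2")
    predicted = sum(w * p for w, p in zip(weights, delayed_power))
    error = current_power - predicted
    # Assumed penalty: a negative residual (over-predicted late reverberation)
    # is weighted by (1 + beta) instead of 1.
    penalty_gain = 1.0 + beta if error < 0.0 else 1.0
    # Normalization by the input energy; eps avoids division by zero.
    norm = sum(p * p for p in delayed_power) + eps
    step = mu * penalty_gain * error / norm
    return [w + step * p for w, p in zip(weights, delayed_power)], error
```

With this assumed form, the effective step size is μ when the residual is non-negative and μ(1+β) when it is negative, which is one way the stated bound 0 < μ(1+β) < 2 could arise.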
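The block diagram of applying DLP to adaptively estimate the subband auto-power spectrum of the late-reverberation signal is shown in Fig. 1. In a practical implementation, the estimate of the subband auto-power spectrum of the observed signal can be computed with the time-recursive smoothing of Formula 16, namely:
Formula 16:
Figure PCTCN2019109285-appb-000170
Here 0 < λ < 1 is a preset smoothing constant. The estimate of the subband auto-power spectrum of the late-reverberation signal is then:
Formula 17:
Figure PCTCN2019109285-appb-000171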
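A minimal sketch of Formulas 16 and 17 follows. It assumes the usual first-order recursion λ·P(t−1,k) + (1−λ)·|X(t,k)|² for the smoothing step and an inner product of the DLP coefficients with the delayed power estimates for the late-reverberation estimate. The function names and the layout of `power_history` are illustrative choices, not taken from the patent.

```python
def smooth_power(prev_power, subband_value, lam=0.9):
    """Recursive power smoothing (assumed form of Formula 16)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must satisfy 0 < lam < 1")
    return lam * prev_power + (1.0 - lam) * abs(subband_value) ** 2


def late_reverb_power(weights, power_history, delay):
    """Late-reverberation power estimate (assumed form of Formula 17).

    power_history[d] holds the smoothed power of frame t - d, so index 0
    is the current frame; frames that are not yet available count as zero.
    delay is D_s, the number of frames skipped before prediction starts.
    """
    total = 0.0
    for tau, w in enumerate(weights):
        idx = tau + delay
        if idx < len(power_history):
            total += w * power_history[idx]
    return total
```

For example, with coefficients `[0.5, 0.25]`, a delay of 2, and a history `[10.0, 8.0, 6.0, 4.0]`, only the two oldest entries contribute to the estimate.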
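Since the adaptive filter yields the DLP coefficient vector, Formula 17 gives the estimate of the subband auto-power spectrum of the late-reverberation signal, and it is natural to suppress the late reverberation by spectral subtraction. To this end, the subband a priori SNR ξ(t,k) and a posteriori SNR η(t,k) are defined by Formulas 18 and 19 as follows:
Formula 18:
Figure PCTCN2019109285-appb-000172
Formula 19:
Figure PCTCN2019109285-appb-000173
The DD technique is then applied to compute the estimate of the a priori SNR
Figure PCTCN2019109285-appb-000174
by the following recursion:
Formula 20:
Figure PCTCN2019109285-appb-000175
where
Figure PCTCN2019109285-appb-000176
is the estimate of the a posteriori SNR η(t,k), and α is a preset smoothing coefficient.
Correspondingly, by Wiener filtering theory, the subband gain function G(t,k) for late-reverberation suppression is given by Formula 21:
Formula 21:
Figure PCTCN2019109285-appb-000177
Modifying the subband spectrum of the observed signal with the suppression gain computed by Formula 21 gives an effective estimate of the subband spectrum of the early-reverberation signal:
Formula 22:
Figure PCTCN2019109285-appb-000178
A synthesis filter bank (SFB) transforms
Figure PCTCN2019109285-appb-000179
from the subband domain back to the time-domain speech signal
Figure PCTCN2019109285-appb-000180
which is output to the subsequent processing system.
Note that the first term in Formula 20 is equivalent to:
Formula 23:
Figure PCTCN2019109285-appb-000181
Substituting Formula 23 into Formula 20 gives:
Formula 24:
Figure PCTCN2019109285-appb-000182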
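The images for Formulas 18 to 24 are not recoverable, so the sketch below uses the textbook decision-directed recursion and the Wiener gain ξ/(1+ξ), which match the verbal description (a recursion in the previous gain and the previous a posteriori SNR, smoothed by α). The function name, the `floor` guard, and the use of the instantaneous observed power in the a posteriori SNR are assumptions.

```python
def decision_directed_gain(observed_power, late_power, prev_gain,
                           prev_post_snr, alpha=0.98, floor=1e-12):
    """Decision-directed a priori SNR and Wiener gain for one subband.

    observed_power: |X(t, k)|^2 (or |Z(t, k)|^2 in the array case)
    late_power:     late-reverberation power estimate for this frame
    prev_gain:      G(t - 1, k)
    prev_post_snr:  a posteriori SNR estimate of frame t - 1
    Returns (gain, post_snr) for frame t.
    """
    # A posteriori SNR: observed power over late-reverberation power.
    post_snr = observed_power / max(late_power, floor)
    # Assumed Formula 24: the previous-frame term is written as
    # G^2(t-1, k) * eta(t-1, k), and the instantaneous term is max(eta - 1, 0).
    prior_snr = (alpha * prev_gain ** 2 * prev_post_snr
                 + (1.0 - alpha) * max(post_snr - 1.0, 0.0))
    # Wiener gain (assumed Formula 21).
    gain = prior_snr / (1.0 + prior_snr)
    return gain, post_snr
```

The returned gain would multiply the observed subband spectrum to give the early-reverberation estimate described for Formula 22; carrying `gain` and `post_snr` forward to the next frame closes the recursion.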
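The scheme above first proposes a subband-domain method for suppressing the late-reverberation component in a reverberant speech signal with a single microphone. Specifically: in the subband power-spectrum domain, a constrained NLMS adaptive algorithm learns and updates the DLP filter coefficient vector, from which the subband auto-power spectrum of the late-reverberation signal is estimated; from this estimate and the subband spectrum of the microphone observation, the DD technique computes the a priori SNR estimate, from which the subband gain function for late-reverberation suppression is obtained; this gain function modifies the subband spectrum of the microphone observation to yield the subband spectrum of the target signal.
In summary, the algorithm flowchart of the single-microphone method for suppressing the late-reverberation component in a reverberant speech signal is shown in Fig. 2; the procedure is:
First, initialize the parameters and variables of the algorithm and set the frame index t = 0. Read the t-th frame of observed data from the microphone and apply the AFB to it to obtain the subband spectrum X(t,k). Estimate the subband auto-power spectrum of the late-reverberation signal using Formula 9 and Formulas 15 to 17. Compute the subband suppression gain G(t,k) for late-reverberation suppression using Formulas 24 and 21. Compute the subband spectrum estimate of the target signal using Formula 22, transform it to a time-domain target speech signal with the SFB, and output it. Check whether processing is finished; if not, set t = t + 1 and repeat the steps above until processing ends.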
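The per-frame loop above can be sketched for a single subband as follows. This is a hedged illustration under stated assumptions: the equation images are unavailable, so the smoothing, NLMS, decision-directed, and Wiener steps use the assumed textbook forms, and the class name `SubbandLateReverbSuppressor`, its parameters, and its default values are hypothetical. The analysis and synthesis filter banks are outside the sketch; `process` takes one subband sample X(t,k) per frame.

```python
from collections import deque


class SubbandLateReverbSuppressor:
    """Single-subband sketch of the single-microphone loop (assumed forms)."""

    def __init__(self, num_taps, delay, lam=0.8, mu=0.2, beta=1.0, alpha=0.98):
        if num_taps < 1 or delay < 1:
            raise ValueError("num_taps and delay must be at least 1")
        if not 0.0 < mu * (1.0 + beta) < 2.0:
            raise ValueError("step size must satisfy 0 < mu * (1 + beta) < 2")
        self.lam, self.mu, self.beta, self.alpha = lam, mu, beta, alpha
        self.delay = delay
        self.weights = [0.0] * num_taps
        # past[i] holds the smoothed power of frame t - 1 - i.
        size = num_taps + delay - 1
        self.past = deque([0.0] * size, maxlen=size)
        self.power = 0.0
        self.prev_gain = 1.0
        self.prev_post_snr = 0.0

    def process(self, x):
        """Process one subband sample X(t, k); return the gain-weighted sample."""
        eps = 1e-9
        # Recursive smoothing of the observed power (assumed Formula 16).
        self.power = self.lam * self.power + (1.0 - self.lam) * abs(x) ** 2
        # Delayed power vector: frames t - D_s, ..., t - D_s - Q + 1.
        delayed = [self.past[self.delay - 1 + tau]
                   for tau in range(len(self.weights))]
        # Late-reverberation power estimate (assumed Formula 17), floored at 0.
        predicted = sum(w * p for w, p in zip(self.weights, delayed))
        late = max(predicted, 0.0)
        # Prediction error and assumed constrained NLMS update.
        error = self.power - predicted
        penalty_gain = 1.0 + self.beta if error < 0.0 else 1.0
        norm = sum(p * p for p in delayed) + eps
        step = self.mu * penalty_gain * error / norm
        self.weights = [w + step * p for w, p in zip(self.weights, delayed)]
        # Decision-directed a priori SNR and Wiener gain (assumed forms).
        post_snr = abs(x) ** 2 / max(late, eps)
        prior_snr = (self.alpha * self.prev_gain ** 2 * self.prev_post_snr
                     + (1.0 - self.alpha) * max(post_snr - 1.0, 0.0))
        gain = prior_snr / (1.0 + prior_snr)
        self.prev_gain, self.prev_post_snr = gain, post_snr
        # The current power becomes the newest entry of the history.
        self.past.appendleft(self.power)
        return gain * x
```

In a full system, one such object would be kept per subband k, fed with X(t,k) from the AFB frame by frame, and its outputs passed to the SFB. Until the history contains non-zero power, the late-reverberation estimate is zero and the gain stays close to 1.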
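II. Microphone-array method for suppressing the late-reverberation component in a reverberant speech signal
Suppose the room contains one source and an array of M microphones, and denote the observed speech signal picked up by the m-th microphone as x_m(n), m = 1, 2, …, M. The array input signals are first spatially filtered; the method described above is then applied to the single-channel output to suppress its late-reverberation component, giving the enhanced subband spectrum
Figure PCTCN2019109285-appb-000183
First, the AFB transforms the time-domain digital input signals {x_m(n), m = 1, 2, …, M} of the M microphones into M subband signals, denoted X_m(t,k), m = 1, 2, …, M, where t is the frame time index and k is the subband index. Without loss of generality, let the r-th microphone be the reference microphone. Taking the phase of the subband signal of the reference microphone as the reference and synchronizing the phases of the subband signals of all other microphones to it gives:
Formula 25:
Figure PCTCN2019109285-appb-000184
Spatially averaging the M channel subband signals defined by Formula 25 as follows gives the single-channel spatially filtered subband signal Y(t,k):
Formula 26:
Figure PCTCN2019109285-appb-000185
Formulas 25 and 26 are in fact a subband-domain implementation of the delay-and-sum beamformer of the related art; such a spatial processor is known to suffer from signal distortion caused by the spatial correlation between channels. Therefore, the M channel subband signals defined by Formula 25 are instead processed as follows to give the single-channel spatially filtered subband signal Z(t,k):
Formula 27:
Figure PCTCN2019109285-appb-000186
In fact, the beamformer defined in the subband domain by Formulas 25 and 27 has the same directivity pattern as the delay-and-sum beamformer of the related art. However, because Formula 27 uses the spatial average of the power spectra of the microphone signals rather than the spatial average of the (complex) spectra used in Formula 26, it avoids the signal distortion caused by spatial correlation in the delay-and-sum beamformer.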
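The contrast between Formulas 26 and 27 can be sketched for one time-frequency bin as follows. The equation images are unavailable, so both functions are assumptions built from the verbal description: phase alignment replaces every channel's phase with the reference channel's phase, the delay-and-sum output averages the aligned complex values, and the power-average output takes the square root of the mean power and attaches the reference phase. Function names and the zero-based `ref` index are illustrative.

```python
import cmath
import math


def delay_and_sum_bin(mic_values, ref=0):
    """Assumed Formulas 25 and 26: average of phase-aligned complex values."""
    ref_phase = cmath.phase(mic_values[ref])
    # After alignment, each channel keeps its magnitude and takes the
    # reference phase, so the average is the mean magnitude at that phase.
    mean_magnitude = sum(abs(x) for x in mic_values) / len(mic_values)
    return mean_magnitude * cmath.exp(1j * ref_phase)


def power_average_bin(mic_values, ref=0):
    """Assumed Formula 27: root of the mean power, with the reference phase."""
    ref_phase = cmath.phase(mic_values[ref])
    mean_power = sum(abs(x) ** 2 for x in mic_values) / len(mic_values)
    return math.sqrt(mean_power) * cmath.exp(1j * ref_phase)
```

Both outputs share the reference phase; they differ only in magnitude, since the root-mean-square of the channel magnitudes is never smaller than their arithmetic mean.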
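Applying the single-microphone method for suppressing the late-reverberation component described above to the beamformer output subband signal Z(t,k) gives the dereverberated target subband signal
Figure PCTCN2019109285-appb-000187
The SFB then inverse-transforms the target subband signal to give the time-domain target signal
Figure PCTCN2019109285-appb-000188
The block diagram of the microphone-array method for suppressing the late-reverberation component in a reverberant speech signal is shown in Fig. 3, where the subband auto-power spectrum calculator estimates the auto-power spectrum of the spatial filter output subband signal Z(t,k) according to Formula 28:
Formula 28:
Figure PCTCN2019109285-appb-000189
and the DLP-based late-reverberation subband auto-power spectrum estimator computes the late-reverberation subband auto-power spectrum estimate in the subband signal Z(t,k) as:
Formula 29:
Figure PCTCN2019109285-appb-000190
where
Figure PCTCN2019109285-appb-000191
is the coefficient vector of the DLP adaptive filter in subband k, whose adaptive update is determined by the following constrained NLMS algorithm:
Formula 30:
Figure PCTCN2019109285-appb-000192
Formula 31:
Figure PCTCN2019109285-appb-000193
Formula 32:
Figure PCTCN2019109285-appb-000194
where 0 < μ(1+β) < 2.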
根据
Figure PCTCN2019109285-appb-000195
和Z(t,k),后混响抑制的子带增益函数计算器模块将给出G(t,k)如下:
公式三十三、
Figure PCTCN2019109285-appb-000196
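where the estimate of the a priori SNR
Figure PCTCN2019109285-appb-000197
is obtained by the following recursive smoothing:
Formula 34:
Figure PCTCN2019109285-appb-000198
Here 0<α<1 is a preset smoothing coefficient, and the estimate of the a posteriori SNR
Figure PCTCN2019109285-appb-000199
is:
Formula 35:
Figure PCTCN2019109285-appb-000200
Correcting Z(t,k) with G(t,k) gives the following estimate of the target subband signal:
Formula 36:
Figure PCTCN2019109285-appb-000201
The SFB is applied to transform the target subband signal into the time-domain target speech signal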
Figure PCTCN2019109285-appb-000202
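The gain computation can be illustrated with the textbook decision-directed recursion. This is a sketch under stated assumptions, not the disclosure's Formulas 33 to 36, which are only available as images: the a posteriori SNR is taken as |Z|² divided by the late-reverberation PSD, the a priori SNR mixes the previous frame's gained SNR with the current excess SNR using α, and a Wiener-style gain ξ/(1+ξ) stands in for the actual gain function. The function name `dd_gain` and the `floor` guard are hypothetical.

```python
def dd_gain(z, late_psd, prev_gain, prev_gamma, alpha=0.98, floor=1e-12):
    """One decision-directed step for a single subband.

    z          : complex subband sample Z(t,k)
    late_psd   : late-reverberation PSD estimate for this frame
    prev_gain  : gain G(t-1,k) from the previous frame
    prev_gamma : a posteriori SNR from the previous frame
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    # A posteriori SNR (assumed definition).
    gamma = abs(z) ** 2 / max(late_psd, floor)
    # A priori SNR: recursive smoothing between past and present evidence.
    xi = alpha * prev_gain ** 2 * prev_gamma + (1.0 - alpha) * max(gamma - 1.0, 0.0)
    # Wiener-style gain, used here as a stand-in for the patent's gain function.
    gain = xi / (1.0 + xi)
    return gain, gamma, xi


z = 2 + 0j
gain, gamma, xi = dd_gain(z, late_psd=1.0, prev_gain=0.0, prev_gamma=0.0, alpha=0.5)
target = gain * z  # gain applied to the subband sample
```

With these inputs the a posteriori SNR is 4, the a priori SNR is 1.5, and the gain is 0.6.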
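The above scheme is a post-processing stage for microphone arrays: a subband-domain method for suppressing the late-reverberation component of a reverberant speech signal based on a microphone array. In the subband domain, the method first defines a new beamformer that acts as a spatial pre-processor on the subband spectra of the observation signals acquired by the microphone array, thereby reducing the deviation of the subband spectrum. It then post-processes the subband spectrum output by the spatial pre-processor with the method proposed for the single-microphone case, obtaining the final target speech signal and completing the dereverberation task. This new subband-domain beamformer has the same directivity pattern as the "delay-and-sum" beamformer of the related art and reduces the deviation of the subband spectrum, while overcoming the signal distortion that the "delay-and-sum" beamformer suffers because of spatial correlation between channels. This provides a suitable operating environment for using the proposed single-microphone method as a post-processor for the microphone array.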
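The algorithm flowchart of the microphone-array method for suppressing the late-reverberation component of a reverberant speech signal is shown in FIG. 4. The specific implementation is as follows:
First, the algorithm's parameters and variables are initialized and the signal frame index is set to t=0. The t-th frame of observation data picked up by each of the M microphones is read and transformed into subbands by the AFB, giving M corresponding subband spectra. Phase synchronization and spatial filtering are applied to the M microphone subband spectra according to Formula 25 and Formula 27, giving the subband spectrum Z(t,k). The subband suppression gain function G(t,k) used for late-reverberation suppression is computed according to Formulas 28 to 35. The subband spectrum estimate of the target signal is computed according to Formula 36, and the SFB transforms it into the time-domain target speech signal, which is output. It is then determined whether processing has finished; if not, t=t+1 is executed and the above steps are repeated in order until processing finishes, at which point the flow ends.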
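The specific implementation of the embodiments of the present disclosure is described below.
As shown in FIG. 5, an embodiment of the present disclosure provides an adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal, comprising:
Step 51: obtaining an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
Step 52: obtaining a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
Step 53: obtaining an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.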
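1. When the microphone is a single microphone
Specifically, step 51 is implemented as follows:
According to Formula 16 above:
Figure PCTCN2019109285-appb-000203
obtain the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000204
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
Figure PCTCN2019109285-appb-000205
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.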
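The quantities defined for step 51 (the previous estimate, the current subband sample, and a smoothing constant λ between 0 and 1) match a first-order recursive average. A minimal sketch follows, assuming the common convention in which λ weights the previous estimate and 1-λ weights |X(t,k)|². The exact form of Formula 16 is only available as an image, so this convention and the function name `smooth_psd` are assumptions.

```python
def smooth_psd(prev_psd, x, lam=0.9):
    """One recursive-averaging step: lam * previous + (1 - lam) * |x|^2."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must satisfy 0 < lam < 1")
    return lam * prev_psd + (1.0 - lam) * abs(x) ** 2


# Three frames of one subband, starting from a zero estimate.
psd = 0.0
for sample in (1 + 0j, 1 + 0j, 0 + 2j):
    psd = smooth_psd(psd, sample, lam=0.5)
```

A λ close to 1 gives a smoother but slower-tracking estimate, and a small λ follows the instantaneous power more closely.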
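Specifically, step 52 is implemented as follows:
According to Formula 15 above:
Figure PCTCN2019109285-appb-000206
obtain the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000207
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000208
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000209
Figure PCTCN2019109285-appb-000210
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000211
Figure PCTCN2019109285-appb-000212
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000213
Figure PCTCN2019109285-appb-000214
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000215
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.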
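To illustrate the kind of update step 52 describes, the sketch below implements a generic normalized LMS step on power-spectrum values. It is not Formula 15: the constraint used in the disclosure and the exact role of β are only visible in the formula image, so a non-negativity clamp is used as a stand-in constraint and β appears only in the stated step-size condition 0<μ(1+β)<2. The function name `nlms_step` and the regularizer `eps` are hypothetical.

```python
def nlms_step(weights, delayed_psd, current_psd, mu=0.1, beta=0.0, eps=1e-12):
    """One normalized LMS update of the DLP coefficients for a single subband.

    weights     : current coefficients W_0 ... W_{Q-1}
    delayed_psd : [P(t-D_s), ..., P(t-D_s-Q+1)]
    current_psd : P(t), the value being predicted
    """
    if not 0.0 < mu * (1.0 + beta) < 2.0:
        raise ValueError("step size must satisfy 0 < mu * (1 + beta) < 2")
    prediction = sum(w * p for w, p in zip(weights, delayed_psd))
    error = current_psd - prediction            # prediction error E_k(t)
    norm = sum(p * p for p in delayed_psd) + eps
    updated = [w + mu * error * p / norm for w, p in zip(weights, delayed_psd)]
    # Stand-in constraint (assumption): keep the coefficients non-negative.
    return [max(w, 0.0) for w in updated], error


new_weights, err = nlms_step([0.0, 0.0], [1.0, 1.0], 2.0, mu=0.5)
```

Starting from zero coefficients, the prediction is 0, the error is 2, and each coefficient moves to about 0.5.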
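Specifically, step 53 is implemented as follows:
According to Formula 17 above:
Figure PCTCN2019109285-appb-000216
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000217
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000218
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000219
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000220
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000221
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000222
Figure PCTCN2019109285-appb-000223
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.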
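From the definitions in step 53, the late-reverberation estimate appears to be the inner product of the coefficient vector with the delayed power-spectrum vector, that is, the sum over τ of W_τ(t,k) times the PSD estimate of frame t-τ-D_s. A minimal sketch under that reading follows. The function name, the list-based history layout, and the final clamp to zero are assumptions added for illustration.

```python
def late_reverb_psd(weights, psd_history, delay):
    """Inner product of DLP coefficients with delayed PSD estimates.

    weights     : W_0 ... W_{Q-1} for one subband
    psd_history : psd_history[i] is the PSD estimate of frame t - i
    delay       : D_s, the prediction delay in frames
    """
    q = len(weights)
    if len(psd_history) < delay + q:
        raise ValueError("need at least delay + Q frames of history")
    estimate = sum(w * psd_history[delay + tau] for tau, w in enumerate(weights))
    # Assumption: a power spectrum cannot be negative, so clamp at zero.
    return max(estimate, 0.0)


# Q = 2 coefficients, delay of 2 frames: uses the frames t-2 and t-3.
estimate = late_reverb_psd([0.5, 0.25], [9.0, 8.0, 4.0, 2.0], delay=2)
```

The two most recent frames are skipped because of the delay, which is what separates the late part of the reverberation from the direct sound and early reflections.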
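2. When the microphone is a microphone array
Specifically, step 51 is implemented as follows:
obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array comprises:
According to Formula 27 above:
Figure PCTCN2019109285-appb-000224
obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000225
m=1,2,…,M; t is the time index of the signal frame, and k is the subband index.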
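Further, obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal comprises:
According to Formula 28 above:
Figure PCTCN2019109285-appb-000226
obtain the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000227
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
Figure PCTCN2019109285-appb-000228
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.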
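Specifically, step 52 is implemented as follows:
According to Formula 32 above:
Figure PCTCN2019109285-appb-000229
obtain the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000230
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000231
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000232
Figure PCTCN2019109285-appb-000233
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000234
Figure PCTCN2019109285-appb-000235
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000236
Figure PCTCN2019109285-appb-000237
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000238
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.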
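Specifically, step 53 is implemented as follows:
According to Formula 29 above:
Figure PCTCN2019109285-appb-000239
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000240
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000241
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000242
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000243
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000244
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000245
Figure PCTCN2019109285-appb-000246
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.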
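It should be noted that the adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal proposed in the present disclosure reduces the difficulty of dereverberation and improves its efficiency. Compared with the methods in the related art, it is more robust and has lower algorithmic complexity, which makes real-time online implementation practical.
As shown in FIG. 6, an embodiment of the present disclosure further provides an adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal, comprising:
a first obtaining module 61, configured to obtain an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
a second obtaining module 62, configured to obtain a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
a third obtaining module 63, configured to obtain an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.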
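Optionally, when the microphone is a single microphone, the first obtaining module 61 is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000247
obtain the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000248
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
Figure PCTCN2019109285-appb-000249
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.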
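Further, the second obtaining module 62 is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000250
obtain the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000251
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000252
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000253
Figure PCTCN2019109285-appb-000254
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000255
Figure PCTCN2019109285-appb-000256
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000257
Figure PCTCN2019109285-appb-000258
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000259
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.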
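Further, the third obtaining module 63 is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000260
Figure PCTCN2019109285-appb-000261
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000262
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000263
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000264
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000265
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000266
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000267
Figure PCTCN2019109285-appb-000268
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.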
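Optionally, when the microphone is a microphone array, the first obtaining module 61 comprises:
a first obtaining unit, configured to obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
a second obtaining unit, configured to obtain, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, the first obtaining unit is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000269
obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000270
m=1,2,…,M; t is the time index of the signal frame, and k is the subband index.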
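Further, the second obtaining unit is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000271
obtain the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000272
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
Figure PCTCN2019109285-appb-000273
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.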
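Further, the second obtaining module 62 is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000274
obtain the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000275
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000276
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000277
Figure PCTCN2019109285-appb-000278
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000279
Figure PCTCN2019109285-appb-000280
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000281
Figure PCTCN2019109285-appb-000282
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000283
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.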
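Further, the third obtaining module 63 is configured to:
according to the formula:
Figure PCTCN2019109285-appb-000284
Figure PCTCN2019109285-appb-000285
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000286
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000287
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000288
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000289
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000290
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000291
Figure PCTCN2019109285-appb-000292
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.
It should be noted that this apparatus embodiment corresponds one-to-one with the method embodiment above; all implementations in the method embodiment apply to this apparatus embodiment and achieve the same technical effect.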
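As shown in FIG. 7, an embodiment of the present disclosure further provides an adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal, comprising a memory 71, a processor 72, and a computer program stored in the memory 71 and executable on the processor, the memory 71 being connected to the processor 72 through a bus interface 73; wherein the processor 72 implements the following steps when executing the computer program:
obtaining an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
obtaining a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
obtaining an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.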
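Optionally, when the microphone is a single microphone, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000293
obtain the estimate of the subband auto-power spectrum of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000294
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
Figure PCTCN2019109285-appb-000295
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.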
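Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000296
obtain the DLP prediction coefficient vector;
where
Figure PCTCN2019109285-appb-000297
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000298
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000299
Figure PCTCN2019109285-appb-000300
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000301
Figure PCTCN2019109285-appb-000302
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000303
Figure PCTCN2019109285-appb-000304
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000305
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.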
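Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000306
Figure PCTCN2019109285-appb-000307
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000308
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000309
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000310
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000311
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000312
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000313
Figure PCTCN2019109285-appb-000314
is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.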
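Optionally, when the microphone is a microphone array, the processor 72 implements the following steps when executing the computer program:
obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.
Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000315
obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
Figure PCTCN2019109285-appb-000316
m=1,2,…,M; t is the time index of the signal frame, and k is the subband index.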
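Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000317
obtain the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
where
Figure PCTCN2019109285-appb-000318
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
Figure PCTCN2019109285-appb-000319
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.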
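Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000320
obtain the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
where
Figure PCTCN2019109285-appb-000321
is the DLP prediction coefficient vector on subband k at frame t+1;
Figure PCTCN2019109285-appb-000322
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000323
Figure PCTCN2019109285-appb-000324
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000325
Figure PCTCN2019109285-appb-000326
Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000327
Figure PCTCN2019109285-appb-000328
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
Figure PCTCN2019109285-appb-000329
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.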
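Further, the processor 72 implements the following steps when executing the computer program:
according to the formula:
Figure PCTCN2019109285-appb-000330
Figure PCTCN2019109285-appb-000331
obtain the estimate of the late-reverberation subband auto-power spectrum;
where
Figure PCTCN2019109285-appb-000332
is the estimate of the late-reverberation subband auto-power spectrum;
Figure PCTCN2019109285-appb-000333
is the DLP prediction coefficient vector on subband k at frame t, and
Figure PCTCN2019109285-appb-000334
W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
Figure PCTCN2019109285-appb-000335
R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
Figure PCTCN2019109285-appb-000336
is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
Figure PCTCN2019109285-appb-000337
Figure PCTCN2019109285-appb-000338
is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.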
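An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal.
The technical solution of the present disclosure, in essence, or the part of it that contributes to the related art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.
In addition, it should be noted that in the apparatus and method of the present disclosure, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure. Moreover, the steps of the above series of processing can naturally be performed in chronological order as described, but they need not be; some steps can be performed in parallel or independently of one another. A person of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present disclosure can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including a processor, a storage medium, and the like) or network of computing devices, and that such a person can do so using basic programming skills after reading the description of the present disclosure.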
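Therefore, the object of the present disclosure can also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. In other words, such a program product also constitutes the present disclosure, as does a storage medium storing such a program product. Obviously, the storage medium may be any well-known storage medium or any storage medium developed in the future.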
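It can be understood that the embodiments described in the present disclosure may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), DSP Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present disclosure, or a combination thereof.
For a software implementation, the techniques described in the embodiments of the present disclosure may be implemented by modules (for example, procedures or functions) that perform the functions described in the embodiments. The software code may be stored in a memory and executed by a processor. The memory may be implemented inside or outside the processor.
The above are optional implementations of the present disclosure. It should be noted that a person of ordinary skill in the art may make further improvements and refinements without departing from the principles described in the present disclosure, and such improvements and refinements also fall within the scope of protection of the present disclosure.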

Claims (28)

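  1. An adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal, comprising:
    obtaining an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
    obtaining a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
    obtaining an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.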
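  2. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 1, wherein, when the microphone is a single microphone, obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100001
    obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100002
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
    Figure PCTCN2019109285-appb-100003
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.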
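  3. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 2, wherein obtaining the delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100004
    obtaining the DLP prediction coefficient vector;
    where
    Figure PCTCN2019109285-appb-100005
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100006
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100007
    Figure PCTCN2019109285-appb-100008
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100009
    Figure PCTCN2019109285-appb-100010
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100011
    Figure PCTCN2019109285-appb-100012
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100013
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.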
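  4. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 2, wherein obtaining the estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100014
    Figure PCTCN2019109285-appb-100015
    obtaining the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100016
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100017
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100018
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100019
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100020
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100021
    Figure PCTCN2019109285-appb-100022
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.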
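  5. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 1, wherein, when the microphone is a microphone array, obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal picked up by the microphone comprises:
    obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
    obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.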
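  6. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100023
    obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
    where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
    Figure PCTCN2019109285-appb-100024
    t is the time index of the signal frame, and k is the subband index.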
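  7. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100025
    obtaining the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
    where
    Figure PCTCN2019109285-appb-100026
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
    Figure PCTCN2019109285-appb-100027
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.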
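  8. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining the delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100028
    obtaining the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100029
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100030
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100031
    Figure PCTCN2019109285-appb-100032
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100033
    Figure PCTCN2019109285-appb-100034
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100035
    Figure PCTCN2019109285-appb-100036
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100037
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.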
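  9. The adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to claim 5, wherein obtaining the estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum and the DLP prediction coefficients comprises:
    according to the formula:
    Figure PCTCN2019109285-appb-100038
    Figure PCTCN2019109285-appb-100039
    obtaining the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100040
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100041
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100042
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100043
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100044
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100045
    Figure PCTCN2019109285-appb-100046
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.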
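  10. An adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; wherein the processor implements the following steps when executing the computer program:
    obtaining an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
    obtaining a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
    obtaining an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.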
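  11. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 10, wherein, when the microphone is a single microphone, the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100047
    obtaining the estimate of the subband auto-power spectrum of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100048
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
    Figure PCTCN2019109285-appb-100049
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.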
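  12. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 11, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100050
    obtaining the DLP prediction coefficient vector;
    where
    Figure PCTCN2019109285-appb-100051
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100052
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100053
    Figure PCTCN2019109285-appb-100054
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100055
    Figure PCTCN2019109285-appb-100056
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100057
    Figure PCTCN2019109285-appb-100058
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100059
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.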
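  13. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 11, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100060
    Figure PCTCN2019109285-appb-100061
    obtaining the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100062
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100063
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100064
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100065
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100066
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100067
    Figure PCTCN2019109285-appb-100068
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.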
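  14. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 10, wherein, when the microphone is a microphone array, the processor implements the following steps when executing the computer program:
    obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
    obtaining, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.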
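  15. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100069
    obtaining the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
    where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
    Figure PCTCN2019109285-appb-100070
    t is the time index of the signal frame, and k is the subband index.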
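  16. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100071
    obtaining the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
    where
    Figure PCTCN2019109285-appb-100072
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
    Figure PCTCN2019109285-appb-100073
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.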
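  17. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100074
    obtaining the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100075
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100076
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100077
    Figure PCTCN2019109285-appb-100078
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100079
    Figure PCTCN2019109285-appb-100080
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100081
    Figure PCTCN2019109285-appb-100082
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100083
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.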
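  18. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 14, wherein the processor implements the following steps when executing the computer program:
    according to the formula:
    Figure PCTCN2019109285-appb-100084
    Figure PCTCN2019109285-appb-100085
    obtaining the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100086
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100087
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100088
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100089
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100090
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100091
    Figure PCTCN2019109285-appb-100092
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.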
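  19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the adaptive estimation method for the late-reverberation power spectrum in a reverberant speech signal according to any one of claims 1 to 9.
  20. An adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal, comprising:
    a first obtaining module, configured to obtain an estimate of the subband auto-power spectrum of a reverberant speech signal picked up by a microphone;
    a second obtaining module, configured to obtain a delayed linear prediction (DLP) prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the reverberant speech signal;
    a third obtaining module, configured to obtain an estimate of the late-reverberation subband auto-power spectrum from the estimate of the subband auto-power spectrum of the reverberant speech signal and the DLP prediction coefficient vector.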
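  21. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 20, wherein, when the microphone is a single microphone, the first obtaining module is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100093
    obtain the estimate of the subband auto-power spectrum of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100094
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; λ is a preset smoothing constant with 0<λ<1;
    Figure PCTCN2019109285-appb-100095
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-1; X(t,k) is the subband spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.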
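  22. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 21, wherein the second obtaining module is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100096
    obtain the DLP prediction coefficient vector;
    where
    Figure PCTCN2019109285-appb-100097
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100098
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100099
    Figure PCTCN2019109285-appb-100100
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100101
    Figure PCTCN2019109285-appb-100102
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100103
    Figure PCTCN2019109285-appb-100104
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100105
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.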
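  23. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 21, wherein the third obtaining module is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100106
    Figure PCTCN2019109285-appb-100107
    obtain the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100108
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100109
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100110
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100111
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100112
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100113
    Figure PCTCN2019109285-appb-100114
    is the estimate of the subband auto-power spectrum of the reverberant speech signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.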
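  24. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 20, wherein, when the microphone is a microphone array, the first obtaining module comprises:
    a first obtaining unit, configured to obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signals picked up by the microphone array;
    a second obtaining unit, configured to obtain, from the subband spectrum of the mono output signal, the estimate of the subband auto-power spectrum of the spatially filtered mono output signal of the reverberant speech signal.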
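  25. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the first obtaining unit is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100115
    obtain the subband spectrum of the mono output signal produced by spatially filtering the reverberant speech signal;
    where Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; X_r(t,k) is the subband spectrum of the output signal of the r-th microphone in the k-th subband of frame t; M is the total number of microphones in the array;
    Figure PCTCN2019109285-appb-100116
    t is the time index of the signal frame, and k is the subband index.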
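  26. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the second obtaining unit is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100117
    obtain the estimate of the subband auto-power spectrum of the spatially filtered mono output signal;
    where
    Figure PCTCN2019109285-appb-100118
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t;
    Figure PCTCN2019109285-appb-100119
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-1; λ is a preset smoothing constant with 0<λ<1; Z(t,k) is the subband spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, and k is the subband index.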
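  27. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the second obtaining module is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100120
    obtain the DLP prediction coefficient vector used to estimate the late-reverberation subband auto-power spectrum in the spatially filtered mono output signal of the reverberant speech signal;
    where
    Figure PCTCN2019109285-appb-100121
    is the DLP prediction coefficient vector on subband k at frame t+1;
    Figure PCTCN2019109285-appb-100122
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100123
    Figure PCTCN2019109285-appb-100124
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100125
    Figure PCTCN2019109285-appb-100126
    Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100127
    Figure PCTCN2019109285-appb-100128
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation; μ and β are positive constants with 0<μ(1+β)<2; E_k(t) is the prediction error, and
    Figure PCTCN2019109285-appb-100129
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.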
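  28. The adaptive estimation apparatus for the late-reverberation power spectrum in a reverberant speech signal according to claim 24, wherein the third obtaining module is configured to:
    according to the formula:
    Figure PCTCN2019109285-appb-100130
    Figure PCTCN2019109285-appb-100131
    obtain the estimate of the late-reverberation subband auto-power spectrum;
    where
    Figure PCTCN2019109285-appb-100132
    is the estimate of the late-reverberation subband auto-power spectrum;
    Figure PCTCN2019109285-appb-100133
    is the DLP prediction coefficient vector on subband k at frame t, and
    Figure PCTCN2019109285-appb-100134
    W_τ(t,k) is the τ-th DLP prediction coefficient in the k-th subband of frame t, τ=0,1,2,…,Q-1, Q is the number of DLP coefficients, with Q=R_s-D_s,
    Figure PCTCN2019109285-appb-100135
    R is the length of the room impulse response, N is the frame length of the speech signal used by the subband transform, and D_c is the boundary point separating early reverberation from late reverberation;
    Figure PCTCN2019109285-appb-100136
    is the subband auto-power spectrum vector of the reverberant speech signal in the k-th subband of frame t-D_s,
    Figure PCTCN2019109285-appb-100137
    Figure PCTCN2019109285-appb-100138
    is the estimate of the subband auto-power spectrum of the spatially filtered mono output signal in the k-th subband of frame t-τ-D_s; t is the time index of the signal frame, k is the subband index, and T is the vector transpose operator.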
PCT/CN2019/109285 2018-10-18 2019-09-30 Adaptive estimation method and apparatus for the late-reverberation power spectrum in a reverberant speech signal WO2020078210A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811216983.7 2018-10-18
CN201811216983.7A CN109243476B (zh) 2018-10-18 2018-10-18 Adaptive estimation method and apparatus for the late-reverberation power spectrum in a reverberant speech signal

Publications (1)

Publication Number Publication Date
WO2020078210A1 true WO2020078210A1 (zh) 2020-04-23

Family

ID=65052489

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/109285 WO2020078210A1 (zh) 2018-10-18 2019-09-30 Adaptive estimation method and apparatus for the late-reverberation power spectrum in a reverberant speech signal

Country Status (2)

Country Link
CN (1) CN109243476B (zh)
WO (1) WO2020078210A1 (zh)

Also Published As

Publication number Publication date
CN109243476B (zh) 2021-09-03
CN109243476A (zh) 2019-01-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19873280

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19873280

Country of ref document: EP

Kind code of ref document: A1