CN116758928A - MCLP language dereverberation method and system based on time-varying forgetting factor - Google Patents
MCLP language dereverberation method and system based on time-varying forgetting factor Download PDFInfo
- Publication number
- CN116758928A CN116758928A CN202310271405.8A CN202310271405A CN116758928A CN 116758928 A CN116758928 A CN 116758928A CN 202310271405 A CN202310271405 A CN 202310271405A CN 116758928 A CN116758928 A CN 116758928A
- Authority
- CN
- China
- Prior art keywords
- time
- signal
- mclp
- forgetting factor
- dereverberation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 38
- 239000011159 matrix material Substances 0.000 claims abstract description 25
- 230000003595 spectral effect Effects 0.000 claims abstract description 19
- 238000000354 decomposition reaction Methods 0.000 claims abstract description 8
- 230000014509 gene expression Effects 0.000 claims abstract description 7
- 238000004364 calculation method Methods 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 8
- 230000008569 process Effects 0.000 claims description 6
- 230000003044 adaptive effect Effects 0.000 claims description 5
- 239000000654 additive Substances 0.000 claims description 3
- 230000000996 additive effect Effects 0.000 claims description 3
- 210000005069 ears Anatomy 0.000 claims description 3
- 238000009432 framing Methods 0.000 claims description 3
- 238000009499 grossing Methods 0.000 claims description 3
- 238000013507 mapping Methods 0.000 claims description 2
- 230000008030 elimination Effects 0.000 abstract 1
- 238000003379 elimination reaction Methods 0.000 abstract 1
- 238000004422 calculation algorithm Methods 0.000 description 18
- 238000004590 computer program Methods 0.000 description 7
- 238000010586 diagram Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 6
- 230000006870 function Effects 0.000 description 4
- 230000006872 improvement Effects 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000001914 filtration Methods 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 101150096839 Fcmr gene Proteins 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The application discloses an MCLP language dereverberation method and system based on a time-varying forgetting factor in the technical field of voice acquisition, and the method comprises the following steps: collecting signals, and converting the signals into a time-frequency domain; introducing the acquired time-frequency domain signals into an MCLP model to obtain a prediction coefficient matrix of a linear prediction filter; calculating the power spectral density of the acquired signals to obtain a weighting coefficient of a reverberation prediction filter, introducing matrix QR decomposition and adding a time-varying forgetting factor on the basis of a least square method when solving the prediction coefficient, improving the dereverberation capacity and stability, and calculating the late reverberation according to the prediction coefficient matrix; and (3) repeating the step (3) for each frame of signal, substituting the late reverberation into the expression in the step (2) and performing inverse short-time Fourier transform on the result to obtain a desired signal, namely a dereverberated speech signal. The method has the advantages of rapid convergence speed, improved system stability, better voice reverberation elimination performance and better audio quality.
Description
Technical Field
The application belongs to the technical field of voice acquisition, and particularly relates to an MCLP language dereverberation method and system based on a time-varying forgetting factor.
Background
The voice recording is carried out in the closed space (such as a recording studio), the communication can be reflected by walls, floors and ceilings, the collected voice signals inevitably contain reverberation, the reverberation can reduce the recognition degree and quality of the voice signals, the hearing aid system is affected, the performance of the voice automatic recognition system is reduced, and inconvenience is brought to listeners. Therefore, research into the dereverberation algorithm has become very active in recent years.
The algorithm for speech dereverberation broadly comprises three classes: inverse filtering, spectral enhancement, and probabilistic model-based methods. Inverse filtering is a frequently studied dereverberation scheme that relies on room acoustic properties with certain limitations in applicability. In addition, microphone array techniques are also well known in multichannel speech dereverberation, which can suppress reverberation to some extent by spatially distinguishing sounds in different directions. The principle of multi-channel linear prediction is to design a linear predictor to estimate the reverberant part of speech, and subtracting the estimated part from the reverberant speech can estimate the desired speech signal. Among other things, jukic et al propose a MCLP and WPE based speech dereverberation algorithm with sparse priors that includes two multi-channel schemes: WPE with Complex Generalized Gaussian (CGG) a priori and WPE using iterative re-weighted least squares (IRLS) methods. Both schemes have significant performance in reverberant environments.
However, the RLS algorithm has the problems of potential instability caused by an increased condition number in the matrix inverse transformation process, slow convergence when the system suddenly changes due to the use of a constant forgetting factor, and the like. The former can be solved with QR decomposition, while the latter is usually solved with adaptive forgetting factors. However, the VFF control design of current algorithms relies primarily on estimation errors.
Disclosure of Invention
Aiming at the defects of the prior art, the application aims to provide an MCLP language dereverberation method and system based on a time-varying forgetting factor so as to solve the problems in the prior art.
The aim of the application can be achieved by the following technical scheme:
an MCLP language dereverberation method based on a time-varying forgetting factor, comprising the steps of:
step 1, firstly, carrying out signal acquisition by using a microphone array in a single sound source environment, and converting the signals into a time-frequency domain through short-time Fourier transform;
step 2, introducing the time-frequency domain signals acquired in the step 1 into an MCLP model to obtain expressions of a prediction coefficient matrix of a linear prediction filter related to the expected signals, the late reverberation and the original acquired signals;
step 3, calculating the power spectral density of the acquired signal to obtain a weighting coefficient of a reverberation prediction filter, introducing matrix QR decomposition and adding a time-varying forgetting factor on the basis of a least square method when solving the prediction coefficient, improving the dereverberation capacity and stability, and calculating the late reverberation according to the prediction coefficient matrix;
and 4, repeating the step 3 for each frame of signal, substituting the late reverberation into the expression of the step 2, and performing the inverse short-time Fourier transform on the result to obtain a desired signal, namely a dereverberated voice signal.
Preferably, the signals captured by the microphone capturing array in the step 1 are as follows:
y(n)=x(n)+v(n)
where y (n) represents the signal captured by the microphone, x (n) represents the speech signal, and v (n) is additive noise;
the signals captured by the mth microphone are expressed as follows, using a short-time fourier transform of the time signals:
x m (k,n)=d m (k,n)+u m (k,n)
wherein x is m (k, n) represents the signal of the mth microphone at the time frame of n and the frequency of k, d m (k, n) signals representing early reflections of speech and direct sound, which need to be preserved, as desired signals, u m (k, n) represents late reverberation.
Preferably, the required estimated signal obtained by the MCLP model in step 2 is as follows:
wherein:
where "∈" represents an estimated value, H represents a complex conjugate,is a matrix of prediction coefficients, τ is the prediction delay, the direct speech part and the earliest reflected part remain as the desired speech components, and the direct speech signal is obtained by subtracting u (n) from the mix.
Preferably, in the step 3, the prediction filter is obtained by maximizing the sparsity of the desired speech signal in the time-frequency domain, as follows:
wherein w (n) is used to represent a weighting coefficient, and gamma is between (0, 1) and is expressed as a forgetting factor;
the weighting coefficients are expressed as follows:
where ε is an infinitesimal number, w (n) is guaranteed to be a non-negative number, p represents the shape parameter, and d (n) has the following power spectral density:
wherein:
in the method, in the process of the application,representing the power spectral density of the signal, alpha representing the attenuation coefficient, td representing the duration of the earliest reflected speech portion, T 60 Represents the reverberation time, n τ Representing the corresponding delay per frame, β is a smoothing factor.
Preferably, w (n) represents as follows:
the estimate of late reverberation is as follows:
recursive solution using least squaresExpressed as:
preferably, the time-varying forgetting factor control is based on an approximate derivative of the filter coefficients as follows:
wherein w is i (n) denotes the tap of the i-th filter,is its approximate inverse in time, η is the calculated smoothed tap weight +.>Forgetting factor of|·|| 1 L representing a vector 1 Norms, by->Mapping the convergence state of the adaptive filter to the expected variance of the time-varying forgetting factor gamma (n), calculating +.>The absolute value of the approximate derivative of (2) is G c (n):
And calculates the average thereof over a time window of time length TGet->Average value reuse->To indicate, will->And->Normalizing to obtain->With gamma L And gamma H To represent the upper and lower bounds, and γ (n) at each iteration update is as follows:
an MCLP language dereverberation system based on a time-varying forgetting factor, comprising:
the voice acquisition module is used for simulating human ears through a microphone and carrying out framing processing on the received signals;
the late reverberation estimation module is used for calculating to obtain a prediction coefficient matrix;
and the expected signal calculation module is used for calculating an expected signal.
Preferably, the late reverberation estimation module includes a power spectral density calculation module and a filter coefficient prediction module.
The application has the beneficial effects that:
the MCLP language dereverberation method based on the time-varying forgetting factor uses the least square method to calculate the predictive linear matrix, has high convergence rate, adds matrix QR decomposition on the basis of the least square method in the voice signal processing process, improves the stability of the system, uses the time-varying forgetting factor, ensures that the time-varying forgetting factor has better voice dereverberation performance, and the processed voice signal has higher PESQ score and better audio quality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic diagram of an MCLP dereverberation system based on a time-varying forgetting factor in an embodiment of the present application;
FIG. 2 is a graph of the effect of algorithmic dereverberation in an embodiment of the present application;
FIG. 3 is a ΔMFCC distance improvement graph for three algorithms in an embodiment of the present application;
FIG. 4 is a graph of the average peSQ score of speech signals before and after dereverberation by different methods in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment provides an MCLP language dereverberation method based on a time-varying forgetting factor, which comprises the following steps:
step 1, carrying out signal acquisition by using a microphone array in a single sound source environment, and converting the signals into a time-frequency domain through short-time Fourier transform (STFT);
the voice signal generated by this sound source is captured by M microphones, and the signals captured by the microphones are expressed as follows:
y(n)=x(n)+v(n)(1)
where y (n) represents the signal captured by the microphone, x (n) represents the speech signal, and v (n) is additive noise. Let v (n) =0.
The signals captured by the mth microphone are expressed as follows, using a short-time fourier transform of the time signals:
x m (k,n)=d m (k,n)+u m (k,n) (2)
wherein x is m (k, n) represents the signal of the mth microphone at the time frame of n and the frequency of k, d m (k, n) signals representing the early reflections of speech and the direct sound are signals that need to be preserved, called desired signals. u (u) m (k, n) represents late reverberation.
Step 2, introducing the signals acquired in the step 1 into an MCLP model to obtain expressions of a prediction coefficient matrix of a linear prediction filter related to the expected signals, the early reflection signals and the original acquired signals;
in the MCLP model, u (k, n) represents the following formula:
wherein L is g Representing the length of the MCLP filter, τ is taken as the prediction delay in the time domain, gm is the prediction coefficient of the linear prediction filter, k is omitted after each frequency point is calculated, and the formula is as follows:
x(n)=d(n)+u(n) (4)
the estimation signal required by the formula (3) and the formula (4) is:
wherein:
where "∈" represents an estimated value and H represents a complex conjugate.Is a matrix of prediction coefficients. i is the prediction delay and the direct speech part and the earliest reflected part remain as the desired speech components. The direct speech signal may be obtained by subtracting u (n) from the mix.
Step 3, calculating the power spectral density of the acquired signal to obtain a weighting coefficient of a reverberation prediction filter, introducing matrix QR decomposition and adding a time-varying forgetting factor on the basis of a least square method when solving the prediction coefficient, improving the dereverberation capacity and stability, and calculating the late reverberation according to the prediction coefficient matrix;
using the dereverberation model represented by equations (5) (6), a prediction coefficient matrix needs to be solvedObtaining a prediction filter by maximizing sparsity of a desired speech signal in a time-frequency domain, as follows
Where w (n) is used to represent the weighting factor; the value of gamma is between (0, 1), which is expressed as forgetting factor; the weighting coefficients can in turn be expressed as follows:
where ε is an infinitesimal number, for ensuring that w (n) is a non-negative number and p represents the shape parameter, and assuming that the late reverberation obeys the exponential distribution, the power spectral density of d (n) can be expressed as follows:
wherein:
in the method, in the process of the application,represents the power spectral density of the signal, alpha represents the attenuation coefficient, T d For representing the duration of the earliest reflected speech part, T 60 Represents the reverberation time, n τ Representing the corresponding delay per frame, β is a smoothing factor.
Due toSo w (n) can be represented as follows:
from the above resultsCan be measured by taking in (8)Calculate->The estimate of late reverberation can be expressed as follows:
recursive solution using least squaresCan be expressed as:
the matrix QR decomposition combined with least squares (QR-RLS) algorithm matrix is shown in table 1:
TABLE 1
The proposed time-varying forgetting factor control scheme is based on the approximate derivative of the filter coefficients as follows:
wherein w is i (n) denotes the tap of the i-th filter,is its approximate inverse in time. Eta is the calculated smoothed tap weight +.>Forgetting factor of (c). I.I 1 L representing a vector 1 Norms. When the algorithm converges to its steady state,gradually decreasing from its initial value to a very small value, but being quite unstable in tracking the impulse response of the time-varying channel. Therefore, go through +.>The convergence state of the adaptive filter is mapped to the desired variance of the time-varying forgetting factor y (n). Calculate->The absolute value of the approximate derivative of (2) is G c (n):
And calculates the average thereof over a time window of time length TGet->Average value reuse->To represent. Will->And->Normalized, we obtained +.>This is a more stable measure of convergence of the adaptive filter, using gamma L And gamma H To represent the upper and lower bounds, and γ (n) at each iteration update is as follows:
substituting formula (32) for beta in formula (8) results in a MCLP dereverberation algorithm based on a time-varying forgetting factor.
Wherein, the system parameters are shown in table 2:
TABLE 2
Parameter name | (symbol) | Value of |
Sampling rate | f s | 16kHz |
Window length | wlen | 512(32ms) |
Frame shifting | wlen/4 | 128(8ms) |
Filter order | Lg | 30 |
And 4, repeating the step 3 for each frame of signal, substituting the late reverberation into the expression of the step 2, and performing the inverse short-time Fourier transform on the result to obtain a desired signal, namely a dereverberated voice signal.
An MCLP language dereverberation system based on a time-varying forgetting factor, the structure of which is shown in fig. 1, comprises:
1. the voice acquisition module: the human ear is simulated by two microphones with a spacing of 15cm, and the received signals are subjected to framing processing, wherein the signals can be expressed as x (n) =d (n) +u (n), x (n) represents signals captured by the microphones, d (n) represents voice signals, and u (n) represents post reverberation. Introducing MCLP model to obtain reverberation signal asThe desired signal thus required can be expressed asWherein->Is a prediction coefficient matrix;
2. and a late reverberation estimation module: the module comprises a power spectral density calculation module and a filterAnd a coefficient prediction module. The power spectral density calculation module calculates the power spectral densities of d (n), x (n) and u (n) The filter coefficient prediction module obtains->Substituted into-> Obtaining a prediction coefficient matrix->
3. The expected signal calculation module: calculating reverberation signalSubtracting the reverberation signal u (n) from the signal x (n) captured by the microphone to obtain a desired signal +.>
And (3) performing dereverberation effect comparison:
the reverberations signal was dereverberated using RLS algorithm and VFF-QR-RLS algorithm, with a forgetting factor γ of 0.99, and the result is shown in fig. 2.
It can be found that the signals processed by RLS and VFF-QR-RLS remove partial reverberation below 4kHz compared with the original signals, and the signals processed by VFF-QR-RLS have more obvious dereverberation effect compared with the signals processed by RLS.
The performance of the algorithm was evaluated with Mel frequency cepstral coefficient distance improvement (Δmfcc), which is a good spectral envelope parameter. Because the MFCC is at a certain levelThe auditory characteristics of human ears are simulated to a certain extent, so that the distortion measure based on the Mel cepstrum coefficient can accurately represent the distortion of the reverberation voice. Using pure voice as reference signal, calculating MFCC distortion distance between reference signal and reverberant signal and dereverberated signal, and recording as MFCC in And MFCC out . The difference then yields a Mel-frequency cepstrum distance improvement (Δmfcc), which, when larger, indicates a better dereverberation effect.
The forgetting factor gamma of the RLS and QR-RLS algorithm is 0.96, and the forgetting factor gamma of the VFF-QR-RLS is L =0.96,γ H =0.99. The simulation results are shown in fig. 3.
As can be seen from fig. 3 (upper) and fig. 3 (middle), QR-RLS has the same effect as RLS, and the condition number is reduced by the QR decomposition method, and also can exhibit better numerical properties. Comparing the circled portions of the graph, the VFF-QR-RLS algorithm can be found to be faster and more stable, and has better numerical stability.
To further evaluate the performance and dereverberation effect of the algorithm, the dereverberation speech in the experiment was also evaluated using a speech quality perception evaluation, the final score being the average of the experimental results of 10 sets of different simulated reverberation samples, the dereverberation signal scores of the different algorithms being shown in fig. 4 (reverberation time T 60 =300 ms,600ms,900 ms), it can be seen from the data in the figure that the score of the VFF-QR-RLS algorithm is highest among the different degrees of reverberation, which also verifies the effectiveness of the algorithm.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the application without departing from the spirit and scope of the application, which is intended to be covered by the claims.
Claims (10)
1. An MCLP language dereverberation method based on a time-varying forgetting factor, comprising the steps of:
step 1, firstly, carrying out signal acquisition by using a microphone array in a single sound source environment, and converting the signals into a time-frequency domain through short-time Fourier transform;
step 2, introducing the time-frequency domain signals acquired in the step 1 into an MCLP model to obtain expressions of a prediction coefficient matrix of a linear prediction filter related to the expected signals, the late reverberation and the original acquired signals;
step 3, calculating the power spectral density of the acquired signal to obtain a weighting coefficient of a reverberation prediction filter, introducing matrix QR decomposition and adding a time-varying forgetting factor on the basis of a least square method when solving the prediction coefficient, improving the dereverberation capacity and stability, and calculating the late reverberation according to the prediction coefficient matrix;
and 4, repeating the step 3 for each frame of signal, substituting the late reverberation into the expression of the step 2, and performing the inverse short-time Fourier transform on the result to obtain a desired signal, namely a dereverberated voice signal.
2. The MCLP language dereverberation method based on a time-varying forgetting factor of claim 1, wherein the signals captured by the microphone array in step 1 are as follows:
y(n)=x(n)+v(n)
where y (n) represents the signal captured by the microphone, x (n) represents the speech signal, and v (n) is additive noise;
the signals captured by the mth microphone are expressed as follows, using a short-time fourier transform of the time signals:
x m (k,n)=d m (k,n)+u m (k,n)
wherein x is m (k, m) represents the signal of the mth microphone at the time frame of n and the frequency of k, d m (k, n) signals representing early reflections of speech and direct sound, which need to be preserved, as desired signals, u m (k, n) represents late reverberation.
3. The MCLP language dereverberation method based on the time-varying forgetting factor of claim 2, wherein the required estimated signal obtained by the MCLP model in step 2 is as follows:
wherein:
where "∈" represents an estimated value, H represents a complex conjugate,is a matrix of prediction coefficients, τ is the prediction delay, the direct speech part and the earliest reflected part remain as the desired speech components, and the direct speech signal is obtained by subtracting u (n) from the mix.
4. A MCLP language dereverberation method based on a time-varying forgetting factor as claimed in claim 3, wherein the prediction filter is obtained in step 3 by maximizing the sparsity of the desired speech signal in the time-frequency domain by:
wherein w (n) is used to represent a weighting coefficient, and gamma is between (0, 1) and is expressed as a forgetting factor;
the weighting coefficients are expressed as follows:
where ε is an infinitesimal number, w (n) is guaranteed to be a non-negative number, p represents the shape parameter, and d (n) has the following power spectral density:
wherein:
in the method, in the process of the application,represents the power spectral density of the signal, alpha represents the attenuation coefficient, T d For representing the duration of the earliest reflected speech part, T 60 Represents the reverberation time, n τ Representing the corresponding delay per frame, β is a smoothing factor.
5. A time-varying forgetting factor based MCLP language dereverberation method as in claim 4, wherein w (n) is expressed as follows:
the estimate of late reverberation is as follows:
recursive solution using least squaresExpressed as:
6. a MCLP language dereverberation method based on a time-varying forgetting factor as claimed in claim 5, wherein the time-varying forgetting factor control is based on an approximate derivative of a filter coefficient as follows:
wherein w is i (n) denotes the tap of the i-th filter,is its approximate inverse in time, η is the calculated smoothed tap weight +.>Forgetting factor of|·|| 1 L representing a vector 1 Norms, by->Mapping the convergence state of the adaptive filter to the expected variance of the time-varying forgetting factor gamma (n), calculating +.>The absolute value of the approximate derivative of (2) is G c (n):
And calculates the average thereof over a time window of time length TGet->Average reuse of (2)To indicate, will->And->Normalizing to obtain->With gamma L And gamma H To represent the upper and lower bounds, and γ (n) at each iteration update is as follows:
7. an MCLP language dereverberation system based on a time-varying forgetting factor, comprising:
the voice acquisition module is used for simulating human ears through a microphone and carrying out framing processing on the received signals;
the late reverberation estimation module is used for calculating to obtain a prediction coefficient matrix;
and the expected signal calculation module is used for calculating an expected signal.
8. An MCLP speech dereverberation system based on a time-varying forgetting factor as in claim 7, wherein the late reverberation estimation module comprises a power spectral density calculation module and a filter coefficient prediction module.
9. A time-varying forgetting factor based MCLP language dereverberation system as in claim 8The system is characterized in that the signal is expressed as x (n) =d (n) +u (n), and the reverberant signal is obtained by introducing an MCLP modelThe desired signal is represented asIs a prediction coefficient matrix;
the power spectral density calculation module calculates the power spectral densities of d (n), x (n) and u (n) The filter coefficient prediction module obtains->Substituted into-> Obtaining a prediction coefficient matrix->
The expected signal calculation module calculates a reverberation signalSubtracting the reverberation signal u (n) from the signal x (n) captured by the microphone to obtain a desired signal +.>
10. A time-varying forgetting factor-based MCLP language dereverberation controller storing a time-varying forgetting factor-based MCLP language dereverberation system program to run the time-varying forgetting factor-based MCLP language dereverberation system of claims 7-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310271405.8A CN116758928A (en) | 2023-03-20 | 2023-03-20 | MCLP language dereverberation method and system based on time-varying forgetting factor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310271405.8A CN116758928A (en) | 2023-03-20 | 2023-03-20 | MCLP language dereverberation method and system based on time-varying forgetting factor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116758928A true CN116758928A (en) | 2023-09-15 |
Family
ID=87950219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310271405.8A Pending CN116758928A (en) | 2023-03-20 | 2023-03-20 | MCLP language dereverberation method and system based on time-varying forgetting factor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116758928A (en) |
-
2023
- 2023-03-20 CN CN202310271405.8A patent/CN116758928A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4107613B2 (en) | Low cost filter coefficient determination method in dereverberation. | |
CN107393550B (en) | Voice processing method and device | |
EP1993320B1 (en) | Reverberation removal device, reverberation removal method, reverberation removal program, and recording medium | |
JP5666444B2 (en) | Apparatus and method for processing an audio signal for speech enhancement using feature extraction | |
JP6169849B2 (en) | Sound processor | |
JPH1115491A (en) | Environmentally compensated method of processing speech | |
Tsao et al. | Generalized maximum a posteriori spectral amplitude estimation for speech enhancement | |
CN108172231A (en) | A kind of dereverberation method and system based on Kalman filtering | |
US9520138B2 (en) | Adaptive modulation filtering for spectral feature enhancement | |
CN111312269B (en) | Rapid echo cancellation method in intelligent loudspeaker box | |
US11483651B2 (en) | Processing audio signals | |
CN112687276B (en) | Audio signal processing method and device and storage medium | |
JP6748304B2 (en) | Signal processing device using neural network, signal processing method using neural network, and signal processing program | |
CN110111802B (en) | Kalman filtering-based adaptive dereverberation method | |
Garg et al. | A comparative study of noise reduction techniques for automatic speech recognition systems | |
Mack et al. | Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks. | |
JP2023536104A (en) | Noise reduction using machine learning | |
CN115424627A (en) | Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm | |
CN110475181B (en) | Equipment configuration method, device, equipment and storage medium | |
CN110111804B (en) | Self-adaptive dereverberation method based on RLS algorithm | |
CN113571076A (en) | Signal processing method, signal processing device, electronic equipment and storage medium | |
Sadjadi et al. | A comparison of front-end compensation strategies for robust LVCSR under room reverberation and increased vocal effort | |
JP4098647B2 (en) | Acoustic signal dereverberation method and apparatus, acoustic signal dereverberation program, and recording medium recording the program | |
CN113160842B (en) | MCLP-based voice dereverberation method and system | |
CN116758928A (en) | MCLP language dereverberation method and system based on time-varying forgetting factor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |