CN113362842A - Audio signal processing method and device - Google Patents
- Publication number
- CN113362842A (application CN202110739121.8A)
- Authority
- CN
- China
- Prior art keywords
- echo path
- path vector
- echo
- vector
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Telephone Function (AREA)
Abstract
The present disclosure relates to the field of voice communication technologies, and in particular, to an audio signal processing method and apparatus. An audio signal processing method comprising: performing first filtering processing on the basis of a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker; performing second filtering processing on the basis of the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process; determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold. The method disclosed by the embodiment of the invention can effectively detect the change of the echo path, has stronger detection universality and robustness and improves the echo cancellation effect.
Description
Technical Field
The present disclosure relates to the field of voice communication technologies, and in particular, to an audio signal processing method and apparatus.
Background
In voice communication, after the near-end loudspeaker plays the sound transmitted from the far end, the near-end microphone picks that sound up again and transmits it back to the far end, producing acoustic echo. Acoustic echo can severely degrade voice call quality, so echo cancellation is a necessary process in voice communication.
In the related art, an adaptive filter is often used to estimate the echo path for echo cancellation. However, in complex double-talk acoustic scenes such as multi-person online voice chat, the echo path changes frequently and the echo cancellation effect is poor.
Disclosure of Invention
In order to improve an echo cancellation effect of a voice communication system, embodiments of the present disclosure provide an audio signal processing method and apparatus, an electronic device, and a storage medium.
In a first aspect, the disclosed embodiments provide an audio signal processing method, including:
performing first filtering processing on the basis of a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
performing second filtering processing on the basis of the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
In some embodiments, the first filtering process is Kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the Kalman filtering is time-domain Kalman filtering.
In some embodiments, the performing a first filtering process based on the reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector includes:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the performing a second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector includes:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold includes:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
In some embodiments, after the determining detects that the echo path has changed, the method further comprises:
initializing parameters of the first filtering process and the second filtering process.
In a second aspect, the present disclosure provides an audio signal processing apparatus, including:
a first filtering module configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
a second filtering module configured to perform second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
an echo path determination module configured to determine that an echo path change is detected in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
In some embodiments, the first filtering process is Kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the Kalman filtering is time-domain Kalman filtering.
In some embodiments, the first filtering module is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the second filtering module is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the echo path determination module is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
In some embodiments, the audio signal processing apparatus of the present disclosure further includes:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a microphone and a speaker;
a processor; and
a memory storing computer instructions for causing a processor to perform the method according to any of the embodiments of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a storage medium storing computer instructions for causing a computer to execute the method according to any one of the embodiments of the first aspect.
The audio signal processing method of the embodiments of the present disclosure includes performing first filtering processing on a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector, performing second filtering processing on the reference signal and the first audio signal to obtain a second echo path vector, and determining that a change in echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold. The method uses two filtering processes with different update rates and determines whether the echo path has changed based on the correlation of the two echo path vectors. It thereby detects echo path changes effectively, has stronger detection universality and robustness, and improves the echo cancellation effect.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 2 is a schematic diagram of echo impulse responses before and after an echo path change in some embodiments according to the present disclosure.
Fig. 3 is a block diagram of a voice communication system in accordance with some embodiments of the present disclosure.
Fig. 4 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 5 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 6 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 7 is a block diagram of an audio signal processing apparatus according to some embodiments of the present disclosure.
Fig. 8 is a block diagram of an electronic device suitable for implementing the method of the present disclosure.
Detailed Description
The technical solutions of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure. In addition, technical features involved in different embodiments of the present disclosure described below may be combined with each other as long as they do not conflict with each other.
In a voice communication system, due to the coupling between the loudspeaker and the microphone, a far-end signal played by the loudspeaker is received by the microphone and transmitted to the far end again, forming an acoustic echo. Acoustic echo can seriously affect the quality of voice communication and also degrades voice wake-up and speech recognition in human-machine interaction, so echo cancellation needs to be performed in a voice communication system in order to improve the quality of voice communication.
In the field of echo cancellation, adaptive filtering techniques based on variable step size control, such as NLMS (Normalized Least Mean Square) filters, are generally used to estimate the echo path for echo cancellation. However, in a double-talk scene with a complex acoustic environment, such as a multiplayer online game, the echo path may change frequently. If the echo path is estimated at the current filter update rate, the filter may diverge, the echo signal cannot be estimated accurately, and a large amount of residual echo is produced.
In the related art, one way of dealing with echo path changes is to use an adaptive filter with a smaller update rate, so that a low steady-state residual echo can be obtained when the echo path changes. Another approach is to use a double-talk detector (DTD) that stops the filter update or reduces the update rate of the filter when a double-talk scene is detected. However, neither method solves the problem fundamentally: because the filter update rate is slow, the echo signal cannot be estimated quickly at the initial moment of the echo path change, resulting in more residual echo.
Based on the defects in the related art, the embodiments of the present disclosure provide an audio signal processing method, an audio signal processing apparatus, an electronic device, and a storage medium, which aim to accurately detect echo path changes in a complex acoustic scene, thereby improving an echo cancellation effect.
In a first aspect, the embodiments of the present disclosure provide an audio signal processing method, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 1, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S110, performing first filtering processing based on the reference signal and a first audio signal picked up by the microphone to obtain a first echo path vector.
S120, performing second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector.
Specifically, the voice communication system of the embodiment of the present disclosure includes a speaker and a microphone, and the sound played by the speaker is picked up by the microphone and transmitted to the far end along with the near-end voice, so as to form an acoustic echo.
The reference signal is the far-end voice signal received by the system. Taking a mobile phone call as an example, the audio signal produced by the far-end speaker talking and received by the near-end system is the reference signal. After the loudspeaker plays the reference signal, the sound propagates along the echo path between the loudspeaker and the microphone and reaches the microphone, where it is picked up as the echo signal.
Meanwhile, for a double-talk scene, the microphone also collects a near-end voice signal generated when a near-end speaker speaks and a near-end background noise signal. That is, the first audio signal picked up by the microphone includes: a near-end speech signal, a background noise signal, and an echo signal.
In the embodiment of the present disclosure, the voice communication system is provided with two filters, that is, a first filter and a second filter, and the update rates of the first filter and the second filter are different. For example, the update rate of the first filter is greater than the update rate of the second filter, and for example, the update rate of the second filter is greater than the update rate of the first filter, which is not limited by the present disclosure.
The first filter performs first filtering processing on the first audio signal, and the second filter performs second filtering processing on the first audio signal. Because the update rates of the two filtering processes are different, the filter with the higher update rate can quickly track a sudden change of the echo path at the initial moment of the change and estimates a transient echo path vector, namely the first echo path vector. The filter with the lower update rate tracks the change of the echo path relatively slowly and estimates a steady-state echo path vector, namely the second echo path vector.
In some embodiments, the first filtering process may be iteratively updated based on the reference signal and the first audio signal picked up by the microphone to obtain the first echo path vector. The second filtering process may be iteratively updated based on the reference signal and the first audio signal picked up by the microphone to obtain a second echo path vector.
In some embodiments, the first filtering process may be Kalman filtering and the second filtering process may be NLMS filtering.
It will be appreciated that the echo path vector represents the echo path between the loudspeaker and the microphone. In the embodiments of the present disclosure the update rates of the two filtering processes are different; in one example, the update rate of the first filtering process is greater than that of the second filtering process. Therefore, at the initial moment when the echo path changes, the first filtering process can quickly track the change, and the first echo path vector obtained from the first filtering process represents the transient echo path at the current moment. The second filtering process has a slower update rate, so its filter converges more easily, and the second echo path vector obtained from the second filtering process represents the steady-state echo path at the current moment.
In a real scene, when the echo path changes abruptly and by a large amount, the first echo path vector and the second echo path vector should differ considerably; when the echo path is unchanged or changes only slightly, the two vectors should not differ significantly. Based on this principle, the embodiments of the present disclosure can determine whether an echo path change has currently occurred. This is explained in the following embodiments and is not detailed here.
The specific calculation of the first echo path vector and the second echo path vector is explained in the following embodiments and is not detailed here.
S130, determining that a change in echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold.
Specifically, as described above, the first echo path vector and the second echo path vector represent a transient echo path and a steady-state echo path respectively. When the real echo path changes suddenly, the two echo paths should differ considerably, and vice versa. Therefore, in the embodiments of the present disclosure, a threshold may be set in advance, and the correlation between the first echo path vector and the second echo path vector may be compared with this preset threshold to determine whether the echo path has changed.
The preset threshold refers to a preset threshold representing the change of the echo path, and when the correlation between the first echo path vector and the second echo path vector is greater than the preset threshold, the correlation between the first echo path vector and the second echo path vector is higher, that is, the transient echo path and the steady echo path are closer, so that the echo path can be determined to be unchanged. And when the correlation between the first echo path vector and the second echo path vector is not greater than the preset threshold, the correlation between the first echo path vector and the second echo path vector is low, that is, the difference between the transient echo path and the steady echo path is large, so that the echo path can be determined to be changed.
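The threshold rule above can be sketched as follows. This is a minimal illustration, assuming the correlation is measured with the Pearson correlation coefficient; the function names and the threshold value 0.5 are hypothetical and are not taken from the disclosure.

```python
import math


def correlation(h1, h2):
    """Pearson correlation coefficient of two equal-length echo path vectors."""
    n = len(h1)
    m1 = sum(h1) / n
    m2 = sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    # A constant (e.g. all-zero) vector has no defined correlation; report 0.0.
    return num / den if den > 0.0 else 0.0


def echo_path_changed(h1, h2, threshold=0.5):
    """Return True when the correlation is not greater than the threshold.

    The default threshold is an illustrative assumption.
    """
    return correlation(h1, h2) <= threshold
```

With two nearly identical vectors the coefficient is close to 1 and no change is reported; with dissimilar vectors it drops toward 0 or below and a change is reported.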
In some embodiments, after determining that the echo path changes, the filtering processing parameters may be initialized, so as to avoid filter divergence and improve the echo cancellation effect. The present disclosure is described in detail below, and will not be described in detail here.
The first echo path vector and the second echo path vector in the embodiments of the present disclosure represent a transient impulse response and a steady impulse response at a current time. When the echo path changes, the echo impulse response at the current moment and the echo impulse response at the moment before the change have obviously different shapes. For example, the shape change of the echo impulse response before and after the echo path change is shown in fig. 2, and it can be seen that the shape of the echo impulse response before and after the echo path change also changes greatly. Therefore, the change of the echo path can be accurately detected by using the difference between the transient impulse response and the steady impulse response.
In addition, it should be noted that when the echo path changes significantly, the energy difference between the output signals of the two filters with different update rates is also significant. However, a speech signal is highly non-stationary and its energy has a large dynamic range, so if the echo path change were judged from the energy difference between the output signals of the two filters, no suitable and general energy threshold could be designed, and robustness would be poor. In the embodiments of the present disclosure, the echo path change is judged from the correlation of the echo path vectors of the two filters with different update rates. This correlation does not depend on the energy of any specific signal, so the preset threshold can be set independently of the specific signal, and the method of the present disclosure has stronger universality and robustness.
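The robustness argument can be illustrated with a small sketch. It assumes two hypothetical echo path estimates `h_a` and `h_b` and a hypothetical reference signal played at two levels; all values are made up for illustration. The energy gap between the two filter outputs scales with the signal level, while the correlation of the two vectors does not involve the signal at all.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den > 0.0 else 0.0


def fir_output(h, x):
    """Output of an FIR filter h driven by the signal x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def energy_gap(h1, h2, x):
    """Absolute difference between the output energies of two filters."""
    e1 = sum(v * v for v in fir_output(h1, x))
    e2 = sum(v * v for v in fir_output(h2, x))
    return abs(e1 - e2)


# Hypothetical estimates and reference signal (illustrative values only).
h_a = [0.9, -0.5, 0.3]
h_b = [0.1, 0.2, -0.4]
x_quiet = [0.3, -0.1, 0.25, 0.05, -0.2]
x_loud = [10.0 * v for v in x_quiet]

gap_quiet = energy_gap(h_a, h_b, x_quiet)
gap_loud = energy_gap(h_a, h_b, x_loud)   # about 100 times gap_quiet
similarity = pearson(h_a, h_b)            # identical for both signal levels
```

An energy threshold tuned for `x_quiet` would be off by two orders of magnitude for `x_loud`, whereas a threshold on `similarity` needs no retuning.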
As can be seen from the above, in the embodiment of the present disclosure, based on the two filtering processes with different update rates, the echo path change is determined according to the correlation between the obtained first echo path vector and the obtained second echo path vector, the detection result is more accurate, the echo cancellation effect is improved, and the method of the embodiment of the present disclosure has stronger universality and robustness.
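The overall idea can be tried out on a toy signal. The sketch below is not the disclosed implementation: for brevity it uses two NLMS filters with different step sizes as the fast and slow filters (the disclosure pairs a Kalman filter with an NLMS filter), a noise-free synthetic echo, and an echo path that flips sign at a chosen sample. All names, step sizes, and the impulse response are assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den > 0.0 else 0.0


def nlms_update(h, x_vec, d, mu, eps=1e-8):
    """One NLMS iteration; a larger mu gives a faster update rate."""
    e = d - sum(hi * xi for hi, xi in zip(h, x_vec))
    norm = sum(xi * xi for xi in x_vec) + eps
    return [hi + mu * e * xi / norm for hi, xi in zip(h, x_vec)]


def track_correlation(x, d, length, mu_fast=1.0, mu_slow=0.02):
    """Run a fast and a slow filter in parallel; return their correlation per sample."""
    h_fast = [0.0] * length
    h_slow = [0.0] * length
    corr = []
    for n in range(len(x)):
        # Most recent reference samples, newest first, zero-padded at the start.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(length)]
        h_fast = nlms_update(h_fast, x_vec, d[n], mu_fast)
        h_slow = nlms_update(h_slow, x_vec, d[n], mu_slow)
        corr.append(pearson(h_fast, h_slow))
    return corr


def simulate(n_total=4200, n_change=4000, seed=0):
    """Noise-free toy scene: the echo path flips sign at sample n_change."""
    rng = random.Random(seed)
    h_before = [0.0, 0.9, -0.5, 0.3, -0.2, 0.1, -0.05, 0.02]
    h_after = [-v for v in h_before]
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_total)]
    d = []
    for n in range(n_total):
        h = h_before if n < n_change else h_after
        d.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return x, d, len(h_before)
```

Calling `x, d, length = simulate()` and then `track_correlation(x, d, length)` yields a correlation that stays high once both filters have converged and falls sharply shortly after the change. A practical detector would skip a warm-up period, because both filters start from zero and their early estimates are not yet meaningful.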
A voice communication system in some embodiments of the present disclosure is shown in fig. 3. As shown in fig. 3, the voice communication system includes a microphone 100 and a speaker 200. When the speaker 200 plays the reference signal x(n), the microphone 100 receives the echo signal y(n). Meanwhile, in a double-talk scene, the first audio signal d(n) picked up by the microphone also includes a near-end audio signal s(n). That is, the first audio signal is d(n) = y(n) + s(n), where s(n) includes the near-end speech signal generated by the near-end speaker talking and a background noise signal.
The system according to some embodiments of the present disclosure includes two adaptive filters, namely a first filter h1 and a second filter h2. In some embodiments, the first filter h1 is a Kalman filter configured to perform first filtering processing on the first audio signal d(n) to obtain a first echo path vector; the second filter h2 is an NLMS filter configured to perform second filtering processing on the first audio signal d(n) to obtain a second echo path vector.
Kalman filters are widely used in practical applications. Because a Kalman filter is robust to large interfering signals and converges quickly, it can be used in the embodiments of the present disclosure to track changes in the echo path quickly. That is, in the embodiments of the present disclosure, the update rate of the first filter h1 is greater than that of the second filter h2.
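The signal model d(n) = y(n) + s(n) can be sketched as follows, assuming the echo path is modeled as a finite impulse response `h` that is convolved with the reference signal. The function names and the sample values in the usage note are hypothetical.

```python
def echo(x, h):
    """Echo y(n): the reference x(n) filtered by the echo path impulse response h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def microphone_signal(x, h, s):
    """First audio signal d(n) = y(n) + s(n) picked up by the microphone."""
    y = echo(x, h)
    return [yn + sn for yn, sn in zip(y, s)]
```

For instance, a unit impulse as reference, `microphone_signal([1.0, 0.0, 0.0, 0.0], [0.5, 0.25], [0.0, 0.0, 0.0, 0.1])`, produces the impulse response itself followed by the near-end component.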
The procedure of the first filtering process in the audio signal processing method of the present disclosure is shown in fig. 4, and is specifically described below with reference to fig. 4.
As shown in fig. 4, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S410, determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment.
S420, updating the echo path vector at the previous moment according to the first residual signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
Specifically, in some embodiments of the present disclosure, since the first filter h1 needs to detect sudden changes of the echo path quickly, a time-domain Kalman filter that is updated iteratively sample by sample is used.
The observation equation for the first filter h1 is expressed as:
d(n)=y(n)+s(n)
where d(n) represents the first audio signal, y(n) represents the echo signal, and s(n) represents the near-end audio signal.
First, a first residual signal e1(n) at the current time may be determined according to the reference signal x(n) at the current time and the echo path vector at the previous time, expressed as:
$$\hat{y}_1(n) = x^T(n)\,\hat{h}_1(n-1), \qquad e_1(n) = d(n) - \hat{y}_1(n)$$
where $\hat{h}_1(n-1)$ denotes the echo path vector estimated by the first filter h1 at the previous time, $\hat{y}_1(n)$ denotes the echo error signal, that is, the echo estimated by the filter, and $e_1(n)$ denotes the first residual signal at the current time.
Next, after the first residual signal e1(n) is obtained, the echo path vector of the first filter h1 may be updated. Specifically, for the Kalman filter, the Kalman gain k(n) is calculated first, and the correlation matrix is then updated as:
$$R_\mu(n) = \left[I_L - k(n)\,x^T(n)\right] R_m(n)$$
where the gain depends on the a priori error signal variance and on the variance of the noise, $R_m(n)$ is the correlation matrix of the a priori misalignment, $k(n)$ is the Kalman gain vector, $R_\mu(n)$ is the correlation matrix of the a posteriori misalignment, and $I_L$ is the identity matrix.
Then, based on the first residual signal e1(n) and the Kalman gain vector k(n), the first echo path vector at the current moment is obtained:
$$\hat{h}_1(n) = \hat{h}_1(n-1) + k(n)\,e_1(n)$$
Two parameters need to be estimated in the Kalman filter. The first parameter, $\sigma_w^2(n)$, represents the uncertainty of the state vector of h1 and can be estimated from the norm of the difference between the echo path estimates of two successive iterations.
The second parameter is the noise energy $\sigma_v^2(n)$. Assuming that the first filter h1 has converged to a certain extent, it can be obtained as the difference between the energy of the desired signal and the energy of the echo estimate, where the energies are smoothed with a forgetting factor β, 0 < β < 1.
Through the above process, the first echo path vector $\hat{h}_1(n)$ at the current moment is obtained, which represents the transient echo path at the current moment.
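The first filtering process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `kalman_step` and its argument names are hypothetical, the recursion follows the standard time-domain Kalman form, and the two variances `sigma_w2` and `sigma_v2` are passed in as plain arguments instead of being estimated at run time.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def kalman_step(h_prev: Vector, r_mu_prev: Matrix, x: Vector, d: float,
                sigma_w2: float, sigma_v2: float) -> Tuple[Vector, Matrix, float]:
    """One per-sample update of a time-domain Kalman echo path estimate.

    h_prev    -- echo path vector from the previous moment (length L)
    r_mu_prev -- L x L correlation matrix from the previous moment
    x         -- the L most recent reference samples, newest first
    d         -- current microphone sample
    sigma_w2, sigma_v2 -- assumed state-uncertainty and noise variances
    """
    L = len(h_prev)
    # First residual signal: e1(n) = d(n) - x^T(n) h1(n-1)
    e1 = d - sum(xi * hi for xi, hi in zip(x, h_prev))
    # R_m(n) = R_mu(n-1) + sigma_w^2 * I_L
    r_m = [[r_mu_prev[i][j] + (sigma_w2 if i == j else 0.0) for j in range(L)]
           for i in range(L)]
    # Kalman gain vector: k(n) = R_m(n) x(n) / (x^T(n) R_m(n) x(n) + sigma_v^2)
    r_m_x = [sum(r_m[i][j] * x[j] for j in range(L)) for i in range(L)]
    denom = sum(x[i] * r_m_x[i] for i in range(L)) + sigma_v2
    k = [v / denom for v in r_m_x]
    # First echo path vector: h1(n) = h1(n-1) + k(n) e1(n)
    h_new = [h_prev[i] + k[i] * e1 for i in range(L)]
    # R_mu(n) = [I_L - k(n) x^T(n)] R_m(n) = R_m(n) - k(n) (x^T(n) R_m(n))
    x_r_m = [sum(x[i] * r_m[i][j] for i in range(L)) for j in range(L)]
    r_mu = [[r_m[i][j] - k[i] * x_r_m[j] for j in range(L)] for i in range(L)]
    return h_new, r_mu, e1
```

Calling the function once per sample with a sliding window of reference samples, and feeding the returned vector and matrix back in, should drive the estimate toward the echo path that generated d(n), provided the reference signal is sufficiently exciting.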
The process of the second filtering process in the audio signal processing method of the present disclosure is shown in fig. 5, and is specifically described below with reference to fig. 5.
As shown in fig. 5, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S510, determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment.
S520, obtaining a second residual signal at the current moment according to the first audio signal and the error signal.
S530, updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
Specifically, in some embodiments, the second filter h2 may be an NLMS (normalized least mean squares) adaptive filter. The second filter h2 first determines the error signal at the current moment from the reference signal x(n) and the echo path vector at the previous moment:
$$\hat{y}_2(n) = x^T(n)\,\hat{h}_2(n-1)$$
where $\hat{y}_2(n)$ is the error signal at the current moment and $\hat{h}_2(n-1)$ is the echo path vector at the previous moment. Then, the second residual signal e2(n) at the current moment is calculated from the first audio signal d(n) and the error signal:
$$e_2(n) = d(n) - \hat{y}_2(n)$$
Then, the echo path vector at the previous moment is updated according to the second residual signal:
$$\hat{h}_2(n) = \hat{h}_2(n-1) + \mu\,\frac{x(n)\,e_2(n)}{x^T(n)\,x(n)}$$
where $\hat{h}_2(n)$ is the updated second echo path vector at the current moment and μ is the preset adaptive step size parameter of the second filter h2. In some embodiments, because the second filter h2 is used to estimate the stationary echo path, μ can be a relatively small positive number.
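Steps S510 to S530 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `nlms_step`, the default step size `mu = 0.05`, and the small constant `eps` that guards against division by zero are hypothetical and are not specified in the disclosure.

```python
from typing import List, Tuple


def nlms_step(h_prev: List[float], x: List[float], d: float,
              mu: float = 0.05, eps: float = 1e-8) -> Tuple[List[float], float]:
    """One NLMS update of the second echo path vector."""
    # "Error signal" in the text's wording: the filter output x^T(n) h2(n-1)
    y_hat = sum(xi * hi for xi, hi in zip(x, h_prev))
    # Second residual signal: e2(n) = d(n) - y_hat(n)
    e2 = d - y_hat
    # Reference energy x^T(n) x(n); eps is an assumed guard against division by zero
    energy = sum(xi * xi for xi in x) + eps
    # h2(n) = h2(n-1) + mu * e2(n) * x(n) / (x^T(n) x(n) + eps)
    h_new = [hi + mu * e2 * xi / energy for hi, xi in zip(h_prev, x)]
    return h_new, e2


# With mu = 1 and eps = 0, a single step cancels the current residual exactly.
h2, e2 = nlms_step([0.0, 0.0], [1.0, 2.0], 5.0, mu=1.0, eps=0.0)
print(h2, e2)  # prints [1.0, 2.0] 5.0
```

A small `mu` makes each update a small fraction of the full correction, which is what lets this filter follow the stationary echo path slowly while the Kalman filter follows transient changes.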
After the first echo path vector and the second echo path vector are obtained through the embodiments of fig. 4 and fig. 5, the correlation between the first echo path vector and the second echo path vector may be determined, so as to determine whether the echo path changes. The following describes the process of determining the echo path change with reference to fig. 6.
As shown in fig. 6, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S610, determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector.
S620, determining that a change in the echo path is detected in response to the correlation coefficient not being greater than a preset correlation threshold.
Referring to fig. 2, the echo path detecting module 300 may calculate a correlation coefficient between the first echo path vector obtained by the first filter h1 and the second echo path vector obtained by the second filter h2.
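In particular, in some embodiments, the Pearson correlation coefficient of the first echo path vector $\hat{h}_1$ and the second echo path vector $\hat{h}_2$ may be calculated, based on Pearson correlation analysis, to represent the correlation between the two:
$$\rho = \frac{\sum_{i=1}^{L}(\hat{h}_{1,i}-\bar{h}_1)(\hat{h}_{2,i}-\bar{h}_2)}{\sqrt{\sum_{i=1}^{L}(\hat{h}_{1,i}-\bar{h}_1)^2}\,\sqrt{\sum_{i=1}^{L}(\hat{h}_{2,i}-\bar{h}_2)^2}}$$
where ρ represents the correlation coefficient of the first echo path vector and the second echo path vector, L represents the filter length, $\hat{h}_{1,i}$ and $\hat{h}_{2,i}$ are the i-th coefficients of the two vectors, and $\bar{h}_1$ and $\bar{h}_2$ are the means of their coefficients.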
The value of the correlation coefficient ρ lies between -1 and +1 and has the following properties:
1) When |ρ| = 1, the first echo path vector and the second echo path vector are completely linearly correlated; for example, ρ = 1 when the two vectors are identical.
2) When ρ = 0, the first echo path vector and the second echo path vector are not linearly correlated at all.
3) When 0 < |ρ| < 1, there is a certain degree of linear correlation between the first echo path vector and the second echo path vector. The closer |ρ| is to 1, the stronger the linear relationship between the two; the closer |ρ| is to 0, the weaker the linear correlation between the two.
Based on the above properties, a suitable preset correlation threshold may be set between 0 and 1; it is the value of |ρ| above which the first echo path vector and the second echo path vector are regarded as linearly correlated. When |ρ| is greater than the preset correlation threshold, the two vectors are regarded as linearly correlated, and the echo path is determined to be unchanged. When |ρ| is not greater than the preset correlation threshold, the two vectors are regarded as not linearly correlated, and it is determined that a change in the echo path is detected.
It is understood that the preset correlation threshold may be obtained according to a priori knowledge or a limited number of experiments, and those skilled in the art may set the preset correlation threshold according to specific requirements of a scene, which is not limited by the present disclosure.
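The decision in steps S610 and S620 can be sketched in Python as follows. The function names `pearson` and `echo_path_changed`, the default threshold of 0.8, and the choice to return 0 for a constant vector are assumptions made only for this illustration; the disclosure leaves the threshold to prior knowledge or experiment.

```python
import math
from typing import Sequence


def pearson(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length echo path vectors."""
    L = len(h1)
    mean1 = sum(h1) / L
    mean2 = sum(h2) / L
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    var1 = sum((a - mean1) ** 2 for a in h1)
    var2 = sum((b - mean2) ** 2 for b in h2)
    denom = math.sqrt(var1 * var2)
    # Assumption: a constant vector (zero variance) is treated as uncorrelated.
    return cov / denom if denom > 0.0 else 0.0


def echo_path_changed(h1: Sequence[float], h2: Sequence[float],
                      threshold: float = 0.8) -> bool:
    """Report a change when |rho| is not greater than the preset threshold."""
    return abs(pearson(h1, h2)) <= threshold
```

For instance, two nearly identical vectors such as `[0.5, -0.3, 0.1]` and `[0.5, -0.3, 0.12]` give a coefficient close to 1 and are reported as unchanged, whereas `[1.0, 0.0, -1.0]` and `[1.0, -2.0, 1.0]` give a coefficient of 0 and are reported as a change.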
As can be seen from the above, the embodiments of the present disclosure run two filtering processes with different update rates and determine whether the echo path has changed from the correlation between the resulting first echo path vector and second echo path vector. This makes the detection result more accurate, improves the echo cancellation effect, and gives the method greater generality and robustness. In addition, compared with methods that detect an echo path change from the correlation between the adaptively filtered residual signal and the reference signal, the method of the present disclosure avoids the false detections that occur when residual components in the residual signal increase that correlation, and thus improves detection accuracy.
Under the condition that the echo path changes, if the current filter parameters are continuously adopted for iterative updating, the filter diverges, and the changed echo path cannot be accurately estimated. Therefore, in an embodiment of the present disclosure, after determining that the echo path is detected to have changed, the audio signal processing method further includes:
initializing parameters of the first filtering process and the second filtering process.
Specifically, after detecting that the echo path changes, as shown in fig. 2, the parameters of the first filter h1 and the second filter h2 may be initialized, so that the first filter h1 and the second filter h2 restart iterative convergence based on the initialized parameters, thereby avoiding the problem of filter divergence or long-time incorrect operation caused by the echo path change, and improving the echo cancellation effect in a complex scene.
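The re-initialization step can be sketched in Python as follows. The disclosure does not state which initial values are used, so the zero echo path vectors, the identity-scaled correlation matrix, and the names `FilterState`, `initial_state`, `reset_if_changed`, and `p0` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterState:
    h1: List[float]            # first (Kalman) echo path vector
    h2: List[float]            # second (NLMS) echo path vector
    r_mu: List[List[float]]    # Kalman correlation matrix


def initial_state(length: int, p0: float = 1.0) -> FilterState:
    """Assumed initial parameters: zero echo paths and p0 times the identity."""
    identity = [[p0 if i == j else 0.0 for j in range(length)]
                for i in range(length)]
    return FilterState(h1=[0.0] * length, h2=[0.0] * length, r_mu=identity)


def reset_if_changed(state: FilterState, changed: bool) -> FilterState:
    """Return freshly initialized parameters when a change was detected."""
    return initial_state(len(state.h1)) if changed else state
```

Returning a new state object instead of modifying the old one in place keeps the detection decision and the filter updates separate, so both filters restart iterative convergence from the same initial parameters.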
In some embodiments, as shown in fig. 2, the voice communication system further includes a residual echo suppression module 400, which may suppress the residual echo in the first audio signal after the echo is removed, so as to obtain a cleaner near-end audio signal. In one example, the residual echo suppression module 400 may be implemented as a residual echo suppression (RES) module.
The processes and principles of the residual echo suppression module 400 can be understood and fully implemented by those skilled in the art based on the relevant art, and the present disclosure is not limited thereto.
Therefore, in the embodiments of the present disclosure, two filters with different update rates are used to estimate the transient echo path and the steady echo path, respectively, and whether the echo path has changed is determined from the correlation coefficient of the two. The change of the echo path is thereby detected effectively, and the detection is more general and robust, which improves the echo cancellation effect. Compared with methods that detect an echo path change from the correlation between the adaptively filtered residual signal and the reference signal, the method of the present disclosure avoids the false detections that occur when residual components in the residual signal increase that correlation, and thus improves detection accuracy.
In a second aspect, the embodiments of the present disclosure provide an audio signal processing apparatus, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 7, in some embodiments, an audio signal processing apparatus of an example of the present disclosure includes:
a first filtering module 701 configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal generated by a loudspeaker playing a reference signal;
a second filtering module 702 configured to perform a second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector; the filter update rate of the first filter processing is different from the filter update rate of the second filter processing;
an echo path determination module 703 configured to determine that the echo path is detected to be changed in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
As can be seen from the above, the embodiments of the present disclosure run two filtering processes with different update rates and determine whether the echo path has changed from the correlation between the resulting first echo path vector and second echo path vector. This makes the detection result more accurate, improves the echo cancellation effect, and gives the apparatus greater generality and robustness.
In some embodiments, the first filtering process is kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the first filtering module 701 is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the second filtering module 702 is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the echo path determination module 703 is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than the preset correlation threshold.
In some embodiments, the audio signal processing apparatus of the present disclosure further includes:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
As can be seen from the above, the embodiments of the present disclosure run two filtering processes with different update rates and determine whether the echo path has changed from the correlation between the resulting first echo path vector and second echo path vector. This makes the detection result more accurate, improves the echo cancellation effect, and gives the apparatus greater generality and robustness. In addition, compared with methods that detect an echo path change from the correlation between the adaptively filtered residual signal and the reference signal, the present disclosure avoids the false detections that occur when residual components in the residual signal increase that correlation, and thus improves detection accuracy.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any of the embodiments of the first aspect.
In a fourth aspect, the disclosed embodiments provide a storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
Fig. 8 is a block diagram of an electronic device according to some embodiments of the present disclosure, and the following describes principles related to the electronic device and a storage medium according to some embodiments of the present disclosure with reference to fig. 8.
Referring to fig. 8, the electronic device 1800 may include one or more of the following components: processing component 1802, memory 1804, power component 1806, multimedia component 1808, audio component 1810, input/output (I/O) interface 1812, sensor component 1816, and communications component 1818.
The processing component 1802 generally controls the overall operation of the electronic device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions. Further, the processing component 1802 may include one or more modules that facilitate interaction between the processing component 1802 and other components. For example, the processing component 1802 can include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802. As another example, the processing component 1802 can read executable instructions from a memory to implement electronic device related functions.
The memory 1804 is configured to store various types of data to support operation at the electronic device 1800. Examples of such data include instructions for any application or method operating on the electronic device 1800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 1806 provides power to the various components of the electronic device 1800. The power component 1806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1800.
The multimedia component 1808 includes a display screen that provides an output interface between the electronic device 1800 and a user. In some embodiments, the multimedia component 1808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera can receive external multimedia data when the electronic device 1800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
I/O interface 1812 provides an interface between processing component 1802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1816 includes one or more sensors that provide status assessments of various aspects of the electronic device 1800. For example, the sensor component 1816 can detect an open/closed state of the electronic device 1800 and the relative positioning of components, such as the display and keypad of the electronic device 1800. The sensor component 1816 can also detect a change in position of the electronic device 1800 or of one of its components, the presence or absence of user contact with the electronic device 1800, the orientation or acceleration/deceleration of the electronic device 1800, and a change in its temperature. The sensor component 1816 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 1816 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1816 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1818 is configured to facilitate wired or wireless communication between the electronic device 1800 and other devices. The electronic device 1800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G, 5G, or 6G, or a combination thereof. In an exemplary embodiment, the communication component 1818 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1818 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
It should be understood that the above embodiments are only examples given to illustrate the present disclosure clearly and are not intended to limit it. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of the present disclosure.
Claims (15)
1. An audio signal processing method, comprising:
performing first filtering processing on the basis of a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
performing second filtering processing on the basis of the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
2. The method of claim 1, wherein
the first filtering process is Kalman filtering, and the second filtering process is NLMS filtering.
3. The method of claim 2, wherein
the Kalman filtering is time domain Kalman filtering.
4. The method according to claim 2 or 3, wherein the performing a first filtering process based on the reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector comprises:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
5. The method of claim 2 or 3, wherein performing the second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector comprises:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
6. The method of claim 1, wherein the determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold comprises:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
7. The method of claim 1, wherein after the determining that a change in echo path is detected, the method further comprises:
initializing parameters of the first filtering process and the second filtering process.
8. An audio signal processing apparatus, comprising:
a first filtering module configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
a second filtering module configured to perform second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
an echo path determination module configured to determine that an echo path change is detected in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
9. The apparatus of claim 8, wherein
the first filtering process is Kalman filtering, and the second filtering process is NLMS filtering.
10. The apparatus of claim 9, wherein the first filtering module is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
11. The apparatus of claim 9, wherein the second filtering module is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
12. The apparatus of claim 8, wherein the echo path determination module is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
13. The apparatus of claim 8, further comprising:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
14. An electronic device, comprising:
a speaker and a microphone;
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any one of claims 1 to 7.
15. A storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110739121.8A CN113362842B (en) | 2021-06-30 | 2021-06-30 | Audio signal processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113362842A true CN113362842A (en) | 2021-09-07 |
CN113362842B CN113362842B (en) | 2022-11-11 |
Family
ID=77537528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110739121.8A Active CN113362842B (en) | 2021-06-30 | 2021-06-30 | Audio signal processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113362842B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6226380B1 (en) * | 1998-02-19 | 2001-05-01 | Nortel Networks Limited | Method of distinguishing between echo path change and double talk conditions in an echo canceller |
CN1937432A (en) * | 2006-09-30 | 2007-03-28 | 南京大学 | Sound echo cancellation processing method based on optimized parameter predication |
JP2009033549A (en) * | 2007-07-27 | 2009-02-12 | Toshiba Corp | Speech processor and echo removing method |
CN103179296A (en) * | 2011-12-26 | 2013-06-26 | 中兴通讯股份有限公司 | Echo canceller and echo cancellation method |
US20150181017A1 (en) * | 2013-12-23 | 2015-06-25 | Imagination Technologies Limited | Echo Path Change Detector |
US9602922B1 (en) * | 2013-06-27 | 2017-03-21 | Amazon Technologies, Inc. | Adaptive echo cancellation |
CN109379501A (en) * | 2018-12-17 | 2019-02-22 | 杭州嘉楠耘智信息科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN111755020A (en) * | 2020-08-07 | 2020-10-09 | 南京时保联信息科技有限公司 | Stereo echo cancellation method |
CN112689056A (en) * | 2021-03-12 | 2021-04-20 | 浙江芯昇电子技术有限公司 | Echo cancellation method and echo cancellation device using same |
Non-Patent Citations (2)
Title |
---|
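Wang Jie et al.: "A new adaptive echo cancellation algorithm with double-talk protection", Control Theory & Applications * |
Yuan Hongxing et al.: "A low-delay double-talk detection method", Computer Engineering and Applications * |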
Also Published As
Publication number | Publication date |
---|---|
CN113362842B (en) | 2022-11-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11587574B2 (en) | Voice processing method, apparatus, electronic device, and storage medium | |
CN109361828B (en) | Echo cancellation method and device, electronic equipment and storage medium | |
EP2783504B1 (en) | Acoustic echo cancellation based on ultrasound motion detection | |
CN111986693B (en) | Audio signal processing method and device, terminal equipment and storage medium | |
CN113362843B (en) | Audio signal processing method and device | |
CN106791245B (en) | Method and device for determining filter coefficients | |
CN109256145B (en) | Terminal-based audio processing method and device, terminal and readable storage medium | |
CN110970015B (en) | Voice processing method and device and electronic equipment | |
WO2020191512A1 (en) | Echo cancellation apparatus, echo cancellation method, signal processing chip and electronic device | |
CN112447184B (en) | Voice signal processing method and device, electronic equipment and storage medium | |
CN111667842B (en) | Audio signal processing method and device | |
CN113362842B (en) | Audio signal processing method and device | |
CN112489653A (en) | Speech recognition method, device and storage medium | |
WO2022198820A1 (en) | Speech processing method and apparatus, and apparatus for speech processing | |
CN111694539B (en) | Method, device and medium for switching between earphone and loudspeaker | |
CN112217948B (en) | Echo processing method, device, equipment and storage medium for voice call | |
CN111294473B (en) | Signal processing method and device | |
CN111292760B (en) | Sounding state detection method and user equipment | |
CN111629104B (en) | Distance determination method, distance determination device, and computer storage medium | |
CN113489855A (en) | Sound processing method, sound processing device, electronic equipment and storage medium | |
CN113345456B (en) | Echo separation method, device and storage medium | |
CN113470675B (en) | Audio signal processing method and device | |
CN113470676B (en) | Sound processing method, device, electronic equipment and storage medium | |
CN115883736A (en) | Echo cancellation method, device and storage medium | |
CN116778943A (en) | Howling suppression method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||