CN109727605B - Method and system for processing sound signal - Google Patents

Method and system for processing sound signal

Info

Publication number: CN109727605B
Authority: CN (China)
Prior art keywords: sound signal, signal, processed, spectral density, power spectral
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201811645765.5A
Other languages: Chinese (zh)
Other versions: CN109727605A (en)
Inventor: 袁斌 (Yuan Bin)
Current Assignee: Sipic Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: AI Speech Ltd
Priority date: (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by AI Speech Ltd
Priority to CN201811645765.5A
Publication of CN109727605A
Application granted
Publication of CN109727605B
Legal status: Active
Anticipated expiration


Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a method and a system for processing a sound signal. One embodiment of the method comprises: acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and filtering the sound signal to be processed when the spectral component of the interference sound signal in the sound signal to be processed is determined to be greater than the masking threshold. The method can reduce distortion of the sound signal so that it sounds more natural, reduce the computational complexity of the algorithm, and accelerate the convergence of the front-end echo canceller. Robustness in strong background noise and near-end speech environments can also be improved.

Description

Method and system for processing sound signal
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and a system for processing a sound signal.
Background
In the prior art, filtering can reduce "musical noise" in a sound signal, but the speech signal after noise reduction by the filter still sounds unnatural to a certain extent. This is related to the masking effect: when the human ear receives one sound, its perception of another sound can be disturbed and suppressed. The closer the two sounds are in pitch or in time, the stronger the masking effect; as a result, the residual noise after noise reduction by the post filter loses its original characteristics, and listening tests find it somewhat unnatural.
Disclosure of Invention
Embodiments of the present invention provide a method and a system for processing a sound signal, so as to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for processing a sound signal, including: acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; determining the power spectral density of the interference sound signal, and performing weighting processing on the sound signal to be processed according to the power spectral density to obtain a frequency spectrum estimation of a target sound signal; determining a masking threshold from the spectral estimate; and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, carrying out filtering processing on the sound signal to be processed.
Optionally, the disturbing sound signal comprises a noise signal and an echo signal.
Optionally, the step of performing weighting processing on the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal includes:
converting the sound signal to be processed into a frequency domain signal E (omega);
determining the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
then calculating a weighting factor HLSA(Ω) and obtaining the spectral estimate S'(Ω) of the target sound signal:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Optionally, in a case that it is determined that a spectral component of an interfering sound signal in the sound signal to be processed is greater than the masking threshold, the step of performing filtering processing on the sound signal to be processed includes:
determining a weighting coefficient H (omega) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) + (zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the step of determining a masking threshold from the spectral estimate comprises:
determining, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
determining the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Optionally, the step of acquiring the sound signal to be processed includes:
receiving an initial sound signal;
and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
Optionally, the sound signal to be processed is a speech signal.
In a second aspect, an embodiment of the present invention provides a system for processing a sound signal, including: a signal acquisition module, configured to acquire a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; a spectrum estimation determining module, configured to determine the power spectral density of the interference sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; a masking threshold determination module, configured to determine a masking threshold from the spectral estimate; and a filtering processing module, configured to filter the sound signal to be processed when the spectral component of the interference sound signal in the sound signal to be processed is determined to be greater than the masking threshold.
Optionally, the disturbing sound signal comprises a noise signal and an echo signal.
Optionally, the spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency domain signal E(Ω), and to determine the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is derived according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
a weighting factor HLSA(Ω) is then calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Optionally, the masking threshold determining module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
to determine a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
and to determine the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Optionally, the filtering processing module is further configured to determine a weighting coefficient H (Ω) of the filtering processing according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) + (zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the signal obtaining module is further configured to receive an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
In a third aspect, embodiments of the present invention provide a storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above methods for processing a sound signal according to the present invention.
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the above methods for processing a sound signal.
In a fifth aspect, the present invention further provides a computer program product, comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any one of the above methods for processing a sound signal.
The embodiments of the present invention have the following beneficial effects: distortion of the sound signal can be reduced so that it sounds more natural; the masking threshold is determined from the calculated power spectral density (PSD) of the interference sound signal, which reduces the computational complexity of the algorithm; the order requirement on the front-end echo-cancellation filter is reduced, which accelerates the convergence of the front-end echo canceller; and robustness in strong background noise and near-end speech environments can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for processing an audio signal according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another embodiment of a method for processing an audio signal according to the present invention;
FIG. 3 is a diagram illustrating an embodiment of a system for implementing a method for processing a speech signal according to the present invention;
FIG. 4 is a diagram illustrating an embodiment of a method for processing a speech signal according to the present invention;
FIG. 5 is a diagram illustrating an embodiment of a system for processing an audio signal according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes in accordance with a signal having one or more data packets, e.g., signals from data interacting with another element in a local system, distributed system, and/or across a network of the internet with other systems by way of the signal.
Finally, it should be further noted that relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present invention provides a method of processing a sound signal, including:
step S11: and acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal.
Step S12: determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal. Specifically, after the power spectral density of the interference sound signal is determined, the posterior and prior signal-to-noise ratios are determined, a weighting coefficient is calculated from these signal-to-noise ratios, and the sound signal to be processed is weighted to obtain the spectral estimate of the target sound signal.
Step S13: a masking threshold is determined from the spectral estimate.
Step S14: and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, filtering the sound signal to be processed.
And, in the embodiment of the present invention, for the calculation of the masking threshold, specifically:
determining, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
determining the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
According to the embodiment of the present invention, the masking threshold is determined from the calculated power spectral density (PSD) of the interference sound signal, which reduces the computational complexity of the algorithm. The order requirement on the front-end echo-cancellation filter is also reduced, which accelerates the convergence of the front-end echo canceller. Robustness in strong background noise and near-end speech environments can be improved as well.
As shown in fig. 2, an embodiment of the present invention provides a method of processing a sound signal, including:
step S21: an initial sound signal is received. The initial sound signal may be picked up by a sound pickup device such as a microphone.
Step S22: and performing echo cancellation on the initial sound signal through an echo canceller to obtain a sound signal to be processed.
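The patent does not fix the echo-cancellation algorithm used in step S22. A common front-end choice is a normalized-LMS (NLMS) adaptive filter, sketched below as an illustrative assumption; the function name, filter length, and step size are hypothetical, not taken from the text.

```python
import numpy as np

def nlms_echo_canceller(x, y, taps=128, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller: x is the far-end reference,
    y the near-end microphone signal; returns the echo-cancelled
    signal e(k), i.e. the 'sound signal to be processed'."""
    w = np.zeros(taps)       # adaptive estimate of the echo path
    e = np.zeros(len(y))     # echo-cancelled output
    xbuf = np.zeros(taps)    # most recent far-end samples, newest first
    for k in range(len(y)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[k]
        d_hat = w @ xbuf                 # estimated echo at time k
        e[k] = y[k] - d_hat              # residual after cancellation
        # normalized LMS update of the echo-path estimate
        w += mu * e[k] * xbuf / (xbuf @ xbuf + eps)
    return e
```

With a short, time-invariant echo path and a white far-end signal, the residual energy drops well below the microphone-signal energy after convergence.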
Step S23: and determining the power spectral density of the interference sound signal, and performing weighting processing on the sound signal to be processed according to the power spectral density to obtain the frequency spectrum estimation of the target sound signal.
Step S24: a masking threshold is determined from the spectral estimate.
Step S25: and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, filtering the sound signal to be processed.
According to the embodiment of the present invention, after the initial signal is received, echo cancellation is first performed on it, which can improve the processing precision of the sound signal.
When the interference sound signal comprises a noise signal and an echo signal, the process of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal is as follows:
converting the sound signal to be processed into a frequency domain signal E(Ω);
determining the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
then calculating a weighting factor HLSA(Ω) and obtaining the spectral estimate S'(Ω) of the target sound signal:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, the step of filtering the sound signal to be processed comprises the following steps:
determining a weighting coefficient H (omega) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω)=min(1,sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) +(zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
The embodiment of the present invention preserves the original background noise characteristics, makes the residual echo sound more noise-like, and reduces speech distortion, so the result sounds more natural. The order requirement on the front-end echo-cancellation filter is also reduced, which accelerates the convergence of the front-end echo canceller and reduces the computational complexity of the algorithm. Robustness in strong background noise and near-end speech environments can be improved as well.
As shown in fig. 3, in a system implementing the method of the present invention, the speech signal transmitted from the far end is played by the loudspeaker and forms the initial echo signal d(k). The near-end microphone picks up a voice signal y(k), which includes the pure voice signal s(k) (the target voice signal), a noise signal n(k), and the initial echo signal d(k) fed back through the loudspeaker-room-microphone (LRM) path. First, the echo canceller C performs echo cancellation on the voice signal y(k) picked up by the near-end microphone, and the filter H then performs the filtering processing.
As shown in fig. 4, an embodiment of the present invention provides a method for processing a speech signal, including:
the near-end microphone picks up a voice signal y (k) which comprises a pure voice signal s (k), a noise signal n (k) and an initial echo signal d (k) fed back by the LRM through the loudspeaker. In the embodiment of the invention, the pure voice signal is target information.
The echo canceller performs echo cancellation on the voice signal y(k) picked up by the near-end microphone to obtain an echo-cancelled speech signal e(k). The echo-cancelled speech signal e(k) still contains interference sound signals, namely the noise signal and a residual echo signal.
The noise PSD Rnn(Ω) and the residual echo PSD Rbb(Ω) are estimated by a statistical or autocorrelation method.
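The text leaves the estimator open ("a statistical or autocorrelation method"). One minimal sketch is first-order recursive smoothing of per-frame periodograms over frames assumed to contain only the interference; the function name and smoothing constant are illustrative assumptions.

```python
import numpy as np

def recursive_psd(frames, lam=0.9):
    """Estimate a PSD (e.g. Rnn or Rbb) by exponential smoothing of
    per-frame periodograms. `frames` is a 2-D array of shape
    (num_frames, frame_len) holding a signal assumed to contain only
    the interference (noise, or residual echo)."""
    psd = np.zeros(frames.shape[1])
    for frame in frames:
        periodogram = np.abs(np.fft.fft(frame)) ** 2 / frame.size
        psd = lam * psd + (1 - lam) * periodogram  # first-order smoothing
    return psd
```

For unit-variance white noise the smoothed estimate settles near a flat PSD of about 1 per bin.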
The post filter then weights the echo-cancelled near-end microphone signal to obtain a preliminary spectral estimate S'(Ω) of the pure speech signal. The specific process comprises the following steps:
a) calculating the posterior signal-to-noise ratio:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω))
b) deriving the a priori signal-to-noise ratio according to the decision-directed method:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω)
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the preliminary estimate of the previous frame of the speech signal.
c) defining theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1), then calculating the weighting coefficient:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt)
d) weighting to obtain the preliminary estimate of the voice signal: S'(Ω) = E(Ω)*HLSA(Ω)
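Steps a) to d) can be sketched per frequency bin as follows. The gain here uses the standard Ephraim-Malah log-spectral-amplitude expression with the exponential integral E1; treat that exact form as an assumption, since the original gain formula appears only as an equation image, and the array names are illustrative.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(x)

def lsa_spectral_estimate(E, Rbb, Rnn, S_prev, alpha=0.98):
    """Steps a)-d): posterior SNR, decision-directed prior SNR, LSA gain,
    and the preliminary spectral estimate S'(Ω). As in the text, the
    prior-SNR recursion divides by Rbb(Ω), while the posterior SNR uses
    the full interference PSD Rbb(Ω)+Rnn(Ω)."""
    R_int = Rbb + Rnn
    post_snr = np.abs(E) ** 2 / R_int                 # a) posterior SNR
    P = lambda x: (np.abs(x) + x) / 2                 # half-wave rectifier P(x)
    prior_snr = (1 - alpha) * P(post_snr - 1) \
        + alpha * np.abs(S_prev) ** 2 / Rbb           # b) decision-directed
    theta = post_snr * prior_snr / (prior_snr + 1)    # c) theta per bin
    H_lsa = prior_snr / (1 + prior_snr) * np.exp(0.5 * exp1(theta))
    return E * H_lsa                                  # d) S'(Ω)
```

A bin with moderate SNR is attenuated (gain below 1), while the gain stays finite for all positive theta.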
Then, a masking threshold RTT(Ω) is estimated from the preliminary estimate S'(Ω) of the speech signal spectrum. The specific process comprises the following steps:
a) performing critical band analysis on the signal. According to the place theory of hearing, the human ear can be regarded as a discrete band-pass filter bank, where one critical band is called a Bark. The power spectral density of each critical band is then
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
where bh and bl are the upper and lower limit frequencies of each critical band, respectively, and k is related to the sampling rate.
b) calculating the spreading function SF(k):
SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2)
Because of the interaction between critical bands, the spread critical band spectrum can be expressed as C(k) = B(k)*SF(k).
c) calculating the masking threshold RTT(Ω) for masking the noise and the residual echo.
There are two masking thresholds: when the masker is tone-like, the threshold for masking noise and residual echo is C(k) - (14.5+k) dB; when the masker is noise-like, the threshold is C(k) - 5.5 dB.
It is therefore necessary to determine whether the signal resembles a pure tone or resembles noise and residual echo, and to this end the spectral flatness measure SFM is defined:
SFM = 10*lg(G/A)
where G and A are the geometric mean and the arithmetic mean, respectively, of the power spectral density of the signal.
The tonality coefficient is then defined as belta = min(SFM/SFMmax, 1).
The offset function O(k) of the masking energy of each frequency band is calculated from belta:
O(k) = belta*(14.5+k) + (1-belta)*5.5
The masking threshold is then: T(k) = 10^(lg(C(k)) - O(k)/10)
The calculated spread threshold is then mapped back from the Bark domain to the frequency domain.
The result is compared with the absolute hearing threshold of the human ear; if the calculated masking threshold is lower than the absolute hearing threshold, the absolute hearing threshold value is taken. The absolute hearing threshold Tabs(k) is defined as:
Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Therefore, the final masking threshold is RTT(Ω) = min(T(k), Tabs(k)).
Further, psychoacoustic weighted filtering is performed on the echo-cancelled frequency-domain microphone signal E(Ω). An FFT (fast Fourier transform) can be used to convert the time-domain digital signal into the frequency domain. For each component, it is judged whether the noise spectral component in the echo-cancelled frequency-domain microphone signal E(Ω) is smaller than the masking threshold; if so, the component is kept and left unprocessed; if not, the corresponding noise spectral component is attenuated according to the conventional MMSE-LSA rule.
The specific derivation process of the psychoacoustic weighting filter coefficients is as follows:
the design goal of psycho-acoustic adaptive weighted filtering is to minimize the distortion of the near-end speech signal when the sum of the residual echo distortion and the noise distortion is equal to the masking threshold, so the optimal psycho-acoustic weighted filter coefficients H (Ω) satisfy:
[zeta_b–H(Ω)]2Rbb(Ω)+[zeta_n–H(Ω)]2Rnn(Ω)=RTT(Ω)
where, zeta _ b is a residual echo attenuation coefficient, and is usually 20lg (zeta _ b) — 35;
the zeta _ n is a noise attenuation coefficient, and is usually 20lg (zeta _ n) — 15.
Since 0< ═ H (Ω) < ═ 1, solving the above quadratic equation H (Ω) takes a positive value:
H(Ω)=min(1,[zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω)+
sqrt([Rbb(Ω)+Rnn(Ω)]*RTT(Ω)-[zeta_b-zeta_n]2*Rbb(Ω)*Rbb(Ω))]/(Rbb(Ω)+ Rnn(Ω)))
because of zeta _ b, zeta _ n is much less than 1 and is generally relative to Rbb(omega) and Rbb(omega) for RTT(Ω) is not too small, so the above formula can be simplified to:
H(Ω)=min(1,sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) +(zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω)))
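The simplified coefficient can be evaluated per frequency bin as follows; the default attenuation values follow the stated 20*lg(zeta_b) = -35 dB and 20*lg(zeta_n) = -15 dB, and the function name is illustrative.

```python
import numpy as np

def psycho_filter_gain(Rtt, Rbb, Rnn,
                       zeta_b=10 ** (-35 / 20),   # 20*lg(zeta_b) = -35 dB
                       zeta_n=10 ** (-15 / 20)):  # 20*lg(zeta_n) = -15 dB
    """Simplified psychoacoustic weighting coefficient H(Ω):
    H = min(1, sqrt(Rtt/(Rbb+Rnn)) + (zeta_b*Rbb + zeta_n*Rnn)/(Rbb+Rnn))."""
    R_int = Rbb + Rnn
    H = np.sqrt(Rtt / R_int) + (zeta_b * Rbb + zeta_n * Rnn) / R_int
    return np.minimum(1.0, H)
```

Applying the gain is then simply `S_out = psycho_filter_gain(Rtt, Rbb, Rnn) * E`. Bins whose interference lies below the masking threshold (large Rtt relative to Rbb+Rnn) get a gain of 1 and are left untouched, matching the masking-based decision described above.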
in the embodiment of the invention, because the psychoacoustic post filter can also reduce the order requirement on the pre echo cancellation self-adaptive filter, the convergence speed of the echo canceller can be accelerated, the computational complexity of the algorithm is reduced, and the robustness of the echo canceller in the environment of strong background noise and near-end voice can be improved.
Residual echo cancellation is fused into the psychoacoustic weighting post filter, and the weighting coefficient of the filter is adaptively updated with the residual echo to further cancel the acoustic echo. In addition, because of the masking effect of the human ear, the noise spectrum and residual echo components below the masking threshold are inaudible, so they do not need to be attenuated; the conventional adaptive post-filtering method only needs to attenuate the noise spectrum and residual echo components that are not masked by the voice signal. The original background noise characteristics are therefore well preserved, the residual echo sounds more noise-like, speech distortion is reduced, and the result sounds more natural.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 5, an embodiment of the present invention further provides a system 500 for processing a sound signal, including:
the signal obtaining module 510 is configured to obtain a sound signal to be processed, where the sound signal to be processed includes a target sound signal and an interfering sound signal.
The spectrum estimation determining module 520 is configured to determine a power spectral density of the interfering sound signal, and perform weighting processing on the sound signal to be processed according to the power spectral density to obtain a spectrum estimation of the target sound signal.
A masking threshold determination module 530 for determining a masking threshold from the spectral estimate.
And the filtering processing module 540 is configured to perform filtering processing on the sound signal to be processed when it is determined that the frequency spectrum component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.
Further, the disturbing sound signal includes a noise signal and an echo signal.
The spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal E(Ω), and to determine the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is obtained according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
the weighting coefficient H_LSA(Ω) is further calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1).
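The weighting step above can be sketched in NumPy. The gain used here is the standard log-spectral-amplitude estimator that matches the theta expression; the helper `expint_e1`, the frame inputs, and the default smoothing factor `alpha=0.98` are illustrative assumptions of this sketch, not values taken from the patent.

```python
import numpy as np

def expint_e1(x, span=40.0, n=4000):
    """E1(x) = integral from x to infinity of e^(-t)/t dt, by trapezoidal
    quadrature (the tail beyond x + span is negligible for x > 0)."""
    x = np.asarray(x, dtype=float)
    s = np.linspace(0.0, 1.0, n)
    t = x[..., None] + s * span
    y = np.exp(-t) / t
    return 0.5 * (y[..., :-1] + y[..., 1:]).sum(axis=-1) * (span / (n - 1))

def spectral_estimate(E, Rbb, Rnn, S_prev, alpha=0.98):
    """One frame of the weighting step: PostSNR, PrioriSNR, H_LSA, S'.

    E      -- complex spectrum E(Omega) of the frame to be processed
    Rbb    -- power spectral density of the echo signal (must be > 0)
    Rnn    -- power spectral density of the noise signal
    S_prev -- spectral estimate S'(Omega_{i-1}) of the previous frame
    """
    post_snr = np.abs(E) ** 2 / (Rbb + Rnn)                # PostSNR(Omega)
    half_rect = np.maximum(post_snr - 1.0, 0.0)            # P(x) = (|x| + x)/2
    prior_snr = (1 - alpha) * half_rect + alpha * np.abs(S_prev) ** 2 / Rbb
    theta = post_snr * prior_snr / (prior_snr + 1.0)
    # log-spectral-amplitude gain: H = xi/(1 + xi) * exp(E1(theta)/2)
    H = prior_snr / (1.0 + prior_snr) * np.exp(0.5 * expint_e1(np.maximum(theta, 1e-8)))
    return E * np.minimum(H, 1.0)                          # S'(Omega) = E(Omega) * H
```

Clamping theta away from zero avoids the logarithmic singularity of E1 at the origin, and capping the gain at 1 keeps the estimator passive.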
The masking threshold determining module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
a preliminary masking threshold T(k) is determined according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
the masking threshold R_TT(Ω) is determined according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
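The threshold chain above (critical-band power, spreading, offset, absolute threshold) might be sketched as follows. The per-band summation for B(k), the band-edge bookkeeping, and the default tonality value `belta=0.5` are assumptions of this sketch rather than details fixed by the text.

```python
import numpy as np

def masking_threshold(S_est, freqs_khz, band_edges, belta=0.5):
    """Masking threshold R_TT per critical band from the spectral estimate.

    S_est      -- spectral estimate S'(Omega) of one frame (complex)
    freqs_khz  -- centre frequency of each FFT bin, in kHz
    band_edges -- list of (bl, bh) bin-index bounds of each critical band
    belta      -- tonality coefficient in [0, 1]
    """
    k = np.arange(len(band_edges), dtype=float)
    # B(k): power of the spectral estimate inside each critical band
    B = np.array([np.sum(np.abs(S_est[bl:bh + 1]) ** 2) for bl, bh in band_edges])
    # spreading function SF(k) in dB, applied as a linear power factor
    SF_db = 15.81 + 7.5 * (k + 0.474) - 17.5 * np.sqrt(1.0 + (k + 0.474) ** 2)
    C = B * 10.0 ** (SF_db / 10.0)                      # C(k) = B(k) * SF(k)
    # offset O(k) and preliminary threshold T(k) = 10^(lg C(k) - O(k)/10)
    O = belta * (14.5 + k) + (1.0 - belta) * 5.5
    T = 10.0 ** (np.log10(np.maximum(C, 1e-12)) - O / 10.0)
    # absolute hearing threshold at each band centre (f in kHz)
    fc = np.array([freqs_khz[(bl + bh) // 2] for bl, bh in band_edges])
    T_abs = 3.64 * fc ** -0.8 - 6.5 * np.exp(-0.6 * (fc - 3.3) ** 2) + 1e-3 * fc ** 4
    return np.minimum(T, T_abs)                         # R_TT, as in the text
```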
The filtering processing module is further configured to determine the weighting coefficient H(Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
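A direct rendering of this gain rule; the attenuation-floor defaults `zeta_b` and `zeta_n` are illustrative values, not specified by the text.

```python
import numpy as np

def postfilter_gain(R_tt, Rbb, Rnn, zeta_b=0.1, zeta_n=0.2):
    """Weighting coefficient H(Omega) of the psychoacoustic post filter.

    Components whose interference PSD stays below the masking threshold R_tt
    get H = 1 (left untouched); unmasked components are attenuated, with
    zeta_b / zeta_n acting as echo / noise attenuation floors.
    """
    R_int = Rbb + Rnn                                  # interference PSD
    H = np.sqrt(R_tt / R_int) + (zeta_b * Rbb + zeta_n * Rnn) / R_int
    return np.minimum(1.0, H)
```

Note that whenever R_tt >= Rbb + Rnn the square-root term alone reaches 1, so the min() leaves masked components completely untouched.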
The signal acquisition module is also used for receiving an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain a sound signal to be processed.
According to the embodiment of the invention, the masking threshold is determined from the calculated power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. The order required of the preceding echo-cancellation filter is also reduced, accelerating the convergence of the preceding echo canceller, and robustness under strong background noise and near-end speech is improved.
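Put together, one frame flows through the four modules of system 500 roughly as below. To keep the sketch self-contained, a decision-directed Wiener gain stands in for the full H_LSA weighting and a flat fraction of the estimate's power stands in for the critical-band masking model; this is a structural sketch under those simplifications, not the patent's exact algorithm.

```python
import numpy as np

def process_frame(x_frame, Rbb, Rnn, S_prev, alpha=0.98, zeta=0.15):
    """One frame through the four modules of system 500 (simplified)."""
    # signal acquisition module: the frame is assumed to arrive post-AEC
    E = np.fft.rfft(x_frame * np.hanning(len(x_frame)))
    R_int = Rbb + Rnn
    # spectrum estimation module (decision-directed Wiener gain as a stand-in)
    post = np.abs(E) ** 2 / R_int
    prior = (1 - alpha) * np.maximum(post - 1.0, 0.0) + alpha * np.abs(S_prev) ** 2 / R_int
    S = E * prior / (1.0 + prior)
    # masking threshold module (flat stand-in: a fraction of the estimate's power)
    R_tt = 0.1 * np.abs(S) ** 2
    # filtering module: attenuate only where interference exceeds the threshold
    H = np.minimum(1.0, np.sqrt(R_tt / R_int) + zeta)
    return np.fft.irfft(E * H, n=len(x_frame)), S      # output frame + state
```

In a streaming setup the returned spectral estimate `S` would be fed back as `S_prev` for the next frame.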
In some embodiments, the present invention provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server, or a network device, etc.) for executing any of the above methods of processing a sound signal of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above methods of processing sound signals.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of processing sound signals.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a method of processing a sound signal.
The system for processing a sound signal according to the above embodiment of the present invention can be used to execute the method for processing a sound signal according to the above embodiment of the present invention, and accordingly achieve the technical effect achieved by the method for processing a sound signal according to the above embodiment of the present invention, and will not be described again here. In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 6 is a schematic diagram of a hardware structure of an electronic device for executing a method of processing a sound signal according to another embodiment of the present application, where as shown in fig. 6, the electronic device includes:
one or more processors 610 and a memory 620, with one processor 610 being an example in fig. 6.
The apparatus performing the method of processing a sound signal may further include: an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, such as being connected by a bus in fig. 6.
The memory 620, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method of processing sound signals in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing, i.e., implements the method of processing sound signals of the above-described method embodiments, by executing the nonvolatile software programs, instructions, and modules stored in the memory 620.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the apparatus for processing a sound signal, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, which may be connected to a device that processes sound signals via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive input numeric or character information and generate signals related to user settings and function control of the device that processes the sound signals. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform a method of processing a sound signal in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also offer mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable in-car navigation devices.
(4) Servers, which are similar in architecture to general-purpose computers but, because they must provide highly reliable services, have higher requirements on processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on such understanding, the technical solutions mentioned above may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (11)

1. A method of processing a sound signal, comprising:
acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;
determining a power spectral density of the interfering sound signal;
converting the sound signal to be processed into a frequency domain signal E (omega);
determining the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
obtaining the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
further calculating a weighting coefficient H_LSA(Ω) and obtaining a spectral estimate S'(Ω) of the target sound signal:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1);
determining a masking threshold from the spectral estimate;
and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, carrying out filtering processing on the sound signal to be processed.
2. The method according to claim 1, wherein the step of performing filtering processing on the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold comprises:
determining a weighting coefficient H(Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
3. The method of claim 1, wherein the step of determining a masking threshold based on the spectral estimate comprises:
determining, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
determining the masking threshold R_TT(Ω) according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
4. The method of claim 1, wherein the step of obtaining the sound signal to be processed comprises:
receiving an initial sound signal;
and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
5. The method according to claim 1, characterized in that the sound signal to be processed is a speech signal.
6. A system for processing a sound signal, comprising:
the device comprises a signal acquisition module, a processing module and a processing module, wherein the signal acquisition module is used for acquiring a sound signal to be processed, the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;
the frequency spectrum estimation determining module is used for determining the power spectral density of the interference sound signal and carrying out weighting processing on the sound signal to be processed according to the power spectral density to obtain the frequency spectrum estimation of the target sound signal;
a masking threshold determination module for determining a masking threshold from the spectral estimate;
the filtering processing module is used for performing filtering processing on the sound signal to be processed under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold;
the spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal E(Ω), and to determine the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is obtained according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
the weighting coefficient H_LSA(Ω) is further calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1).
7. The system according to claim 6, wherein the masking threshold determination module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
a preliminary masking threshold T(k) is determined according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
the masking threshold R_TT(Ω) is determined according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
8. The system of claim 6, wherein the filtering module is further configured to determine a weighting coefficient H (Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
9. The system of claim 6, wherein the signal acquisition module is further configured to receive an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-5.
11. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN201811645765.5A 2018-12-29 2018-12-29 Method and system for processing sound signal Active CN109727605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811645765.5A CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal


Publications (2)

Publication Number Publication Date
CN109727605A CN109727605A (en) 2019-05-07
CN109727605B true CN109727605B (en) 2020-06-12

Family

ID=66298550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811645765.5A Active CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal

Country Status (1)

Country Link
CN (1) CN109727605B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110931007B (en) * 2019-12-04 2022-07-12 思必驰科技股份有限公司 Voice recognition method and system
CN111524498B (en) * 2020-04-10 2023-06-16 维沃移动通信有限公司 Filtering method and device and electronic equipment
CN116320123B (en) * 2022-08-11 2024-03-08 荣耀终端有限公司 Voice signal output method and electronic equipment
CN117392994B (en) * 2023-12-12 2024-03-01 腾讯科技(深圳)有限公司 Audio signal processing method, device, equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
EP2226794B1 (en) * 2009-03-06 2017-11-08 Harman Becker Automotive Systems GmbH Background noise estimation
EP2284831B1 (en) * 2009-07-30 2012-03-21 Nxp B.V. Method and device for active noise reduction using perceptual masking
CN101777349B (en) * 2009-12-08 2012-04-11 中国科学院自动化研究所 Auditory perception property-based signal subspace microphone array voice enhancement method
CN101894563B (en) * 2010-07-15 2013-03-20 瑞声声学科技(深圳)有限公司 Voice enhancing method
CN103824564A (en) * 2014-03-17 2014-05-28 上海申磬产业有限公司 Voice enhancement method for use in voice identification process of electric wheelchair
CN105280195B (en) * 2015-11-04 2018-12-28 腾讯科技(深圳)有限公司 The processing method and processing device of voice signal
CN107393550B (en) * 2017-07-14 2021-03-19 深圳永顺智信息科技有限公司 Voice processing method and device
US10079026B1 (en) * 2017-08-23 2018-09-18 Cirrus Logic, Inc. Spatially-controlled noise reduction for headsets with variable microphone array orientation
CN108564963B (en) * 2018-04-23 2019-10-18 百度在线网络技术(北京)有限公司 Method and apparatus for enhancing voice
CN108735225A (en) * 2018-04-28 2018-11-02 南京邮电大学 It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method
CN108735229B (en) * 2018-06-12 2020-06-19 华南理工大学 Amplitude and phase joint compensation anti-noise voice enhancement method based on signal-to-noise ratio weighting

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model

Also Published As

Publication number Publication date
CN109727605A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN109727605B (en) Method and system for processing sound signal
CN109727604B (en) Frequency domain echo cancellation method for speech recognition front end and computer storage medium
CN107123430B (en) Echo cancellation method, device, conference tablet and computer storage medium
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
JP3568922B2 (en) Echo processing device
CN111951819A (en) Echo cancellation method, device and storage medium
EP2761617B1 (en) Processing audio signals
CN111768796B (en) Acoustic echo cancellation and dereverberation method and device
CN110176244B (en) Echo cancellation method, device, storage medium and computer equipment
US20160066087A1 (en) Joint noise suppression and acoustic echo cancellation
US8306821B2 (en) Sub-band periodic signal enhancement system
CN111524498B (en) Filtering method and device and electronic equipment
US11349525B2 (en) Double talk detection method, double talk detection apparatus and echo cancellation system
CN109102821B (en) Time delay estimation method, time delay estimation system, storage medium and electronic equipment
US11380312B1 (en) Residual echo suppression for keyword detection
KR102190833B1 (en) Echo suppression
CN108922517A (en) The method, apparatus and storage medium of training blind source separating model
US20160073209A1 (en) Maintaining spatial stability utilizing common gain coefficient
CN111756906B (en) Echo suppression method and device for voice signal and computer readable medium
CN103370741B (en) Process audio signal
CN106297816B (en) Echo cancellation nonlinear processing method and device and electronic equipment
CN111445916B (en) Audio dereverberation method, device and storage medium in conference system
CN111989934B (en) Echo cancellation device, echo cancellation method, signal processing chip, and electronic apparatus
CN115620737A (en) Voice signal processing device, method, electronic equipment and sound amplification system
CN112489669B (en) Audio signal processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.
