CN109727605B - Method and system for processing sound signal - Google Patents

Method and system for processing sound signal

Info

Publication number: CN109727605B
Authority: CN (China)
Prior art keywords: sound signal, signal, processed, spectral density, power spectral
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201811645765.5A
Other languages: Chinese (zh)
Other versions: CN109727605A (en)
Inventor: 袁斌 (Yuan Bin)
Current Assignee: Sipic Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: AI Speech Ltd
Priority date: (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by AI Speech Ltd
Priority to CN201811645765.5A
Publication of CN109727605A
Application granted
Publication of CN109727605B
Legal status: Active
Anticipated expiration


Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a method and a system for processing a sound signal. One embodiment of the method comprises: acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and filtering the sound signal to be processed when the spectral component of the interference sound signal in the sound signal to be processed is determined to be greater than the masking threshold. The method can reduce distortion of the sound signal so that it sounds more natural, reduce the computational complexity of the algorithm, and accelerate the convergence of the front-end echo canceller. Robustness in strong background noise and near-end speech environments can also be improved.

Description

Method and system for processing sound signal
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and a system for processing a sound signal.
Background
In the prior art, filtering can reduce "musical noise" in a sound signal, but the speech signal after noise reduction by the filter still sounds unnatural to a certain extent. This is related to the masking effect: when the human ear receives one sound, its perception of another sound can be disturbed and suppressed. The closer the two sounds are in pitch or in time, the stronger the masking effect; as a result, the residual noise after noise reduction by the post filter loses its original characteristics, and listening tests find it somewhat unnatural.
Disclosure of Invention
Embodiments of the present invention provide a method and a system for processing a sound signal, so as to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for processing a sound signal, including: acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; determining the power spectral density of the interference sound signal, and performing weighting processing on the sound signal to be processed according to the power spectral density to obtain a frequency spectrum estimation of a target sound signal; determining a masking threshold from the spectral estimate; and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, carrying out filtering processing on the sound signal to be processed.
Optionally, the disturbing sound signal comprises a noise signal and an echo signal.
Optionally, the step of performing weighting processing on the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal includes:
converting the sound signal to be processed into a frequency domain signal E (omega);
determining the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
then calculating a weighting factor HLSA(Ω) and obtaining the spectral estimate S'(Ω) of the target sound signal:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Optionally, in a case that it is determined that a spectral component of an interfering sound signal in the sound signal to be processed is greater than the masking threshold, the step of performing filtering processing on the sound signal to be processed includes:
determining a weighting coefficient H (omega) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) + (zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the step of determining a masking threshold from the spectral estimate comprises:
determining, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
determining the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Optionally, the step of acquiring the sound signal to be processed includes:
receiving an initial sound signal;
and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
Optionally, the sound signal to be processed is a speech signal.
In a second aspect, an embodiment of the present invention provides a system for processing a sound signal, including: a signal acquisition module, configured to acquire a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal; a spectrum estimation determining module, configured to determine the power spectral density of the interference sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; a masking threshold determination module, configured to determine a masking threshold from the spectral estimate; and a filtering processing module, configured to filter the sound signal to be processed when the spectral component of the interference sound signal in the sound signal to be processed is determined to be greater than the masking threshold.
Optionally, the disturbing sound signal comprises a noise signal and an echo signal.
Optionally, the spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency domain signal E(Ω), and to determine the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is derived according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
a weighting factor HLSA(Ω) is then calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Optionally, the masking threshold determining module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
to determine a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
and to determine the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Optionally, the filtering processing module is further configured to determine a weighting coefficient H (Ω) of the filtering processing according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) + (zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the signal obtaining module is further configured to receive an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
In a third aspect, embodiments of the present invention provide a storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above methods for processing a sound signal according to the present invention.
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the above methods for processing a sound signal.
In a fifth aspect, the present invention further provides a computer program product, comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any one of the above methods for processing a sound signal.
The embodiments of the present invention have the following beneficial effects: distortion of the sound signal can be reduced so that it sounds more natural; the masking threshold is determined from the calculated power spectral density (PSD) of the interference sound signal, which reduces the computational complexity of the algorithm; the order requirement on the front-end echo-cancellation filter is reduced, which accelerates the convergence of the front-end echo canceller; and robustness in strong background noise and near-end speech environments can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for processing an audio signal according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another embodiment of a method for processing an audio signal according to the present invention;
FIG. 3 is a diagram illustrating an embodiment of a system for implementing a method for processing a speech signal according to the present invention;
FIG. 4 is a diagram illustrating an embodiment of a method for processing a speech signal according to the present invention;
FIG. 5 is a diagram illustrating an embodiment of a system for processing an audio signal according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes in accordance with a signal having one or more data packets, e.g., signals from data interacting with another element in a local system, distributed system, and/or across a network of the internet with other systems by way of the signal.
Finally, it should be further noted that relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present invention provides a method of processing a sound signal, including:
step S11: and acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal.
Step S12: determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal. Specifically, after the power spectral density of the interference sound signal is determined, the posterior and prior signal-to-noise ratios are determined, a weighting coefficient is calculated from these signal-to-noise ratios, and the sound signal to be processed is weighted to obtain the spectral estimate of the target sound signal.
Step S13: a masking threshold is determined from the spectral estimate.
Step S14: and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, filtering the sound signal to be processed.
And, in the embodiment of the present invention, for the calculation of the masking threshold, specifically:
determining, from the spectral estimate, the power spectral density B(k) of each critical band of the sound signal to be processed and the spread critical band spectrum C(k):
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k)*SF(k),
where SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) from the spread critical band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;
determining the masking threshold RTT(Ω) from the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
RTT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
According to the embodiment of the present invention, the masking threshold is determined from the calculated power spectral density (PSD) of the interference sound signal, which reduces the computational complexity of the algorithm. The order requirement on the front-end echo-cancellation filter is also reduced, which accelerates the convergence of the front-end echo canceller. Robustness in strong background noise and near-end speech environments can be improved as well.
As shown in fig. 2, an embodiment of the present invention provides a method of processing a sound signal, including:
step S21: an initial sound signal is received. The initial sound signal may be picked up by a sound pickup device such as a microphone.
Step S22: and performing echo cancellation on the initial sound signal through an echo canceller to obtain a sound signal to be processed.
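The patent does not fix the echo-cancellation algorithm used in step S22. A common front-end choice is a normalized-LMS (NLMS) adaptive filter, sketched below as an illustrative assumption; the function name, filter length, and step size are hypothetical, not taken from the text.

```python
import numpy as np

def nlms_echo_canceller(x, y, taps=128, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller: x is the far-end reference,
    y the near-end microphone signal; returns the echo-cancelled
    signal e(k), i.e. the 'sound signal to be processed'."""
    w = np.zeros(taps)       # adaptive estimate of the echo path
    e = np.zeros(len(y))     # echo-cancelled output
    xbuf = np.zeros(taps)    # most recent far-end samples, newest first
    for k in range(len(y)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[k]
        d_hat = w @ xbuf                 # estimated echo at time k
        e[k] = y[k] - d_hat              # residual after cancellation
        # normalized LMS update of the echo-path estimate
        w += mu * e[k] * xbuf / (xbuf @ xbuf + eps)
    return e
```

With a short, time-invariant echo path and a white far-end signal, the residual energy drops well below the microphone-signal energy after convergence.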
Step S23: and determining the power spectral density of the interference sound signal, and performing weighting processing on the sound signal to be processed according to the power spectral density to obtain the frequency spectrum estimation of the target sound signal.
Step S24: a masking threshold is determined from the spectral estimate.
Step S25: and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, filtering the sound signal to be processed.
According to the embodiment of the present invention, after the initial signal is received, echo cancellation is first performed on it, which can improve the processing precision of the sound signal.
When the interference sound signal comprises a noise signal and an echo signal, the process of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal is as follows:
converting the sound signal to be processed into a frequency domain signal E(Ω);
determining the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following equation:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the spectral estimate of the previous frame of the sound signal;
then calculating a weighting factor HLSA(Ω) and obtaining the spectral estimate S'(Ω) of the target sound signal:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω)*HLSA(Ω),
where theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).
Under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, the step of filtering the sound signal to be processed comprises the following steps:
determining a weighting coefficient H (omega) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω)=min(1,sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) +(zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
The embodiment of the present invention preserves the original background noise characteristics, makes the residual echo sound more noise-like, and reduces speech distortion, so the result sounds more natural. The order requirement on the front-end echo-cancellation filter is also reduced, which accelerates the convergence of the front-end echo canceller and reduces the computational complexity of the algorithm. Robustness in strong background noise and near-end speech environments can be improved as well.
As shown in fig. 3, in a system implementing the method of the present invention, the speech signal transmitted from the far end is played by the loudspeaker and forms the initial echo signal d(k). The near-end microphone picks up a voice signal y(k), which includes the pure voice signal s(k) (the target voice signal), a noise signal n(k), and the initial echo signal d(k) fed back through the loudspeaker-room-microphone (LRM) path. First, the echo canceller C performs echo cancellation on the voice signal y(k) picked up by the near-end microphone, and the filter H then performs the filtering processing.
As shown in fig. 4, an embodiment of the present invention provides a method for processing a speech signal, including:
the near-end microphone picks up a voice signal y (k) which comprises a pure voice signal s (k), a noise signal n (k) and an initial echo signal d (k) fed back by the LRM through the loudspeaker. In the embodiment of the invention, the pure voice signal is target information.
The echo canceller performs echo cancellation on the voice signal y(k) picked up by the near-end microphone to obtain an echo-cancelled speech signal e(k). The echo-cancelled speech signal e(k) still contains interference sound signals, namely the noise signal and a residual echo signal.
The noise PSD Rnn(Ω) and the residual echo PSD Rbb(Ω) are estimated by a statistical or autocorrelation method.
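The text leaves the estimator open ("a statistical or autocorrelation method"). One minimal sketch is first-order recursive smoothing of per-frame periodograms over frames assumed to contain only the interference; the function name and smoothing constant are illustrative assumptions.

```python
import numpy as np

def recursive_psd(frames, lam=0.9):
    """Estimate a PSD (e.g. Rnn or Rbb) by exponential smoothing of
    per-frame periodograms. `frames` is a 2-D array of shape
    (num_frames, frame_len) holding a signal assumed to contain only
    the interference (noise, or residual echo)."""
    psd = np.zeros(frames.shape[1])
    for frame in frames:
        periodogram = np.abs(np.fft.fft(frame)) ** 2 / frame.size
        psd = lam * psd + (1 - lam) * periodogram  # first-order smoothing
    return psd
```

For unit-variance white noise the smoothed estimate settles near a flat PSD of about 1 per bin.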
The post filter then weights the echo-cancelled near-end microphone signal to obtain a preliminary spectral estimate S'(Ω) of the pure speech signal. The specific process comprises the following steps:
a) calculating the posterior signal-to-noise ratio:
PostSNR(Ω) = |E(Ω)|^2/(Rbb(Ω)+Rnn(Ω))
b) deriving the a priori signal-to-noise ratio according to the decision-directed method:
PrioriSNR(Ω,i) = (1-alpha)*P(PostSNR(Ω,i)-1) + alpha*|S'(Ω,i-1)|^2/Rbb(Ω)
where alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω,i-1) is the preliminary estimate of the previous frame of the speech signal.
c) defining theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1), then calculating the weighting coefficient:
HLSA(Ω) = (PrioriSNR(Ω)/(1+PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt)
d) weighting to obtain the preliminary estimate of the voice signal: S'(Ω) = E(Ω)*HLSA(Ω)
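Steps a) to d) can be sketched per frequency bin as follows. The gain here uses the standard Ephraim-Malah log-spectral-amplitude expression with the exponential integral E1; treat that exact form as an assumption, since the original gain formula appears only as an equation image, and the array names are illustrative.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(x)

def lsa_spectral_estimate(E, Rbb, Rnn, S_prev, alpha=0.98):
    """Steps a)-d): posterior SNR, decision-directed prior SNR, LSA gain,
    and the preliminary spectral estimate S'(Ω). As in the text, the
    prior-SNR recursion divides by Rbb(Ω), while the posterior SNR uses
    the full interference PSD Rbb(Ω)+Rnn(Ω)."""
    R_int = Rbb + Rnn
    post_snr = np.abs(E) ** 2 / R_int                 # a) posterior SNR
    P = lambda x: (np.abs(x) + x) / 2                 # half-wave rectifier P(x)
    prior_snr = (1 - alpha) * P(post_snr - 1) \
        + alpha * np.abs(S_prev) ** 2 / Rbb           # b) decision-directed
    theta = post_snr * prior_snr / (prior_snr + 1)    # c) theta per bin
    H_lsa = prior_snr / (1 + prior_snr) * np.exp(0.5 * exp1(theta))
    return E * H_lsa                                  # d) S'(Ω)
```

A bin with moderate SNR is attenuated (gain below 1), while the gain stays finite for all positive theta.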
Then, a masking threshold RTT(Ω) is estimated from the preliminary estimate S'(Ω) of the speech signal spectrum. The specific process comprises the following steps:
a) performing critical band analysis on the signal. According to the place theory of hearing, the human ear can be regarded as a discrete band-pass filter bank, where one critical band is called a Bark. The power spectral density of each critical band is then
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
where bh and bl are the upper and lower limit frequencies of each critical band, respectively, and k is related to the sampling rate.
b) calculating the spreading function SF(k):
SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)^2)
Because of the interaction between critical bands, the spread critical band spectrum can be expressed as C(k) = B(k)*SF(k).
c) calculating the masking threshold RTT(Ω) for masking the noise and the residual echo.
There are two masking thresholds: when the masker is tone-like, the threshold for masking noise and residual echo is C(k) - (14.5+k) dB; when the masker is noise-like, the threshold is C(k) - 5.5 dB.
It is therefore necessary to determine whether the signal resembles a pure tone or resembles noise and residual echo, and to this end the spectral flatness measure SFM is defined:
SFM = 10*lg(G/A)
where G and A are the geometric mean and the arithmetic mean, respectively, of the power spectral density of the signal.
The tonality coefficient is then defined as belta = min(SFM/SFMmax, 1).
The offset function O(k) of the masking energy of each frequency band is calculated from belta:
O(k) = belta*(14.5+k) + (1-belta)*5.5
The masking threshold is then: T(k) = 10^(lg(C(k)) - O(k)/10)
The calculated spread threshold is then mapped back from the Bark domain to the frequency domain.
The result is compared with the absolute hearing threshold of the human ear; if the calculated masking threshold is lower than the absolute hearing threshold, the absolute hearing threshold value is taken. The absolute hearing threshold Tabs(k) is defined as:
Tabs(k) = 3.64*f^(-0.8) - 6.5*exp(-0.6*(f-3.3)^2) + 10^(-3)*f^4, with f in kHz.
Therefore, the final masking threshold is RTT(Ω) = min(T(k), Tabs(k)).
Further, psychoacoustic weighted filtering is performed on the echo-cancelled frequency-domain microphone signal E(Ω). An FFT (fast Fourier transform) can be used to convert the time-domain digital signal into the frequency domain. For each component, it is judged whether the noise spectral component in the echo-cancelled frequency-domain microphone signal E(Ω) is smaller than the masking threshold; if so, the component is kept and left unprocessed; if not, the corresponding noise spectral component is attenuated according to the conventional MMSE-LSA rule.
The specific derivation process of the psychoacoustic weighting filter coefficients is as follows:
the design goal of psycho-acoustic adaptive weighted filtering is to minimize the distortion of the near-end speech signal when the sum of the residual echo distortion and the noise distortion is equal to the masking threshold, so the optimal psycho-acoustic weighted filter coefficients H (Ω) satisfy:
[zeta_b–H(Ω)]2Rbb(Ω)+[zeta_n–H(Ω)]2Rnn(Ω)=RTT(Ω)
where, zeta _ b is a residual echo attenuation coefficient, and is usually 20lg (zeta _ b) — 35;
the zeta _ n is a noise attenuation coefficient, and is usually 20lg (zeta _ n) — 15.
Since 0< ═ H (Ω) < ═ 1, solving the above quadratic equation H (Ω) takes a positive value:
H(Ω)=min(1,[zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω)+
sqrt([Rbb(Ω)+Rnn(Ω)]*RTT(Ω)-[zeta_b-zeta_n]2*Rbb(Ω)*Rbb(Ω))]/(Rbb(Ω)+ Rnn(Ω)))
because of zeta _ b, zeta _ n is much less than 1 and is generally relative to Rbb(omega) and Rbb(omega) for RTT(Ω) is not too small, so the above formula can be simplified to:
H(Ω)=min(1,sqrt(RTT(Ω)/(Rbb(Ω)+Rnn(Ω))) +(zeta_b*Rbb(Ω)+zeta_n*Rnn(Ω))/(Rbb(Ω)+Rnn(Ω)))
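The simplified coefficient can be evaluated per frequency bin as follows; the default attenuation values follow the stated 20*lg(zeta_b) = -35 dB and 20*lg(zeta_n) = -15 dB, and the function name is illustrative.

```python
import numpy as np

def psycho_filter_gain(Rtt, Rbb, Rnn,
                       zeta_b=10 ** (-35 / 20),   # 20*lg(zeta_b) = -35 dB
                       zeta_n=10 ** (-15 / 20)):  # 20*lg(zeta_n) = -15 dB
    """Simplified psychoacoustic weighting coefficient H(Ω):
    H = min(1, sqrt(Rtt/(Rbb+Rnn)) + (zeta_b*Rbb + zeta_n*Rnn)/(Rbb+Rnn))."""
    R_int = Rbb + Rnn
    H = np.sqrt(Rtt / R_int) + (zeta_b * Rbb + zeta_n * Rnn) / R_int
    return np.minimum(1.0, H)
```

Applying the gain is then simply `S_out = psycho_filter_gain(Rtt, Rbb, Rnn) * E`. Bins whose interference lies below the masking threshold (large Rtt relative to Rbb+Rnn) get a gain of 1 and are left untouched, matching the masking-based decision described above.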
in the embodiment of the invention, because the psychoacoustic post filter can also reduce the order requirement on the pre echo cancellation self-adaptive filter, the convergence speed of the echo canceller can be accelerated, the computational complexity of the algorithm is reduced, and the robustness of the echo canceller in the environment of strong background noise and near-end voice can be improved.
Residual echo cancellation is fused into the psychoacoustic weighting post filter, and the weighting coefficient of the filter is adaptively updated with the residual echo to further cancel the acoustic echo. In addition, because of the masking effect of the human ear, the noise spectrum and residual echo components below the masking threshold are inaudible, so they do not need to be attenuated; the conventional adaptive post-filtering method only needs to attenuate the noise spectrum and residual echo components that are not masked by the voice signal. The original background noise characteristics are therefore well preserved, the residual echo sounds more noise-like, speech distortion is reduced, and the result sounds more natural.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 5, an embodiment of the present invention further provides a system 500 for processing a sound signal, including:
the signal obtaining module 510 is configured to obtain a sound signal to be processed, where the sound signal to be processed includes a target sound signal and an interfering sound signal.
The spectrum estimation determining module 520 is configured to determine a power spectral density of the interfering sound signal, and perform weighting processing on the sound signal to be processed according to the power spectral density to obtain a spectrum estimation of the target sound signal.
A masking threshold determination module 530 for determining a masking threshold from the spectral estimate.
And the filtering processing module 540 is configured to perform filtering processing on the sound signal to be processed when it is determined that the frequency spectrum component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.
Further, the disturbing sound signal includes a noise signal and an echo signal.
The spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal E(Ω), and to determine the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is obtained according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
the weighting coefficient H_LSA(Ω) is further calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1).
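The weighting step above can be sketched in NumPy. The gain used here is the standard log-spectral-amplitude estimator that matches the theta expression; the helper `expint_e1`, the frame inputs, and the default smoothing factor `alpha=0.98` are illustrative assumptions of this sketch, not values taken from the patent.

```python
import numpy as np

def expint_e1(x, span=40.0, n=4000):
    """E1(x) = integral from x to infinity of e^(-t)/t dt, by trapezoidal
    quadrature (the tail beyond x + span is negligible for x > 0)."""
    x = np.asarray(x, dtype=float)
    s = np.linspace(0.0, 1.0, n)
    t = x[..., None] + s * span
    y = np.exp(-t) / t
    return 0.5 * (y[..., :-1] + y[..., 1:]).sum(axis=-1) * (span / (n - 1))

def spectral_estimate(E, Rbb, Rnn, S_prev, alpha=0.98):
    """One frame of the weighting step: PostSNR, PrioriSNR, H_LSA, S'.

    E      -- complex spectrum E(Omega) of the frame to be processed
    Rbb    -- power spectral density of the echo signal (must be > 0)
    Rnn    -- power spectral density of the noise signal
    S_prev -- spectral estimate S'(Omega_{i-1}) of the previous frame
    """
    post_snr = np.abs(E) ** 2 / (Rbb + Rnn)                # PostSNR(Omega)
    half_rect = np.maximum(post_snr - 1.0, 0.0)            # P(x) = (|x| + x)/2
    prior_snr = (1 - alpha) * half_rect + alpha * np.abs(S_prev) ** 2 / Rbb
    theta = post_snr * prior_snr / (prior_snr + 1.0)
    # log-spectral-amplitude gain: H = xi/(1 + xi) * exp(E1(theta)/2)
    H = prior_snr / (1.0 + prior_snr) * np.exp(0.5 * expint_e1(np.maximum(theta, 1e-8)))
    return E * np.minimum(H, 1.0)                          # S'(Omega) = E(Omega) * H
```

Clamping theta away from zero avoids the logarithmic singularity of E1 at the origin, and capping the gain at 1 keeps the estimator passive.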
The masking threshold determining module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
a preliminary masking threshold T(k) is determined according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
the masking threshold R_TT(Ω) is determined according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
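The threshold chain above (critical-band power, spreading, offset, absolute threshold) might be sketched as follows. The per-band summation for B(k), the band-edge bookkeeping, and the default tonality value `belta=0.5` are assumptions of this sketch rather than details fixed by the text.

```python
import numpy as np

def masking_threshold(S_est, freqs_khz, band_edges, belta=0.5):
    """Masking threshold R_TT per critical band from the spectral estimate.

    S_est      -- spectral estimate S'(Omega) of one frame (complex)
    freqs_khz  -- centre frequency of each FFT bin, in kHz
    band_edges -- list of (bl, bh) bin-index bounds of each critical band
    belta      -- tonality coefficient in [0, 1]
    """
    k = np.arange(len(band_edges), dtype=float)
    # B(k): power of the spectral estimate inside each critical band
    B = np.array([np.sum(np.abs(S_est[bl:bh + 1]) ** 2) for bl, bh in band_edges])
    # spreading function SF(k) in dB, applied as a linear power factor
    SF_db = 15.81 + 7.5 * (k + 0.474) - 17.5 * np.sqrt(1.0 + (k + 0.474) ** 2)
    C = B * 10.0 ** (SF_db / 10.0)                      # C(k) = B(k) * SF(k)
    # offset O(k) and preliminary threshold T(k) = 10^(lg C(k) - O(k)/10)
    O = belta * (14.5 + k) + (1.0 - belta) * 5.5
    T = 10.0 ** (np.log10(np.maximum(C, 1e-12)) - O / 10.0)
    # absolute hearing threshold at each band centre (f in kHz)
    fc = np.array([freqs_khz[(bl + bh) // 2] for bl, bh in band_edges])
    T_abs = 3.64 * fc ** -0.8 - 6.5 * np.exp(-0.6 * (fc - 3.3) ** 2) + 1e-3 * fc ** 4
    return np.minimum(T, T_abs)                         # R_TT, as in the text
```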
The filtering processing module is further configured to determine the weighting coefficient H(Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
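A direct rendering of this gain rule; the attenuation-floor defaults `zeta_b` and `zeta_n` are illustrative values, not specified by the text.

```python
import numpy as np

def postfilter_gain(R_tt, Rbb, Rnn, zeta_b=0.1, zeta_n=0.2):
    """Weighting coefficient H(Omega) of the psychoacoustic post filter.

    Components whose interference PSD stays below the masking threshold R_tt
    get H = 1 (left untouched); unmasked components are attenuated, with
    zeta_b / zeta_n acting as echo / noise attenuation floors.
    """
    R_int = Rbb + Rnn                                  # interference PSD
    H = np.sqrt(R_tt / R_int) + (zeta_b * Rbb + zeta_n * Rnn) / R_int
    return np.minimum(1.0, H)
```

Note that whenever R_tt >= Rbb + Rnn the square-root term alone reaches 1, so the min() leaves masked components completely untouched.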
The signal acquisition module is also used for receiving an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain a sound signal to be processed.
According to the embodiment of the invention, the masking threshold is determined from the calculated power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. The order required of the preceding echo-cancellation filter is also reduced, accelerating the convergence of the preceding echo canceller, and robustness under strong background noise and near-end speech is improved.
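Put together, one frame flows through the four modules of system 500 roughly as below. To keep the sketch self-contained, a decision-directed Wiener gain stands in for the full H_LSA weighting and a flat fraction of the estimate's power stands in for the critical-band masking model; this is a structural sketch under those simplifications, not the patent's exact algorithm.

```python
import numpy as np

def process_frame(x_frame, Rbb, Rnn, S_prev, alpha=0.98, zeta=0.15):
    """One frame through the four modules of system 500 (simplified)."""
    # signal acquisition module: the frame is assumed to arrive post-AEC
    E = np.fft.rfft(x_frame * np.hanning(len(x_frame)))
    R_int = Rbb + Rnn
    # spectrum estimation module (decision-directed Wiener gain as a stand-in)
    post = np.abs(E) ** 2 / R_int
    prior = (1 - alpha) * np.maximum(post - 1.0, 0.0) + alpha * np.abs(S_prev) ** 2 / R_int
    S = E * prior / (1.0 + prior)
    # masking threshold module (flat stand-in: a fraction of the estimate's power)
    R_tt = 0.1 * np.abs(S) ** 2
    # filtering module: attenuate only where interference exceeds the threshold
    H = np.minimum(1.0, np.sqrt(R_tt / R_int) + zeta)
    return np.fft.irfft(E * H, n=len(x_frame)), S      # output frame + state
```

In a streaming setup the returned spectral estimate `S` would be fed back as `S_prev` for the next frame.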
In some embodiments, the present invention provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server, or a network device, etc.) for executing any of the above methods of processing a sound signal of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above methods of processing sound signals.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of processing sound signals.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a method of processing a sound signal.
The system for processing a sound signal according to the above embodiment of the present invention can be used to execute the method for processing a sound signal according to the above embodiment of the present invention, and accordingly achieve the technical effect achieved by the method for processing a sound signal according to the above embodiment of the present invention, and will not be described again here. In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 6 is a schematic diagram of a hardware structure of an electronic device for executing a method of processing a sound signal according to another embodiment of the present application, where as shown in fig. 6, the electronic device includes:
one or more processors 610 and a memory 620, with one processor 610 being an example in fig. 6.
The apparatus performing the method of processing a sound signal may further include: an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, such as being connected by a bus in fig. 6.
The memory 620, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method of processing sound signals in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing, i.e., implements the method of processing sound signals of the above-described method embodiments, by executing the nonvolatile software programs, instructions, and modules stored in the memory 620.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the apparatus for processing a sound signal, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, which may be connected to a device that processes sound signals via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive input numeric or character information and generate signals related to user settings and function control of the device that processes the sound signals. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform a method of processing a sound signal in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also offer mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable in-car navigation devices.
(4) Servers, which are similar in architecture to general-purpose computers but, because they must provide highly reliable services, have higher requirements on processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on such understanding, the technical solutions mentioned above may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (11)

1. A method of processing a sound signal, comprising:
acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;
determining a power spectral density of the interfering sound signal;
converting the sound signal to be processed into a frequency domain signal E (omega);
determining the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
obtaining the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
further calculating a weighting coefficient H_LSA(Ω) and obtaining a spectral estimate S'(Ω) of the target sound signal:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1);
determining a masking threshold from the spectral estimate;
and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, carrying out filtering processing on the sound signal to be processed.
2. The method according to claim 1, wherein the step of performing filtering processing on the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold comprises:
determining a weighting coefficient H(Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
3. The method of claim 1, wherein the step of determining a masking threshold based on the spectral estimate comprises:
determining, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
determining a preliminary masking threshold T(k) according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
determining the masking threshold R_TT(Ω) according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
4. The method of claim 1, wherein the step of obtaining the sound signal to be processed comprises:
receiving an initial sound signal;
and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
5. The method according to claim 1, characterized in that the sound signal to be processed is a speech signal.
6. A system for processing a sound signal, comprising:
the device comprises a signal acquisition module, a processing module and a processing module, wherein the signal acquisition module is used for acquiring a sound signal to be processed, the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;
the frequency spectrum estimation determining module is used for determining the power spectral density of the interference sound signal and carrying out weighting processing on the sound signal to be processed according to the power spectral density to obtain the frequency spectrum estimation of the target sound signal;
a masking threshold determination module for determining a masking threshold from the spectral estimate;
the filtering processing module is used for performing filtering processing on the sound signal to be processed under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold;
the spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal E(Ω), and to determine the a posteriori signal-to-noise ratio PostSNR(Ω) according to the following formula:
PostSNR(Ω) = |E(Ω)|^2 / (Rbb(Ω) + Rnn(Ω)),
where Rbb(Ω) is the power spectral density of the echo signal and Rnn(Ω) is the power spectral density of the noise signal;
the a priori signal-to-noise ratio PrioriSNR(Ω) is obtained according to the following formula:
PrioriSNR(Ω_i) = (1 - alpha) * P(PostSNR(Ω_i) - 1) + alpha * |S'(Ω_{i-1})|^2 / Rbb(Ω),
where alpha is a smoothing factor, P(x) = (|x| + x)/2, and S'(Ω_{i-1}) is the spectral estimate of the previous frame of the sound signal;
the weighting coefficient H_LSA(Ω) is further calculated, and the spectral estimate S'(Ω) of the target sound signal is obtained:
H_LSA(Ω) = (PrioriSNR(Ω) / (1 + PrioriSNR(Ω))) * exp((1/2) * ∫_theta^∞ (e^(-t)/t) dt),
S'(Ω) = E(Ω) * H_LSA(Ω),
where theta = PostSNR(Ω) * PrioriSNR(Ω) / (PrioriSNR(Ω) + 1).
7. The system according to claim 6, wherein the masking threshold determination module is further configured to determine, from the spectral estimate, the power spectral density B(k) of each critical band and the spread critical-band spectrum C(k) of the sound signal to be processed:
B(k) = Σ_{Ω=bl}^{bh} |S'(Ω)|^2,
C(k) = B(k) * SF(k),
where SF(k) = 15.81 + 7.5 * (k + 0.474) - 17.5 * sqrt(1 + (k + 0.474)^2), and bh, bl are the upper and lower limit frequencies of each critical band, respectively;
a preliminary masking threshold T(k) is determined according to the spread critical-band spectrum C(k) and the offset function O(k):
T(k) = 10^(lg(C(k)) - O(k)/10),
where the offset function O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, and belta is the pitch coefficient;
the masking threshold R_TT(Ω) is determined according to the preliminary masking threshold T(k) and the absolute hearing threshold Tabs(k):
R_TT(Ω) = min(T(k), Tabs(k)),
where Tabs(k) = 3.64 * f^(-0.8) - 6.5 * exp(-0.6 * (f - 3.3)^2) + 10^(-3) * f^4, with f in kHz.
8. The system of claim 6, wherein the filtering module is further configured to determine a weighting coefficient H (Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:
H(Ω) = min(1, sqrt(R_TT(Ω)/(Rbb(Ω) + Rnn(Ω))) + (zeta_b * Rbb(Ω) + zeta_n * Rnn(Ω))/(Rbb(Ω) + Rnn(Ω))),
where Rbb(Ω) is the power spectral density of the echo signal, Rnn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
9. The system of claim 6, wherein the signal acquisition module is further configured to receive an initial sound signal; and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.
10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-5.
11. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN201811645765.5A 2018-12-29 2018-12-29 Method and system for processing sound signal Active CN109727605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811645765.5A CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal


Publications (2)

Publication Number Publication Date
CN109727605A CN109727605A (en) 2019-05-07
CN109727605B true CN109727605B (en) 2020-06-12

Family

ID=66298550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811645765.5A Active CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal

Country Status (1)

Country Link
CN (1) CN109727605B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110931007B (en) * 2019-12-04 2022-07-12 思必驰科技股份有限公司 Voice recognition method and system
CN111524498B (en) * 2020-04-10 2023-06-16 维沃移动通信有限公司 Filtering method and device and electronic equipment
CN116320123B (en) * 2022-08-11 2024-03-08 荣耀终端有限公司 Voice signal output method and electronic equipment
CN117392994B (en) * 2023-12-12 2024-03-01 腾讯科技(深圳)有限公司 Audio signal processing method, device, equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
EP2226794B1 (en) * 2009-03-06 2017-11-08 Harman Becker Automotive Systems GmbH Background noise estimation
EP2284831B1 (en) * 2009-07-30 2012-03-21 Nxp B.V. Method and device for active noise reduction using perceptual masking
CN101777349B (en) * 2009-12-08 2012-04-11 中国科学院自动化研究所 Auditory perception property-based signal subspace microphone array voice enhancement method
CN101894563B (en) * 2010-07-15 2013-03-20 瑞声声学科技(深圳)有限公司 Voice enhancing method
CN103824564A (en) * 2014-03-17 2014-05-28 上海申磬产业有限公司 Voice enhancement method for use in voice identification process of electric wheelchair
CN105280195B (en) * 2015-11-04 2018-12-28 腾讯科技(深圳)有限公司 The processing method and processing device of voice signal
CN107393550B (en) * 2017-07-14 2021-03-19 深圳永顺智信息科技有限公司 Voice processing method and device
US10079026B1 (en) * 2017-08-23 2018-09-18 Cirrus Logic, Inc. Spatially-controlled noise reduction for headsets with variable microphone array orientation
CN108564963B (en) * 2018-04-23 2019-10-18 百度在线网络技术(北京)有限公司 Method and apparatus for enhancing voice
CN108735225A (en) * 2018-04-28 2018-11-02 南京邮电大学 It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method
CN108735229B (en) * 2018-06-12 2020-06-19 华南理工大学 Amplitude and phase joint compensation anti-noise voice enhancement method based on signal-to-noise ratio weighting

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model

Also Published As

Publication number Publication date
CN109727605A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN109727605B (en) Method and system for processing sound signal
CN109727604B (en) Frequency domain echo cancellation method for speech recognition front end and computer storage medium
CN107123430B (en) Echo cancellation method, device, conference tablet and computer storage medium
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
JP3568922B2 (en) Echo processing device
CN111951819A (en) Echo cancellation method, device and storage medium
EP2761617B1 (en) Processing audio signals
CN111768796B (en) Acoustic echo cancellation and dereverberation method and device
CN110176244B (en) Echo cancellation method, device, storage medium and computer equipment
US20160066087A1 (en) Joint noise suppression and acoustic echo cancellation
US8306821B2 (en) Sub-band periodic signal enhancement system
CN111524498B (en) Filtering method and device and electronic equipment
US11349525B2 (en) Double talk detection method, double talk detection apparatus and echo cancellation system
CN109102821B (en) Time delay estimation method, time delay estimation system, storage medium and electronic equipment
US11380312B1 (en) Residual echo suppression for keyword detection
KR102190833B1 (en) Echo suppression
CN108922517A (en) The method, apparatus and storage medium of training blind source separating model
US20160073209A1 (en) Maintaining spatial stability utilizing common gain coefficient
CN111756906B (en) Echo suppression method and device for voice signal and computer readable medium
CN103370741B (en) Process audio signal
CN106297816B (en) Echo cancellation nonlinear processing method and device and electronic equipment
CN111445916B (en) Audio dereverberation method, device and storage medium in conference system
CN111989934B (en) Echo cancellation device, echo cancellation method, signal processing chip, and electronic apparatus
CN115620737A (en) Voice signal processing device, method, electronic equipment and sound amplification system
CN112489669B (en) Audio signal processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.
