CN115798501A

CN115798501A - Voice noise reduction method and device and electronic equipment

Info

Publication number: CN115798501A
Application number: CN202211565712.9A
Authority: CN
Inventors: 阎张懿
Original assignee: Shenzhen Zhongke Lanxun Technology Co ltd
Current assignee: Shenzhen Zhongke Lanxun Technology Co ltd
Priority date: 2022-12-07
Filing date: 2022-12-07
Publication date: 2023-03-14

Abstract

The embodiments of the invention disclose a voice noise reduction method, a voice noise reduction device and an electronic device. The method comprises the following steps: obtaining a preselected noise reduction model; acquiring a noisy signal; performing noise reduction processing on the noisy signal according to the preselected noise reduction model, and acquiring a signal-to-noise parameter of the noisy signal; and estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction. In this way, the embodiments of the invention combine traditional noise reduction with artificial intelligence (AI) noise reduction and improve the suppression of both stationary and non-stationary noise. Compared with pure AI noise reduction, they also reduce the demands on the computing power and storage capacity of the terminal, so they can be applied to lightweight terminals with limited resources.

Description

Voice noise reduction method and device and electronic equipment

Technical Field

The embodiment of the invention relates to the field of voice noise reduction, in particular to a voice noise reduction method and device and electronic equipment.

Background

In practical applications, a captured speech signal usually contains noise. Before the speech signal is used further (for example, for speech recognition), it often needs noise reduction processing to improve the reliability of subsequent processing.

In the prior art, multiple microphones are usually used to localize the sound source, and the original speech signal is denoised in combination with a noise reduction algorithm. The results are poor: because a far-field sound source has a low signal-to-noise ratio, the speech signal is hard for the noise reduction algorithm to process and the outcome is not ideal. Moreover, traditional noise reduction removes stationary noise well but removes non-stationary noise only moderately. Besides traditional methods, AI noise reduction is now also widely used, and it can also handle non-stationary noise. However, AI noise reduction relies on models trained on large amounts of data, which places high demands on the computing power of the terminal. AI noise reduction is therefore not suitable for lightweight terminals with limited computing resources.

Disclosure of Invention

In order to solve the above technical problem, one technical solution adopted by the embodiments of the present invention is to provide a voice noise reduction method, comprising: obtaining a preselected noise reduction model; acquiring a noisy signal; performing noise reduction processing on the noisy signal according to the preselected noise reduction model, and acquiring a signal-to-noise parameter of the noisy signal; and estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction.

In some embodiments, after acquiring the noisy signal, the method further comprises: acquiring a training noisy signal sample; and performing label processing on the training noisy signal sample.

In some embodiments, obtaining the preselected noise reduction model comprises: obtaining a plurality of noise reduction models; presenting options for the plurality of noise reduction models in an interactive interface; and, in response to selection of an option in the interactive interface, taking the noise reduction model corresponding to that option as the preselected noise reduction model.

In some embodiments, performing noise reduction processing on the noisy signal according to the preselected noise reduction model comprises: acquiring signal features from the noisy signal, the signal features being the amplitude spectrum of the noisy signal; training the preselected noise reduction model with the signal features; and acquiring the signal-to-noise parameter of the noisy signal according to the trained preselected noise reduction model.

In some embodiments, estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction comprises: acquiring a signal-to-noise ratio gain function according to the signal-to-noise parameter; and estimating the a priori signal-to-noise ratio according to the signal-to-noise ratio gain function.

In some embodiments, performing label processing on the training noisy signal sample comprises: labeling the training noisy signal sample with a specially normalized a priori signal-to-noise ratio; or labeling the training noisy signal sample with a specially normalized a posteriori signal-to-noise ratio.

In some embodiments, the plurality of noise reduction models include a deep neural network, a convolutional neural network, and a recurrent neural network with a Gaussian mask and a self-attention mechanism.

In order to solve the above technical problem, another technical solution adopted by the embodiments of the present invention is to provide a voice noise reduction device, comprising: a model acquisition module configured to acquire a preselected noise reduction model; a signal acquisition module configured to acquire a noisy signal; a sample acquisition module configured to acquire a training noisy signal sample; a label processing module configured to perform label processing on the training noisy signal sample; a signal processing module configured to perform noise reduction processing on the noisy signal according to the preselected noise reduction model and to acquire a signal-to-noise parameter of the noisy signal; and a parameter estimation module configured to estimate the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction.

In order to solve the above technical problem, another technical solution adopted by the embodiment of the present invention is: provided is an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a voice noise reduction method as described above.

In order to solve the above technical problem, another technical solution adopted by the embodiments of the present invention is to provide a non-transitory computer storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the voice noise reduction method described above.

The beneficial effects of the embodiments of the invention are as follows: compared with pure AI noise reduction, they reduce the demands on the computing power and storage capacity of the terminal, and can therefore be applied to lightweight terminals with limited resources.

Drawings

FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;

FIG. 2 is a flowchart of a speech noise reduction method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of step S100 of the speech noise reduction method according to an embodiment of the present invention;

FIG. 4 is a flowchart of step S500 of the speech noise reduction method according to an embodiment of the present invention;

FIG. 5 is a flowchart of step S600 of the speech noise reduction method according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a speech noise reduction device according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, and all such variations and modifications fall within the scope of the present invention.

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Before the embodiments of the present application are described in further detail, the terms and expressions used in them are explained; the following explanations apply to these terms and expressions.

Stationary noise: noise whose characteristics, such as loudness and frequency distribution, are always present and do not change, or change only slowly, over time; for example, the noise floor of devices such as mobile phones and computers, or the fan noise of a computer cooling stand.

Non-stationary noise: noise whose statistical characteristics vary with time, such as doors opening and closing, doorbells and background voices. Depending on whether it is continuous, non-stationary noise is further classified into continuous non-stationary noise (e.g., continuous background voices) and transient noise (e.g., tapping sounds).

The related art includes common (traditional) noise reduction algorithms. Such algorithms usually assume that the noise is additive, random and stationary, and that speech is short-time stationary. They estimate the noise spectrum with various statistical methods and then suppress the noise in the noisy speech according to the computed signal-to-noise ratio. In the enhanced speech obtained after suppression, the proportion of noise is reduced and the signal-to-noise ratio is improved, so the speech is clearer and more intelligible. However, these algorithms assume additive, random, stationary noise, whereas real-world noise may be either stationary or non-stationary, and some of the noise reduction means in the related art cannot effectively suppress non-stationary noise, which is an obvious shortcoming.
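As a rough illustration of the traditional approach described above, the following Python/NumPy sketch estimates the noise spectrum statistically and then suppresses each time-frequency bin according to its signal-to-noise ratio. It assumes (hypothetically) that the first few frames contain noise only and uses a power spectral subtraction gain; the function name, the number of noise frames and the gain floor are illustrative choices, not taken from the source.

```python
import numpy as np

def stationary_noise_suppression(noisy_power, n_noise_frames=10, gain_floor=0.1):
    """Classic single-channel suppression sketch (hypothetical helper).

    noisy_power: |Y(m,k)|^2 with shape (frames, bins).
    Assumes the first n_noise_frames frames are noise only.
    Returns a power-domain gain per bin and the estimated noise spectrum.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Statistical noise estimate: mean power of the leading noise-only frames.
    noise_psd = noisy_power[:n_noise_frames].mean(axis=0)
    # A posteriori SNR of every time-frequency bin.
    gamma = noisy_power / np.maximum(noise_psd, 1e-12)
    # Power spectral subtraction gain (1 - 1/gamma), floored to limit musical noise.
    gain = np.maximum(1.0 - 1.0 / np.maximum(gamma, 1e-12), gain_floor)
    return gain, noise_psd
```

Multiplying `gain` by the noisy power spectrum gives the enhanced power spectrum. Because `noise_psd` is frozen after the leading frames, such a suppressor only tracks stationary noise, which is the limitation noted above for non-stationary noise.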

AI noise reduction comprises a training stage and an inference stage. In the training stage, speech time-frequency features such as the amplitude spectrum, power spectrum, pitch period and voice endpoints are first extracted from a large number of speech and noise samples, and this large amount of data is used to train a designed deep network model (usually a multi-layer structure such as a deep neural network, a convolutional neural network, a gated recurrent unit network or a long short-term memory network).

The embodiments of the invention provide a voice noise reduction method, a voice noise reduction device and an electronic device that combine traditional noise reduction with AI noise reduction to improve the suppression of both stationary and non-stationary noise. Compared with pure AI noise reduction, they reduce the demands on the computing power and storage capacity of the terminal, so they can be applied to lightweight terminals with limited resources. Exemplary applications of the electronic device are described below. The device provided by the embodiments of the invention may be implemented as various types of user terminals, such as a notebook computer, a tablet computer, a desktop computer, a set-top box or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device or a portable game device), and may also be implemented as a server.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. To support a voice call application, a first terminal 310 and a second terminal 320 each establish a communication connection with a server 100 through a network 200. It should be noted that the network 200 may be a wide area network, a local area network, or a combination of both. After the first terminal 310 initiates a voice call request to the second terminal 320 through the server 100, a voice call is established. The voice signal generated during the call inevitably carries noise. When the voice signal is sent from the first terminal 310 to the second terminal 320 via the server 100, the server 100 executes the voice noise reduction method provided by the embodiments of the invention to denoise the voice signal, obtains a noise-reduced voice signal, and forwards it to the second terminal 320 for playback. Similarly, after the second terminal 320 initiates a voice call request to the first terminal 310 through the server 100, a voice call can be established; the signal transmission and processing are the same as described above and are not repeated here.

In some embodiments, the voice noise reduction method provided by the embodiments of the present invention is applied to a voice call application, which covers real-time voice calls (for example, telephone calls and voice calls) and non-real-time voice calls. The first terminal 310 and the second terminal 320 used by the users can both send and receive voice signals. When the first terminal 310 acts as the sending end, the noise reduction model running on the server 100 can be invoked to denoise the sent voice signal, and the resulting noise-reduced voice signal is sent to the second terminal 320. For a received voice signal, the second terminal 320 can invoke the noise reduction model running on the server 100 to denoise it and then play the noise-reduced voice signal. Similarly, the second terminal 320 can also act as the sending end of a voice signal; the process is the same as described above and is not repeated here.

In some embodiments, the voice noise reduction method provided by the embodiments of the present invention is applied to an application with a voice interaction function. The first terminal 310 used by a user receives the user's voice signal, which carries a control instruction. The noise reduction model running on the server 100 can be invoked to denoise this voice signal; speech recognition is then performed on the noise-reduced signal to recognize the control instruction, which is returned to the first terminal 310 so that the first terminal 310 can respond to it. The same applies to the second terminal 320 and is not repeated here.

In some embodiments, the voice noise reduction method provided by the embodiments of the present invention may be applied directly on a terminal. The first terminal 310 used by a user receives a voice signal from the second terminal 320 of another user, directly invokes an adapted local noise reduction model according to the constraints of the received voice signal (geographical location information and time information) to denoise it, and plays the corresponding noise-reduced voice signal on the first terminal 310.

In some embodiments, the server 100 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. The first terminal 310 and the second terminal 320 may be, but are not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminals and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present invention.

Based on the above application scenario, an embodiment of the present invention provides a speech noise reduction method, whose flowchart is shown in FIG. 2. The method includes:

Step S100: obtaining a preselected noise reduction model;

As described above, model training is an indispensable part of AI noise reduction, so selecting an appropriate noise reduction model for subsequent training is critical. In some embodiments, step S100 further comprises the following steps, as shown in FIG. 3:

Step S110: obtaining a plurality of noise reduction models;

Specifically, a plurality of noise reduction models are preset, such as a deep neural network, a convolutional neural network and a recurrent neural network.

In a deep neural network (DNN), each neuron computes a weighted sum of the outputs of the previous layer, usually followed by a nonlinear activation function. Stacking such layers yields a multi-layer nonlinear network that can model complex signal processing.
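A minimal NumPy sketch of the forward pass just described, assuming ReLU hidden layers and a sigmoid output layer (both illustrative choices; the source does not specify activations or layer sizes, and `dense_forward` is a hypothetical name):

```python
import numpy as np

def dense_forward(x, layers):
    """Forward pass of a fully connected network.

    layers: list of (W, b) pairs. Each neuron takes a weighted sum of the
    previous layer's outputs plus a bias; hidden layers then apply ReLU and
    the last layer applies a sigmoid so outputs lie in (0, 1).
    """
    h = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(layers):
        z = h @ W + b  # weighted sum of the previous layer
        if i < len(layers) - 1:
            h = np.maximum(z, 0.0)  # ReLU nonlinearity
        else:
            h = 1.0 / (1.0 + np.exp(-z))  # sigmoid output
    return h
```

Without the nonlinearity, any stack of layers would collapse into a single linear map, which is why the activation matters for modeling complex signal processing.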

A convolutional neural network (CNN) is a class of feedforward neural networks that contain convolution computations and have a deep structure, and is one of the representative algorithms of deep learning. CNNs have representation learning capability and can perform shift-invariant classification of input information according to their hierarchical structure, so they are also called shift-invariant artificial neural networks (SIANN).

A recurrent neural network (RNN) is a class of recursive neural networks that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and connects all nodes (recurrent units) in a chain.

Step S120: presenting options for the plurality of noise reduction models in an interactive interface;

The preset noise reduction models are displayed on a human-computer interaction interface, together with the corresponding options for the user to select.

Step S130: in response to selection of an option in the interactive interface, taking the noise reduction model corresponding to that option as the preselected noise reduction model.

The user can select an option directly on the human-computer interaction interface; in response to this selection, the noise reduction model corresponding to the selected option is taken as the preselected noise reduction model.

The above embodiment widens the choice of noise reduction models. Alternatively, a noise reduction model can be preset as the preselected noise reduction model; preferably, a deep neural network with a Gaussian mask and a self-attention mechanism is used.

The self-attention mechanism addresses the following situation: the input received by a neural network is a set of vectors of varying size, and these vectors are related to one another, but such relationships cannot be fully exploited in ordinary training, which leads to very poor training results. Examples from text processing include machine translation (a sequence-to-sequence problem in which the machine decides how many output labels to produce), part-of-speech tagging (one label per vector) and semantic analysis (multiple vectors mapped to one label). A fully connected neural network cannot establish correlations among multiple related inputs; self-attention solves this by letting the model attend to the correlations between different parts of the whole input.
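The following NumPy sketch shows scaled dot-product self-attention over a sequence of frames. The optional Gaussian mask, which down-weights attention between frames that are far apart in time, is only one plausible reading of the "Gaussian mask" mentioned in this document and is an assumption; the source does not define it. All names (`self_attention`, `sigma`, the projection matrices) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, sigma=None):
    """Scaled dot-product self-attention over T frames.

    X: (T, d_in) frame features; Wq, Wk, Wv: (d_in, d) projections.
    If sigma is given, a Gaussian mask (hypothetical interpretation)
    penalizes pairs of frames that are far apart in time.
    Returns the (T, d) outputs and the (T, T) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if sigma is not None:
        idx = np.arange(X.shape[0])
        dist2 = (idx[:, None] - idx[None, :]) ** 2
        # Adding a log-Gaussian before softmax equals multiplying the
        # weights by a Gaussian and renormalizing each row.
        scores = scores - dist2 / (2.0 * sigma ** 2)
    A = softmax(scores, axis=-1)
    return A @ V, A
```

Each output frame is a weighted combination of all frames, with the weights computed from the inputs themselves; this is how correlations between different parts of the input are captured.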

Step S200: acquiring a noisy signal;

Taking the above voice communication as an example, the noisy signal may be a voice signal captured in real time during a real-time voice call (for example, a telephone call or a voice call); a voice signal captured in a non-real-time voice call (for example, when sending a voice message); or a voice signal captured from a voice command that the user issues to the human-computer interaction interface.

Step S300: acquiring a training noisy signal sample;

The training noisy signal samples can be obtained as follows: acquiring a plurality of noises; acquiring clean speech signal samples; and superimposing each noise in the noise set on the clean speech signal samples to obtain the training noisy signal samples.
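A minimal sketch of this superposition step, assuming the noise is scaled to a chosen signal-to-noise ratio before it is added. The source does not say how the mixing level is chosen; `mix_at_snr`, its arguments and the tiling/cropping of the noise are hypothetical.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng=None):
    """Superimpose noise on clean speech at a target SNR in dB.

    The noise is tiled and cropped to the clean length (random offset if
    rng is given). Returns (noisy, scaled_noise).
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = 0 if rng is None else int(rng.integers(0, len(noise) - len(clean) + 1))
    noise = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale so that 10*log10(p_clean / p_scaled_noise) equals snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = scale * noise
    return clean + scaled, scaled
```

In practice one would loop over every noise in the noise set and over a range of SNRs to produce many training samples; keeping `scaled` alongside the mixture also provides the noise reference needed for SNR labels.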

Step S400: performing label processing on the training noisy signal samples;

First, the training noisy signal samples are obtained. They must be labeled in advance, i.e., the attribute of each signal frame (such as noise or speech) must be determined, because the training process needs an expected output (ideally, clean speech with the noise suppressed) as guidance for optimizing the parameters of the noise reduction model.

Preferably, the training noisy signal samples are labeled in advance with the specially normalized a priori signal-to-noise ratio as the label, or with the specially normalized a posteriori signal-to-noise ratio as the label.
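The source does not define its "special normalization". As a hedged sketch, the code below computes oracle (instantaneous) a priori and a posteriori signal-to-noise ratios from the clean-speech and noise spectra that are available at training time, then maps them to [0, 1] by clipping in dB and scaling linearly. The -20 dB to 30 dB range and all function names are hypothetical.

```python
import numpy as np

# Hypothetical normalization range in dB; not specified by the source.
SNR_MIN_DB, SNR_MAX_DB = -20.0, 30.0

def snr_labels(S, N, eps=1e-12):
    """Oracle per-bin SNRs from training spectra S (clean) and N (added noise)."""
    Y = S + N
    p_n = np.abs(N) ** 2 + eps
    xi = np.abs(S) ** 2 / p_n     # instantaneous a priori SNR
    gamma = np.abs(Y) ** 2 / p_n  # instantaneous a posteriori SNR
    return xi, gamma

def normalize_snr(snr, eps=1e-12):
    """Map an SNR to [0, 1]: clip its dB value, then scale linearly."""
    db = np.clip(10 * np.log10(snr + eps), SNR_MIN_DB, SNR_MAX_DB)
    return (db - SNR_MIN_DB) / (SNR_MAX_DB - SNR_MIN_DB)

def denormalize_snr(label):
    """Inverse of normalize_snr (exact within the clipping range)."""
    db = SNR_MIN_DB + np.asarray(label) * (SNR_MAX_DB - SNR_MIN_DB)
    return 10 ** (db / 10)
```

A bounded label of this kind is easy for a small model to fit, and the inverse mapping would turn a model output back into an SNR value for the gain computation in step S600.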

Step S500: performing noise reduction processing on the noisy signal according to the preselected noise reduction model;

In some embodiments, step S500 includes the following steps, as shown in FIG. 4:

Step S510: acquiring signal features from the noisy signal;

Preferably, the amplitude spectrum of the framed noisy signal is taken as the signal feature, and the noisy signal is preprocessed into the input feature of the preselected noise reduction model. The amplitude spectrum is the modulus of the complex spectrum and represents the energy distribution over the frequency bins.
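A minimal sketch of this feature extraction (framing, windowing, FFT, modulus), assuming a Hann window, 512-sample frames and a 256-sample hop; these are illustrative values not given in the source, and `magnitude_spectrum` is a hypothetical name.

```python
import numpy as np

def magnitude_spectrum(x, frame_len=512, hop=256):
    """Frame the signal, apply a Hann window and take |FFT| per frame.

    Returns shape (num_frames, frame_len // 2 + 1): the modulus of the
    complex spectrum, i.e. the amplitude at each frequency bin.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=-1)  # complex spectrum Y(m, k)
    return np.abs(spec)
```

For a 1 kHz tone sampled at 16 kHz, every frame of this feature peaks at bin 32, since 32 × 16000 / 512 = 1000 Hz.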

Step S520: training a preselected noise reduction model by using signal characteristics;

The preselected noise reduction model predicts the estimated features; the loss between the estimated features and the label features is computed and back-propagated, and the model parameters are updated by gradient descent. This process is repeated until the loss converges, i.e., no longer decreases. It should be noted that because the specially normalized a posteriori or a priori signal-to-noise ratio is used as the label, its range is bounded and it is strongly correlated with the input feature (the amplitude spectrum of the framed noisy signal), so a model with a large number of weights is not needed.
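To make this loop concrete, the following NumPy sketch trains a deliberately tiny one-layer sigmoid model with a mean-squared-error loss by gradient descent and stops once the loss no longer decreases. The model, loss, learning rate and tolerance are stand-ins chosen for illustration; the source does not specify them, and a real implementation would train the preselected network in a deep-learning framework.

```python
import numpy as np

def train_until_converged(X, T, lr=0.5, tol=1e-6, max_epochs=5000, seed=0):
    """Gradient descent on a one-layer sigmoid model with MSE loss (sketch).

    X: (n, d) input features, e.g. amplitude-spectrum frames.
    T: (n, c) targets in [0, 1], e.g. normalized SNR labels.
    Stops when the loss improves by less than tol (converged).
    Returns weights, bias and the loss history.
    """
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], T.shape[1]))
    b = np.zeros(T.shape[1])
    history = []
    for _ in range(max_epochs):
        P = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # forward pass: estimated features
        loss = np.mean((P - T) ** 2)            # difference from the label features
        history.append(loss)
        if len(history) > 1 and history[-2] - loss < tol:
            break                               # loss no longer decreases
        # Back-propagate the MSE loss through the sigmoid.
        dZ = 2.0 * (P - T) * P * (1.0 - P) / P.size
        W -= lr * (X.T @ dZ)
        b -= lr * dZ.sum(axis=0)
    return W, b, history
```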

Step S530: acquiring the signal-to-noise parameter of the noisy signal according to the trained preselected noise reduction model.

After the preselected noise reduction model has been trained, it outputs the signal-to-noise parameters of the noisy signal. It should be noted that these parameters include the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio.

Step S600: estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction.

In some embodiments, step S600 includes the following steps, as shown in fig. 5:

Step S610: acquiring a signal-to-noise ratio gain function according to the signal-to-noise parameter;

Since the output of most speech noise reduction algorithms can be expressed through a gain function of the a priori signal-to-noise ratio, the overall performance of speech noise reduction depends largely on the accuracy of the a priori signal-to-noise ratio estimate.

Let $y(t)$, $s(t)$ and $n(t)$ denote the observed signal mixed with interference, the clean speech signal and the noise signal, respectively, and assume that the speech $s(t)$ and the noise $n(t)$ are uncorrelated. The noisy speech can then be expressed as:

$$y(t) = s(t) + n(t) \tag{1}$$

Taking the short-time Fourier transform of both sides of equation (1) gives:

$$Y(m,k) = S(m,k) + N(m,k) \tag{2}$$

where $Y(m,k)$, $S(m,k)$ and $N(m,k)$ are the spectra of the noisy observation, the clean speech and the noise, respectively, $m$ is the frame index and $k$ is the frequency-bin index. The task is to recover the clean speech $S(m,k)$ from the received noisy signal: an estimate $\hat{S}(m,k)$ is computed and converted back to the time domain by IFFT. Different distortion criteria lead to different estimators of $\hat{S}(m,k)$, such as the Wiener filter, the minimum mean square error (MMSE) estimator and the log-spectral amplitude estimator. These algorithms share a common property: their output can be expressed as the product of a gain function and the noisy observation $Y(m,k)$:

$$\hat{S}(m,k) = H(m,k)\,Y(m,k) \tag{3}$$

where $H(m,k)$ is the gain function, usually expressed as a function of the a priori signal-to-noise ratio. Taking the Wiener filter as an example, its gain function is derived as follows. The error between the clean speech $S(m,k)$ and its estimate $\hat{S}(m,k)$ is defined as:

$$e(m,k) = \hat{S}(m,k) - S(m,k) = [H(m,k)-1]\,S(m,k) + H(m,k)\,N(m,k) \tag{4}$$

where $[H(m,k)-1]\,S(m,k)$ is the FFT coefficient of the speech distortion component and $H(m,k)\,N(m,k)$ is the FFT coefficient of the residual noise component.
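For the IFFT step that follows equation (3), the sketch below applies a gain to the noisy STFT and converts the result back to the time domain by overlap-add. It assumes a periodic Hann analysis window with 50% overlap, so that the overlapping windows sum to one, and no synthesis window; these choices and the function names are illustrative, not from the source.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Analysis: periodic Hann window (sums to 1 at 50% overlap) + FFT per frame.
    Assumes len(x) >= n_fft."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec, n_fft=512, hop=256):
    """Synthesis: inverse FFT of each frame followed by overlap-add."""
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    out = np.zeros((spec.shape[0] - 1) * hop + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

def enhance(x, gain):
    """Apply a real gain H(m,k) to Y(m,k) as in equation (3), then return to the time domain."""
    Y = stft(np.asarray(x, dtype=float))
    return istft(gain * Y)
```

With a gain of 1 the interior samples (those covered by two frames) are reconstructed exactly; only the first and last half-frame are tapered by the analysis window.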

Step S620: estimating the a priori signal-to-noise ratio from the signal-to-noise ratio gain function.

Expanding the squared error gives:

$$e^2(m,k) = [H(m,k)-1]^2 S^2(m,k) + H^2(m,k)\,N^2(m,k) + 2H(m,k)[H(m,k)-1]\,S(m,k)\,N(m,k)$$

Since the noise is uncorrelated with the speech it contaminates (and is assumed to be zero-mean), the cross term vanishes in expectation, and the mean square error is:

$$E\{e^2(m,k)\} = [H(m,k)-1]^2\,E[S^2(m,k)] + H^2(m,k)\,E[N^2(m,k)] \tag{5}$$

Minimizing the mean square error $E\{e^2(m,k)\}$ with respect to $H(m,k)$, i.e., setting its derivative $2[H(m,k)-1]\,E[S^2(m,k)] + 2H(m,k)\,E[N^2(m,k)]$ to zero, yields the Wiener gain:

$$H(m,k) = \frac{E[S^2(m,k)]}{E[S^2(m,k)] + E[N^2(m,k)]} \tag{6}$$
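A quick numerical check of equations (4) to (6), as a sketch with illustrative powers $E[S^2] = 4$ and $E[N^2] = 1$ (values chosen for the example, not from the source): a grid search over $H$ finds the minimum of equation (5) at the Wiener gain, and a Monte Carlo simulation with independent zero-mean signals confirms that the cross term averages out.

```python
import numpy as np

def expected_mse(H, p_s, p_n):
    """Equation (5): [H-1]^2 E[S^2] + H^2 E[N^2]."""
    return (H - 1.0) ** 2 * p_s + H ** 2 * p_n

def wiener_gain(p_s, p_n):
    """Equation (6): E[S^2] / (E[S^2] + E[N^2])."""
    return p_s / (p_s + p_n)

p_s, p_n = 4.0, 1.0
grid = np.linspace(0.0, 1.0, 100001)
H_best = grid[np.argmin(expected_mse(grid, p_s, p_n))]
print(H_best, wiener_gain(p_s, p_n))  # both approximately 0.8

# Monte Carlo: independent zero-mean speech and noise samples.
rng = np.random.default_rng(0)
s = rng.normal(0.0, 2.0, 200000)  # E[S^2] = 4
n = rng.normal(0.0, 1.0, 200000)  # E[N^2] = 1
H = wiener_gain(p_s, p_n)
e = (H - 1.0) * s + H * n         # error from equation (4)
print(np.mean(e ** 2), expected_mse(H, p_s, p_n))  # both close to 0.8
```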

Defining the a priori signal-to-noise ratio as $\xi(m,k) = E[S^2(m,k)]/E[N^2(m,k)]$, where $E[N^2(m,k)]$ is the noise power, the Wiener gain becomes:

$$H(m,k) = \frac{\xi(m,k)}{\xi(m,k) + 1} \tag{7}$$

The noise power is usually treated as known, while the a priori signal-to-noise ratio $\xi(m,k)$ can be estimated with the following equation:

$$\xi(m,k) = a\,\frac{|\hat{S}(m-1,k)|^2}{E[N^2(m,k)]} + (1-a)\max\big(\gamma(m,k) - 1,\ 0\big) \tag{8}$$

where $a$ is a weighting coefficient, $\hat{S}(m-1,k)$ is the estimate of the clean speech spectrum in the previous frame, and $\gamma(m,k) = |Y(m,k)|^2/E[N^2(m,k)]$ is the a posteriori signal-to-noise ratio. This is the decision-directed (DD) approach.

The clean speech spectrum is then estimated as:

$$\hat{S}(m,k) = H(m,k)\,Y(m,k) = \frac{\xi(m,k)}{\xi(m,k) + 1}\,Y(m,k) \tag{9}$$
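Putting equations (7) to (9) together, the following sketch runs the decision-directed estimator frame by frame and applies the Wiener gain. As in the text, the noise power is passed in as known. The weighting coefficient a = 0.98 is a common choice in the literature, and the floor on ξ is an illustrative addition; neither value comes from the source, and `dd_wiener` is a hypothetical name. How the model-predicted signal-to-noise parameters feed into this recursion is not specified in the source, so only the classical path is shown.

```python
import numpy as np

def dd_wiener(Y, noise_psd, a=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR (eq. (8)) with Wiener gain (eqs. (7), (9)).

    Y: complex noisy STFT, shape (frames, bins).
    noise_psd: noise power E[N^2(m,k)], treated as known; shape (bins,) or (frames, bins).
    a: weighting coefficient; xi_min: floor on the a priori SNR (illustrative values).
    Returns the clean-speech estimate S_hat and the gains H, both shaped like Y.
    """
    Y = np.asarray(Y, dtype=complex)
    noise_psd = np.broadcast_to(np.asarray(noise_psd, dtype=float), Y.shape)
    S_hat = np.zeros_like(Y)
    H = np.zeros(Y.shape)
    prev_power = np.zeros(Y.shape[1])  # |S_hat(m-1,k)|^2, zero before the first frame
    for m in range(Y.shape[0]):
        gamma = np.abs(Y[m]) ** 2 / noise_psd[m]  # a posteriori SNR
        xi = a * prev_power / noise_psd[m] + (1 - a) * np.maximum(gamma - 1.0, 0.0)  # eq. (8)
        xi = np.maximum(xi, xi_min)
        H[m] = xi / (xi + 1.0)   # eq. (7)
        S_hat[m] = H[m] * Y[m]   # eq. (9)
        prev_power = np.abs(S_hat[m]) ** 2
    return S_hat, H
```

The recursion smooths ξ over time: a strong, persistent component quickly drives its gain towards 1, while bins containing only noise stay near the floor.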

Compared with pure AI noise reduction, the embodiments of the invention reduce the demands on the computing power and storage capacity of the terminal, and can therefore be applied to lightweight terminals with limited resources.

Based on the above voice noise reduction method, an embodiment of the present invention further provides a voice noise reduction device, whose structure is shown schematically in FIG. 6. The device includes: a model acquisition module 110 configured to acquire a preselected noise reduction model; a signal acquisition module 120 configured to acquire a noisy signal; a sample acquisition module 130 configured to acquire a training noisy signal sample; a label processing module 140 configured to perform label processing on the training noisy signal sample; a signal processing module 150 configured to perform noise reduction processing on the noisy signal according to the preselected noise reduction model and to acquire a signal-to-noise parameter of the noisy signal; and a parameter estimation module 160 configured to estimate the a priori signal-to-noise ratio according to the signal-to-noise parameter to realize noise reduction.

Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention, where the electronic device 400 includes:

one or more processors 401 and a memory 402; one processor 401 is taken as an example in FIG. 7.

The processor 401 and the memory 402 may be connected by a bus or in other ways; a bus connection is taken as an example in FIG. 7.

Memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 401 executes various functional applications and data processing of the electronic device, i.e. implements a voice noise reduction method of the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 402.

The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more units are stored in the memory 402 and, when executed by the one or more processors 401, perform the voice noise reduction method in any of the above method embodiments, for example, performing the functions of the modules of the device in fig. 6 or method steps S100 to S600 of fig. 1 described above.

The electronic device can execute the voice noise reduction method provided by the embodiments of the present invention, and has the program modules and beneficial effects corresponding to that method. For technical details not described in this electronic device embodiment, reference may be made to the voice noise reduction method provided by the embodiments of the present invention.

An embodiment of the present invention further provides a non-transitory computer-readable storage medium, which may be included in the device described in the above embodiments or may exist separately without being incorporated into the device. The non-transitory computer-readable storage medium carries one or more programs which, when executed, implement the methods of the embodiments of the present invention.

The above description is merely an embodiment of the present invention and is not intended to limit its scope. All equivalent structural or process modifications made using the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.

Claims

1. A method for speech noise reduction, comprising:

obtaining a preselected noise reduction model;

acquiring a signal with noise;

carrying out noise reduction processing on the signal with noise according to the preselected noise reduction model, and acquiring a signal-to-noise parameter of the signal with noise;

and estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to achieve noise reduction.

2. The method of claim 1, wherein after acquiring the noisy signal, the method further comprises:

acquiring a training noisy signal sample;

and processing the training noisy signal sample according to a label.

3. The method of claim 2, wherein said obtaining a preselected noise reduction model comprises:

obtaining a plurality of noise reduction models;

presenting options for the plurality of noise reduction models in an interactive interface;

and, in response to selection of an option in the interactive interface, taking the noise reduction model corresponding to the selected option as the preselected noise reduction model.

4. The method of claim 3, wherein performing noise reduction processing on the noisy signal according to the preselected noise reduction model comprises:

extracting signal features from the noisy signal, wherein the signal features are the amplitude spectrum of the noisy signal;

training the preselected noise reduction model by using the signal features;

and acquiring the signal-to-noise parameters of the signal with noise according to the trained preselected noise reduction model.
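Claim 4 takes the amplitude spectrum of the noisy signal as the model input. A minimal way to compute it is a windowed short-time Fourier transform; the Hann window, 512-sample frame and 256-sample hop below are illustrative assumptions, not parameters given in the claims, and `amplitude_spectrum` is a hypothetical helper name.

```python
import numpy as np


def amplitude_spectrum(x: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Return the STFT magnitude of x with shape (num_frames, frame_len // 2 + 1)."""
    if len(x) < frame_len:  # pad very short inputs to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    num_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # one-sided spectrum per frame


# Usage: one second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
spec = amplitude_spectrum(x)  # shape (61, 257)
```

With a 16 kHz sampling rate and a 512-point frame, each bin spans 31.25 Hz, so the 1 kHz test tone falls exactly in bin 32.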

5. The method of claim 4, wherein estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to achieve noise reduction comprises:

acquiring a signal-to-noise ratio gain function according to the signal-to-noise parameters;

and estimating the a priori signal-to-noise ratio according to the signal-to-noise ratio gain function.
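The claims do not state how the gain function feeds the a priori SNR estimate. One well-known way to link them is the decision-directed estimator of Ephraim and Malah, in which the a priori SNR of frame l is ξ(l) = α·G(l-1)²·γ(l-1) + (1-α)·max(γ(l)-1, 0), a weighted sum of the previous frame's enhanced-speech-to-noise ratio and the current maximum-likelihood term. The sketch below assumes that the model's signal-to-noise parameter is used as the a posteriori SNR γ and that the gain is Wiener-type; the smoothing factor α = 0.98 and the -25 dB floor are conventional choices, not values from this document.

```python
import numpy as np


def decision_directed(gamma: np.ndarray, alpha: float = 0.98,
                      xi_min_db: float = -25.0):
    """Decision-directed a priori SNR estimate with a Wiener-type gain.

    gamma: a posteriori SNR, shape (frames, bins), linear scale.
    Returns (xi, gain), both with the same shape as gamma.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    xi = np.empty_like(gamma, dtype=float)
    gain = np.empty_like(gamma, dtype=float)
    prev = np.zeros(gamma.shape[1])  # G^2 * gamma of the previous frame (0 before frame 0)
    for frame in range(gamma.shape[0]):
        ml_term = np.maximum(gamma[frame] - 1.0, 0.0)   # maximum-likelihood term
        xi[frame] = np.maximum(alpha * prev + (1.0 - alpha) * ml_term, xi_min)
        gain[frame] = xi[frame] / (1.0 + xi[frame])     # SNR gain function
        prev = gain[frame] ** 2 * gamma[frame]          # |S_hat|^2 / noise PSD
    return xi, gain


# Usage: a stationary high-SNR bin (gamma = 101, about 20 dB).
xi, gain = decision_directed(np.full((20, 2), 101.0))
```

The recursion smooths ξ across frames, which is what suppresses the "musical noise" that a purely frame-by-frame estimate tends to produce.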

6. The method of claim 5, wherein processing the training noisy signal sample according to a label comprises:

performing label processing on the training noisy signal sample using a specially normalized a priori signal-to-noise ratio; or

performing label processing on the training noisy signal sample using a specially normalized a posteriori signal-to-noise ratio.
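The form of the "special normalization" is not defined in the claims. One published option, used for example in the Deep Xi framework, expresses the instantaneous SNR in dB and maps it into (0, 1) through the cumulative distribution function of a normal distribution whose mean μ and standard deviation σ are estimated per frequency bin from training data. The sketch below applies that mapping to both label types: the a priori SNR uses clean-speech and noise magnitudes, the a posteriori SNR uses noisy and noise magnitudes. It is one possible choice under these assumptions, not the normalization of this invention, and the function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function from the standard library


def snr_db(num_mag: np.ndarray, den_mag: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Instantaneous power ratio |num|^2 / |den|^2 in dB."""
    return 10.0 * np.log10((num_mag ** 2 + eps) / (den_mag ** 2 + eps))


def cdf_normalize(value_db: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Map dB values into (0, 1) with a normal CDF of per-bin mean mu and std sigma."""
    return 0.5 * (1.0 + _erf((value_db - mu) / (sigma * math.sqrt(2.0))))


def prior_snr_label(clean_mag, noise_mag, mu, sigma):
    """Label from the a priori SNR |S|^2 / |D|^2 (needs clean speech and noise)."""
    return cdf_normalize(snr_db(clean_mag, noise_mag), mu, sigma)


def posterior_snr_label(noisy_mag, noise_mag, mu, sigma):
    """Label from the a posteriori SNR |Y|^2 / |D|^2 (needs the noisy mixture and noise)."""
    return cdf_normalize(snr_db(noisy_mag, noise_mag), mu, sigma)


# Usage with hypothetical per-bin statistics mu = 0 dB, sigma = 10 dB.
mu, sigma = np.zeros(2), np.full(2, 10.0)
labels = prior_snr_label(np.array([[1.0, 2.0]]), np.array([[1.0, 1.0]]), mu, sigma)
```

At inference time the network output would be mapped back to dB through the inverse CDF, ξ_dB = μ + σ·√2·erfinv(2p - 1) (for example with `scipy.special.erfinv`), before the gain function is applied.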

7. The method of claim 6, wherein the plurality of noise reduction models comprises a deep neural network, a convolutional neural network, and a recurrent neural network with a Gaussian mask and a self-attention mechanism.

8. A speech noise reduction apparatus, comprising:

the model acquisition module is used for acquiring a preselected noise reduction model;

the signal acquisition module is used for acquiring a signal with noise;

the sample acquisition module is used for acquiring a training noisy signal sample;

the label processing module is used for processing the training noisy signal sample according to a label;

the signal processing module is used for carrying out noise reduction processing on the signal with noise according to the preselected noise reduction model and acquiring a signal-to-noise parameter of the signal with noise;

and the parameter estimation module is used for estimating the a priori signal-to-noise ratio according to the signal-to-noise parameter to achieve noise reduction.

9. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of speech noise reduction according to any of claims 1-7.

10. A non-transitory computer storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform a method of speech noise reduction according to any one of claims 1-7.