CN112002339A - Voice noise reduction method and device, computer-readable storage medium and electronic device

Info

Publication number: CN112002339A
Application number: CN202010713823.4A
Authority: CN
Inventors: 马路; 赵培; 苏腾荣
Original assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2020-11-27
Anticipated expiration: 2040-07-22
Also published as: CN112002339B

Abstract

The invention discloses a voice noise reduction method and device, a computer-readable storage medium, and an electronic device. The method comprises: performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, and determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thereby separated into two branches, namely the first voice data and the second voice data, and the noise data separated in the two branches is used to reduce the noise of the noisy voice data, which solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

Description

Voice noise reduction method and device, computer-readable storage medium and electronic device

Technical Field

The present invention relates to the field of speech processing, and in particular, to a method and an apparatus for speech noise reduction, a computer-readable storage medium, and an electronic apparatus.

Background

Speech signal processing is currently a key technology in the field of human-computer interaction. Speech noise reduction enhances the input speech to obtain relatively clean audio, has an extremely important effect on back-end speech recognition, and is a key technology in speech signal processing.

Current speech noise reduction mainly adopts the noise reduction method of the open-source tool WebRTC, namely: calculating the spectral flatness feature, the log-likelihood ratio (LRT) feature, and the spectral difference feature of the input audio; updating the speech/noise probability function according to these features; updating the noise estimate according to the probability function; obtaining a Wiener filter according to the noise estimate; and using the Wiener filter to reduce the noise of the input audio. This method estimates the noise and the signal directly from the current noisy input signal. Consequently, when the noise is estimated, the signal component necessarily affects the accuracy of the noise estimate, the noise estimate in turn necessarily affects the estimate of the signal, and the final noise reduction effect suffers.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a voice noise reduction method and device, a computer readable storage medium and an electronic device, which at least solve the technical problem that in the prior art, the accuracy of voice noise reduction is not high.

According to an aspect of an embodiment of the present invention, there is provided a speech noise reduction method, including: performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice signals in the first voice data is greater than a first threshold, and the proportion of noise signals in the second voice data is greater than a second threshold; performing time-frequency transformation on the first voice data and the second voice data respectively, and determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

According to another aspect of the embodiments of the present invention, there is also provided a speech noise reduction apparatus, including: a separation unit, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; a first determining unit, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function; a second determining unit, configured to determine target noise reduction data of the voice data through the first probability function and the second probability function; and a noise reduction unit, configured to perform noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned voice noise reduction method when running.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the voice noise reduction method through the computer program.

In the embodiments of the present invention, voice separation is performed on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thus separated into two branches, namely the first voice data and the second voice data, and the noise data separated in the two branches is used to reduce the noise of the noisy voice data. This achieves the technical effect of performing noise reduction on the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of an application environment of an alternative speech noise reduction method according to an embodiment of the present invention;

FIG. 2 is a flow diagram of an alternative method of speech noise reduction according to an embodiment of the present invention;

FIG. 3 is a flow diagram of an alternative method of speech noise reduction according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an alternative speech noise reduction apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device for an alternative voice noise reduction method according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an aspect of the embodiments of the present invention, a voice noise reduction method is provided. As an optional implementation, the voice noise reduction method may be applied to, but is not limited to, the hardware environment shown in fig. 1, which may include, but is not limited to, the user equipment 102, the network 110, and the server 112.

The user equipment 102 may include, but is not limited to, a display 104, a processor 106, and a memory 108. The display 104 is used for acquiring the voice data to be denoised through a human-computer interaction interface. The processor 106 is configured to respond to the human-computer interaction instruction and separate the voice data to be denoised to obtain first voice data and second voice data, where the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold. The memory 108 is used for storing information such as the voice data to be denoised, the first voice data, and the second voice data. The server 112 may include, but is not limited to, a database 114 and a processing engine 116. The processing engine 116 is configured to call the first voice data and the second voice data stored in the database 114; perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determine target noise reduction data of the voice data through the first probability function and the second probability function; and perform noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thus separated into two branches, namely the first voice data and the second voice data, and the noise data separated in the two branches is used to reduce the noise of the noisy voice data. This achieves the technical effect of performing noise reduction on the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

The specific process is as follows. In steps S102-S110, the user equipment 102 separates the voice data to be denoised to obtain the first voice data and the second voice data, and sends the first voice data and the second voice data to the server 112 through the network 110. The server 112 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The server 112 then returns the determined result to the user equipment 102.

Then, in steps S114-S116, the user equipment 102 performs voice separation on the voice data to be denoised to obtain first voice data and second voice data of the voice data, where the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thus separated into two branches, namely the first voice data and the second voice data, and the noise data separated in the two branches is used to reduce the noise of the noisy voice data. This achieves the technical effect of performing noise reduction on the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

Optionally, in this embodiment, the voice denoising method may be applied to, but is not limited to, the server 112, to assist the application client in denoising the acquired voice data. The application client may run on, but is not limited to, the user equipment 102, and the user equipment 102 may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, or other terminal equipment that supports running the application client. The server 112 and the user equipment 102 may exchange data through a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, Wi-Fi, and other networks that enable wireless communication. The wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The above is merely an example, and this embodiment is not limited thereto.

Optionally, as an optional implementation manner, as shown in fig. 2, the voice noise reduction method includes:

Step S202, performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold.

Step S204, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function.

Step S206, determining target noise reduction data of the voice data through the first probability function and the second probability function.

Step S208, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

Optionally, in this embodiment, the voice data to be denoised may include, but is not limited to, noisy voice data uttered by a human being and noisy voice data uttered by an animal. That is, the voice data to be denoised is voice from which the noise in the sound source needs to be removed.

In this embodiment, the voice data needs to be divided into two branches for noise reduction. That is, the sound source separation module separates the noisy input voice into a voice branch (corresponding to the first voice data) and a noise branch (corresponding to the second voice data). In the voice branch, the voice signal is the main component, with a small amount of noise; in the noise branch, noise is the main component, with a small amount of voice signal.

It should be noted that, after performing time-frequency transformation on the first voice data and the second voice data, the method may further include:

S1, calculating a first characteristic parameter in the first voice data, wherein the first characteristic parameter comprises a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter, and a spectral difference characteristic parameter;

S2, calculating a second characteristic parameter in the second voice data, wherein the second characteristic parameter comprises a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter, and a spectral difference characteristic parameter;

S3, determining the first probability function and the second probability function according to the first characteristic parameter and the second characteristic parameter.

In practical application, after the voice is separated, time-frequency transformation is performed on the first voice data and the second voice data respectively, that is, each is transformed from the time domain to the frequency domain. The spectral flatness feature, the log-likelihood ratio feature, and the spectral difference feature of the first voice data and of the second voice data are then calculated, and the probability functions of voice (first voice data) and noise (second voice data), i.e., the first probability function and the second probability function, are updated based on these three features. Voice activity detection can be performed on the first voice data or the second voice data according to the corresponding probability function, to judge whether that data is a noise signal or a voice signal. For example, the first voice data is determined to be a voice signal.
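As an illustration of one of the three features named above, the following minimal sketch computes spectral flatness for a single frame. It is not taken from the embodiment: the function names `power_spectrum` and `spectral_flatness`, the naive DFT, and the `eps` guard are assumptions made for illustration. Flatness is taken here as the ratio of the geometric mean to the arithmetic mean of the power spectrum, so noise-like frames score near 1 and tonal frames score near 0.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of the non-negative-frequency DFT bins of one real-valued frame (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    eps is an assumed guard against log(0) and division by zero.
    """
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power) + eps
    return geometric / arithmetic


# An impulse has a flat spectrum; a pure tone concentrates energy in one bin.
impulse = [1.0] + [0.0] * 31
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
flat_score = spectral_flatness(power_spectrum(impulse))
tonal_score = spectral_flatness(power_spectrum(tone))
```

In a real implementation the DFT would normally be replaced by an FFT, and the flatness value would be one input among the three features used to update the probability function.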

Optionally, in this embodiment, determining the target noise reduction data of the speech data by using the first probability function and the second probability function may include:

S1, determining first noise data of the first voice data according to the first probability function;

S2, determining the first noise data as the target noise reduction data when voice activity detection performed according to the first probability function determines that the first voice data is voice data;

S3, determining second noise data of the second voice data according to the second probability function;

S4, determining the second noise data as the target noise reduction data when voice activity detection performed according to the first probability function determines that the second voice data is noise data.

Optionally, in this embodiment, before determining the first noise data of the first speech data according to the first probability function, the method may further include:

determining that the first voice data is voice data when the probability value of the first voice data determined according to the first probability function is greater than a threshold;

and determining that the first voice data is noise data when the probability value of the first voice data determined according to the first probability function is less than the threshold.
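The threshold comparison and the choice between the two noise estimates can be sketched as follows. This is a hedged illustration, not the embodiment's code: the names `is_speech` and `select_target_noise`, the default threshold of 0.5, and the handling of a probability exactly equal to the threshold are assumptions. The sketch also assumes that the decision is driven by the probability value of the first (voice-dominant) branch, and that the second branch's noise estimate is used whenever that decision is "noise".

```python
def is_speech(speech_probability, threshold=0.5):
    """Voice activity decision for the first (voice-dominant) branch.

    Assumption: a probability exactly equal to the threshold counts as noise;
    the text only covers the "greater than" and "less than" cases.
    """
    return speech_probability > threshold


def select_target_noise(speech_probability, first_noise, second_noise, threshold=0.5):
    """Pick the noise estimate that will drive the noise reduction.

    first_noise:  noise estimate of the first (voice-dominant) branch
    second_noise: noise estimate of the second (noise-dominant) branch
    """
    if is_speech(speech_probability, threshold):
        return first_noise
    return second_noise


# Hypothetical per-bin noise estimates for one frame.
voice_branch_noise = [0.1, 0.2, 0.1]
noise_branch_noise = [0.8, 0.9, 0.7]
chosen_when_speech = select_target_noise(0.9, voice_branch_noise, noise_branch_noise)
chosen_when_noise = select_target_noise(0.1, voice_branch_noise, noise_branch_noise)
```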

According to the embodiment provided by the application, voice separation is performed on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thus separated into two branches, namely the first voice data and the second voice data, and the noise data separated in the two branches is used to reduce the noise of the noisy voice data. This achieves the technical effect of performing noise reduction on the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

As an alternative embodiment, after performing noise reduction processing on the voice data through the target noise reduction data to obtain the target voice data, the method may further include:

transforming the target voice data from the frequency domain to the time domain by using an inverse short-time Fourier transform to obtain reconstructed target voice data.

Optionally, in this embodiment, the target voice data is transformed from the frequency domain to the time domain to obtain the reconstructed target voice data, and voice recognition is then performed on the reconstructed target voice data, which addresses the low voice recognition rate currently caused by poor voice noise reduction performance.
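Going from the frequency domain back to the time domain is commonly done with an inverse short-time Fourier transform and overlap-add. The sketch below is one minimal way to do this and is not prescribed by the embodiment: the frame length of 16, the 50% overlap, the square-root Hann window, and all function names are assumptions. With this window pair, samples covered by two frames are reconstructed exactly when the spectra are left unmodified.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def sqrt_hann(n):
    # Periodic Hann window, square-rooted so analysis * synthesis sums to 1 at 50% overlap.
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]


def stft(signal, frame_len=16):
    hop = frame_len // 2
    window = sqrt_hann(frame_len)
    return [dft([signal[start + i] * window[i] for i in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def istft(frames, frame_len=16):
    hop = frame_len // 2
    window = sqrt_hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, spectrum in enumerate(frames):
        frame = idft(spectrum)
        for i in range(frame_len):
            out[m * hop + i] += frame[i] * window[i]  # windowed overlap-add
    return out


signal = [math.sin(0.3 * t) for t in range(64)]
reconstructed = istft(stft(signal))
```

The first and last half-frames are covered by only one window and are therefore attenuated; a practical implementation would pad the signal at both ends.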

As an alternative embodiment, the present application further provides a voice noise reduction method based on sound source separation.

In order to improve the estimation of the signal and the noise and thereby improve the noise reduction effect, this embodiment proposes a noise reduction method based on sound source separation, namely: first performing sound source separation on the noisy input signal to obtain a signal component containing a small amount of noise and a noise component that is almost pure noise; estimating the noise and the signal of each of the two branches; and finally selecting the noise estimate required by the Wiener filter according to the voice activity detection (VAD) result of the signal branch. If the VAD judges noise, Wiener filtering is performed using the noise estimate of the noise branch; if the VAD judges voice, Wiener filtering is performed using the noise estimate of the signal branch.
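The text does not spell out how each branch's noise estimate is obtained from its probability function. One common approach, shown here purely as an assumption, is probability-weighted recursive averaging: bins are pulled toward the current frame only to the extent that the frame is judged to be noise. The function name `update_noise_estimate` and the `smoothing` factor of 0.9 are hypothetical.

```python
def update_noise_estimate(prev_noise, frame_magnitude, speech_probability, smoothing=0.9):
    """Recursive per-bin noise update weighted by the speech probability.

    prev_noise:         noise estimate from the previous frame (per bin)
    frame_magnitude:    magnitude (or power) of the current frame (per bin)
    speech_probability: value of the branch's probability function in [0, 1]
    smoothing:          assumed forgetting factor; closer to 1 means slower updates
    """
    updated = []
    for n_prev, mag in zip(prev_noise, frame_magnitude):
        # The more speech-like the frame, the more the target stays at the old estimate.
        target = (1.0 - speech_probability) * mag + speech_probability * n_prev
        updated.append(smoothing * n_prev + (1.0 - smoothing) * target)
    return updated


held = update_noise_estimate([1.0, 2.0], [5.0, 5.0], speech_probability=1.0)
tracked = update_noise_estimate([1.0], [3.0], speech_probability=0.0, smoothing=0.5)
```

Under this scheme the estimate is frozen while speech is certain and tracks the input while noise is certain, which is why running it separately on a voice-dominant branch and a noise-dominant branch yields two different estimates to choose from.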

FIG. 3 shows the algorithm flowchart of the voice noise reduction method based on sound source separation. The specific algorithm flow is as follows:

1. Sound source separation. A sound source separation module separates the noisy input voice into a voice branch and a noise branch. In the voice branch (equivalent to the first voice data), the voice signal is the main component, with a small amount of noise; in the noise branch (equivalent to the second voice data), noise is the main component, with a small amount of voice signal.

2. Time-frequency transformation. The voice branch signal and the noise branch signal are each transformed to the frequency domain (equivalent to performing time-frequency transformation on the first voice data and the second voice data respectively).

3. Feature extraction. The spectral flatness feature, the log-likelihood ratio feature, and the spectral difference feature are calculated for each branch, and the speech/noise probability function is updated according to these three features.

4. VAD calculation. For the voice branch, the probability function is compared with a threshold to perform voice activity detection (VAD): the signal is judged to be voice if the probability is greater than the threshold, and noise if the probability is less than the threshold. Each of the two branches obtains its own noise estimate according to its own probability function.

5. Wiener filtering. According to the VAD result obtained in step 4, if the voice branch VAD judges voice, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the voice branch; if the voice branch VAD judges noise, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the noise branch.

6. Signal reconstruction. The signal is transformed from the frequency domain to the time domain using an inverse short-time Fourier transform.
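The Wiener filtering step can be illustrated with a textbook per-bin gain. The sketch below is an assumption rather than the embodiment's formula: it estimates the signal-to-noise ratio of each bin as max(|Y|^2 / N - 1, 0), uses the gain snr / (1 + snr), and clamps it to a hypothetical `gain_floor` to limit artifacts. The names `wiener_gains` and `apply_gains` are illustrative.

```python
def wiener_gains(noisy_power, noise_power, gain_floor=0.05):
    """Per-bin Wiener gain from the noisy power spectrum and the selected noise estimate."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin: pass it through
            continue
        snr = max(y / n - 1.0, 0.0)
        gains.append(max(snr / (1.0 + snr), gain_floor))
    return gains


def apply_gains(noisy_spectrum, gains):
    """Scale each complex frequency bin of the noisy frame by its gain."""
    return [g * c for g, c in zip(gains, noisy_spectrum)]


# Bin 0 is well above the noise, bin 1 is at the noise level, bin 2 has no noise estimate.
gains = wiener_gains([4.0, 1.0, 2.0], [1.0, 1.0, 0.0])
denoised = apply_gains([2 + 0j, 1j, 1 + 0j], gains)
```

The noise power passed in would be whichever branch's estimate was selected in step 4, and the resulting spectrum is what step 6 transforms back to the time domain.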

The embodiment provided by the application has the following advantages. Better noise reduction performance: because the invention separates the input audio into a voice branch and a noise branch and estimates the noise and the signal of each branch, the noise and voice estimates are more accurate and the noise reduction performance is better. Low algorithm complexity: the invention can be obtained by adding sound source separation directly on top of the open-source code, so the algorithm is neither difficult nor complex to implement.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

According to another aspect of the embodiment of the present invention, there is also provided a voice noise reduction apparatus for implementing the voice noise reduction method. As shown in fig. 4, the voice noise reduction apparatus includes: a separation unit 41, a first determination unit 43, a second determination unit 45, and a noise reduction unit 47.

A separation unit 41, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, where a proportion of the voice data in the first voice data is greater than a first threshold, and a proportion of the noise data in the second voice data is greater than a second threshold;

a first determining unit 43, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function;

a second determining unit 45, configured to determine target noise reduction data of the voice data through the first probability function and the second probability function;

and the noise reduction unit 47 is configured to perform noise reduction processing on the speech data through the target noise reduction data to obtain target speech data.

Optionally, the second determining unit 45 may include:

a first determining module for determining first noise data of the first speech data according to a first probability function;

the second determining module is used for determining the first noise data as the target noise data under the condition that the voice activity detection is carried out according to the first probability function to determine that the first voice data is the voice data;

a third determining module, configured to determine second noise data of the second voice data according to the second probability function;

and the fourth determining module is used for determining the second noise data as the target noise data under the condition that the voice activity detection is carried out according to the first probability function to determine that the second voice data is the noise data.

According to the embodiment provided by the application, the separation unit 41 performs voice separation on the voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; the first determining unit 43 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and performs time-frequency transformation on the second voice data to determine a corresponding second probability function; the second determining unit 45 determines target noise reduction data of the voice data through the first probability function and the second probability function; and the noise reduction unit 47 performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

As an alternative embodiment, the apparatus may further include:

a first calculation unit, configured to calculate a first characteristic parameter in the first voice data after time-frequency transformation is performed on the first voice data and the second voice data respectively, wherein the first characteristic parameter comprises a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter, and a spectral difference characteristic parameter;

a second calculation unit, configured to calculate a second characteristic parameter in the second voice data, wherein the second characteristic parameter comprises a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter, and a spectral difference characteristic parameter;

wherein the first probability function and the second probability function are determined according to the first characteristic parameter and the second characteristic parameter.

As an alternative embodiment, the apparatus may further include:

a third determining unit, configured to, before the first noise data of the first voice data is determined according to the first probability function, determine the first voice data as voice data when the probability value of the first voice data determined according to the first probability function is greater than a threshold value;

a fourth determining unit, configured to determine the first voice data as noise data when the probability value of the first voice data determined according to the first probability function is smaller than the threshold value.
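The threshold decision described by these two units can be sketched as a single comparison. This is an illustrative sketch only: the function name, the default threshold of 0.5 and the handling of the equality case (which the text leaves unspecified) are assumptions.

```python
def classify_branch(speech_probability, threshold=0.5):
    """Label a frame as 'speech' or 'noise' from its speech probability.

    The source covers only the strictly-greater and strictly-smaller
    cases; treating equality as noise is an assumption made here.
    """
    if not 0.0 <= speech_probability <= 1.0:
        raise ValueError("speech_probability must lie in [0, 1]")
    return "speech" if speech_probability > threshold else "noise"

labels = [classify_branch(p) for p in (0.9, 0.2, 0.5)]
```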

As an alternative embodiment, the apparatus may further include:

a third obtaining unit, configured to, after noise reduction processing is performed on the voice data through the target noise reduction data to obtain the target voice data, transform the target voice data from the frequency domain to the time domain by using an inverse short-time Fourier transform to obtain reconstructed target voice data.
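Going from the frequency domain back to the time domain is the inverse of the short-time Fourier transform, commonly implemented by overlap-add. The sketch below illustrates that reconstruction with a naive DFT so that it needs only the standard library. The frame length of 8, the 50% overlap and the square-root Hann window are assumptions chosen for illustration, not parameters given in this application. In a real pipeline the noise reduction would modify the frames between `stft` and `istft`; here they are passed through unchanged to show that the round trip recovers the signal.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def sqrt_hann(n):
    # Periodic Hann window, square-rooted so that the product of the
    # analysis and synthesis windows sums to 1 at 50% overlap.
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]

def stft(signal, frame_len=8):
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    return [dft([signal[s + i] * win[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]

def istft(frames, frame_len=8):
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for j, spectrum in enumerate(frames):
        for i, sample in enumerate(idft(spectrum)):
            out[j * hop + i] += sample * win[i]  # overlap-add
    return out

signal = [math.sin(0.3 * t) for t in range(32)]
rebuilt = istft(stft(signal))
```

Only the first and last half-frames, which are covered by a single window, differ from the input; padding the signal before analysis is the usual way to avoid that edge effect.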

According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above-mentioned voice noise reduction method, as shown in fig. 5, the electronic device includes a memory 502 and a processor 504, the memory 502 stores a computer program therein, and the processor 504 is configured to execute the steps in any one of the above-mentioned method embodiments through the computer program.

Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

S1, performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;

S2, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;

S3, determining target noise reduction data of the voice data through the first probability function and the second probability function;

S4, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
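Steps S3 and S4 can be sketched per frame as "pick a noise estimate, then attenuate the noisy spectrum with it". The sketch below is a hedged illustration, not the method of this application. It assumes that the noise estimates and voice-activity decisions for both branches are already available. It arbitrarily prefers the noise-dominant branch when both conditions hold, a case whose resolution the text does not spell out. It also swaps in power spectral subtraction with a gain floor as a familiar stand-in for the unspecified noise reduction rule. All function and parameter names are hypothetical.

```python
def select_noise(first_noise, second_noise, first_is_speech, second_is_noise):
    """S3 (sketch): choose the target noise reduction data for one frame."""
    if second_is_noise:       # assumption: noise-dominant branch takes priority
        return second_noise
    if first_is_speech:
        return first_noise
    return None               # neither condition holds: no estimate for this frame

def subtraction_gain(noisy_power, noise_power, floor=0.1):
    """Per-bin gain max(floor, 1 - noise / noisy); floor is an assumed value."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        g = 1.0 - n / y if y > 0.0 else floor
        gains.append(max(floor, g))
    return gains

def denoise_frame(noisy_power, noise_power, floor=0.1):
    """S4 (sketch): attenuate each bin of the noisy power spectrum."""
    gains = subtraction_gain(noisy_power, noise_power, floor)
    return [g * y for g, y in zip(gains, noisy_power)]

noisy = [4.0, 1.0, 0.5]
noise = select_noise([0.8, 0.8, 0.8], [1.0, 1.0, 1.0],
                     first_is_speech=True, second_is_noise=True)
clean = denoise_frame(noisy, noise)
```

The gain floor keeps bins where the noise estimate exceeds the observed power from being driven to zero or below, which is a common way to limit musical-noise artifacts in subtraction-style methods.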

Optionally, those skilled in the art will understand that the structure shown in fig. 5 is only illustrative, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 5 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 5, or have a different configuration from that shown in fig. 5.

The memory 502 may be used to store software programs and modules, such as the program instructions/modules corresponding to the voice noise reduction method and apparatus in the embodiments of the present invention. The processor 504 executes various functional applications and data processing by running the software programs and modules stored in the memory 502, thereby implementing the voice noise reduction method. The memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 502 may further include memory located remotely from the processor 504, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 502 may specifically be used, without limitation, to store information such as the voice data to be denoised and the separated first voice data and second voice data. As an example, as shown in fig. 5, the memory 502 may include, but is not limited to, the separation unit 41, the first determination unit 43, the second determination unit 45, and the noise reduction unit 47 of the voice noise reduction apparatus. It may also include other module units of the voice noise reduction apparatus, which are not described again in this example.

Optionally, the transmission device 506 is used for receiving or sending data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 506 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 506 is a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

Optionally, in this embodiment, the computer-readable storage medium may be configured to store a computer program for executing steps S1 to S4 described above.

Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic disks, optical disks, and the like.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for speech noise reduction, comprising:

performing voice separation on voice data to be denoised to obtain first voice data and second voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;

performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;

determining target noise reduction data for the speech data by the first probability function and the second probability function;

and carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

2. The method of claim 1, wherein determining target noise reduction data for the speech data by the first probability function and the second probability function comprises:

determining first noise data for the first speech data according to the first probability function;

determining the first noise data as the target noise reduction data when voice activity detection is performed according to the first probability function and the first voice data is determined to be voice data;

determining second noise data of the second voice data according to the second probability function;

and determining the second noise data as the target noise reduction data when the second voice data is determined to be noise data by voice activity detection according to the first probability function.

3. The method of claim 2, wherein prior to determining first noise data for the first speech data according to the first probability function, the method comprises:

determining the first voice data as the voice data under the condition that the probability value of the first voice data determined according to the first probability function is larger than a threshold value;

determining the first voice data as the noise data in case the first voice data probability value determined according to the first probability function is smaller than a threshold value.

4. The method of claim 1, wherein after performing time-frequency transformation on the first speech data and the second speech data, respectively, the method comprises:

calculating a first feature parameter in the first voice data, wherein the first feature parameter comprises a spectral flatness feature parameter, a log-likelihood feature parameter and a spectral difference feature parameter;

calculating second feature parameters in the second voice data, wherein the second feature parameters comprise a spectral flatness feature parameter, a log-likelihood feature parameter and a spectral difference feature parameter;

determining the first probability function and the second probability function from the first characteristic parameter and the second characteristic parameter.

5. The method according to claim 1, wherein after the speech data is subjected to noise reduction processing by the noise reduction data to obtain target speech data, the method comprises:

and transforming the target voice data from a frequency domain to a time domain by using an inverse short-time Fourier transform to obtain reconstructed target voice data.

6. A speech noise reduction apparatus, comprising:

a separation unit, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;

a first determining unit, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function;

a second determining unit configured to determine target noise reduction data of the speech data by the first probability function and the second probability function;

and the noise reduction unit is used for carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

7. The apparatus of claim 6, wherein the first determining unit comprises:

a first determining module for determining first noise data of the first speech data according to the first probability function;

a second determining module, configured to determine the first noise data as the target noise reduction data when performing voice activity detection according to the first probability function and determining that the first voice data is voice data;

and a fourth determining module, configured to determine the second noise data as the target noise reduction data when performing voice activity detection according to the first probability function and determining that the second voice data is noise data.

8. The apparatus of claim 7, wherein the apparatus comprises:

a first calculating unit, configured to calculate a first feature parameter in the first voice data after time-frequency transformation is performed on the first voice data and the second voice data respectively, where the first feature parameter includes a spectral flatness feature parameter, a log-likelihood feature parameter, and a spectral difference feature parameter;

a second calculating unit, configured to calculate a second feature parameter in the second voice data, where the second feature parameter includes a spectral flatness feature parameter, a log-likelihood feature parameter, and a spectral difference feature parameter.

9. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 5.

10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 5 by means of the computer program.