CN112002339A - Voice noise reduction method and device, computer-readable storage medium and electronic device - Google Patents

Publication number
CN112002339A
Authority
CN
China
Prior art keywords
voice data
data
voice
probability function
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010713823.4A
Other languages
Chinese (zh)
Other versions
CN112002339B (en)
Inventor
马路
赵培
苏腾荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haier Uplus Intelligent Technology Beijing Co Ltd
Priority to CN202010713823.4A
Publication of CN112002339A
Application granted
Publication of CN112002339B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0272 Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice noise reduction method and device, a computer-readable storage medium and an electronic device. The method comprises: performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data; determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The voice data to be denoised is thus separated into two branches, namely the first voice data and the second voice data, and the noise data separated out in the two branches is used to denoise the noisy voice data, which solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.

Description

Voice noise reduction method and device, computer-readable storage medium and electronic device
Technical Field
The present invention relates to the field of speech processing, and in particular, to a method and an apparatus for speech noise reduction, a computer-readable storage medium, and an electronic apparatus.
Background
Speech signal processing is currently a key technology in the field of human-computer interaction. Speech noise reduction enhances the input speech to obtain relatively clean audio, which is extremely important for back-end speech recognition, and it is therefore a key technology in speech signal processing.
The current speech noise reduction method mainly adopts the noise reduction method in the open-source tool WebRTC, namely: calculating the spectral flatness feature, the log-likelihood ratio (LRT) feature and the spectral difference feature of the input audio; updating the voice/noise probability function according to these features; updating the noise estimate according to the probability function; obtaining a Wiener filter from the noise estimate; and denoising the input audio with the Wiener filter. This method estimates the noise and the signal directly from the current noisy input signal, so when the noise is estimated, the signal component inevitably affects the accuracy of the noise estimate, and the noise likewise affects the estimate of the signal, which degrades the final noise reduction effect.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a voice noise reduction method and device, a computer readable storage medium and an electronic device, which at least solve the technical problem that in the prior art, the accuracy of voice noise reduction is not high.
According to an aspect of an embodiment of the present invention, there is provided a speech noise reduction method, including: performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice signals in the first voice data is greater than a first threshold value, and the proportion of noise signals in the second voice data is greater than a second threshold value; respectively carrying out time-frequency transformation on the first voice data and the second voice data, and determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data for the voice data through the first probability function and the second probability function; and carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
According to another aspect of the embodiments of the present invention, there is also provided a speech noise reduction apparatus, including: a separation unit, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value; a first determining unit, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function; a second determining unit, configured to determine target noise reduction data of the voice data through the first probability function and the second probability function; and a noise reduction unit, configured to perform noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned voice noise reduction method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the voice noise reduction method through the computer program.
In the embodiment of the invention, voice separation is performed on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be denoised is separated into two branches, namely the first voice data and the second voice data, and the noise data separated out in the two branches is used to denoise the noisy voice data. This achieves the technical effect of denoising the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative speech noise reduction method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an alternative method of speech noise reduction according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an alternative method of speech noise reduction according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an alternative speech noise reduction apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device for an alternative voice noise reduction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a voice noise reduction method is provided. As an optional implementation, the voice noise reduction method may be applied to, but is not limited to, the hardware environment shown in FIG. 1, which may include, but is not limited to, the user equipment 102, the network 110, and the server 112.
The user equipment 102 may include, but is not limited to: a display 104, a processor 106, and a memory 108. The display 104 is used for acquiring voice data to be denoised through a human-computer interaction interface. The processor 106 is configured to respond to the human-computer interaction instruction and separate the voice data to be denoised to obtain first voice data and second voice data, where the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold. The memory 108 is used for storing information such as the voice data to be denoised, the first voice data, and the second voice data. The server 112 may include, but is not limited to, the database 114 and the processing engine 116. The processing engine 116 is configured to call the first voice data and the second voice data stored in the database 114; perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determine target noise reduction data of the voice data through the first probability function and the second probability function; and perform noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be denoised is separated into two branches, namely the first voice data and the second voice data, and the noise data separated out in the two branches is used to denoise the noisy voice data. This achieves the technical effect of denoising the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.
The specific process is as follows. In steps S102-S110, the voice data to be denoised is separated in the terminal device 102 to obtain the first voice data and the second voice data, which are sent to the server 112 through the network 110. The server 112 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The server 112 then returns the determined result to the terminal device 102.
Then, in steps S114-S116, the terminal device 102 performs voice separation on the voice data to be denoised to obtain first voice data and second voice data of the voice data, where the proportion of voice data in the first voice data is greater than a first threshold, and the proportion of noise data in the second voice data is greater than a second threshold; performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be denoised is separated into two branches, namely the first voice data and the second voice data, and the noise data separated out in the two branches is used to denoise the noisy voice data. This achieves the technical effect of denoising the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.
Optionally, in this embodiment, the voice noise reduction method may be applied in, but is not limited to, the server 112, to assist an application client in denoising the acquired voice data. The application client may run in, but is not limited to, the user equipment 102, and the user equipment 102 may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, or other terminal equipment that supports running the application client. The server 112 and the user equipment 102 may exchange data through a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes: Bluetooth, Wi-Fi, and other networks that enable wireless communication. The wired network may include, but is not limited to: wide area networks, metropolitan area networks, and local area networks. The above is merely an example, and this embodiment is not limited thereto.
Optionally, as an optional implementation, as shown in FIG. 2, the voice noise reduction method includes:
Step S202, performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value.
Step S204, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function.
Step S206, determining target noise reduction data of the voice data through the first probability function and the second probability function.
Step S208, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
Optionally, in this embodiment, the voice data to be denoised may include, but is not limited to, noisy voice data uttered by a human being and noisy voice data produced by an animal. That is, the voice data to be denoised is voice from which the noise in the sound source needs to be removed.
In this embodiment, the voice data is divided into two branches for noise reduction. That is, a sound source separation module separates the noisy input voice into a voice branch (corresponding to the first voice data) and a noise branch (corresponding to the second voice data). In the voice branch, the voice signal is the main component, with a small amount of noise; in the noise branch, noise is the main component, with a small amount of voice signal.
It should be noted that, after performing time-frequency transformation on the first voice data and the second voice data, the method may further include:
S1, calculating first characteristic parameters in the first voice data, wherein the first characteristic parameters comprise a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter and a spectral difference characteristic parameter;
S2, calculating second characteristic parameters in the second voice data, wherein the second characteristic parameters comprise a spectral flatness characteristic parameter, a log-likelihood ratio characteristic parameter and a spectral difference characteristic parameter;
S3, determining the first probability function and the second probability function according to the first characteristic parameters and the second characteristic parameters.
In practical application, after the voice is separated, the first voice data and the second voice data are each subjected to time-frequency transformation, that is, each is transformed from the time domain to the frequency domain. The spectral flatness feature, the log-likelihood ratio feature, and the spectral difference feature of the first voice data and of the second voice data are calculated respectively, and the voice/noise probability functions (i.e., the first probability function for the first voice data and the second probability function for the second voice data) are updated based on these three features. Voice activity detection can then be performed on the first voice data or the second voice data according to its probability function, to judge whether that data is a noise signal or a voice signal. For example, the first voice data is determined to be a voice signal.
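Of the three features named above, spectral flatness has a widely used standard definition: the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below illustrates only that definition. The patent text does not give a formula, so the function name `spectral_flatness`, the `eps` guard, and the example spectra are assumptions made for illustration.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of one power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky spectrum. `eps` (an assumed guard value) avoids log(0).
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arithmetic_mean


# Hypothetical spectra: one flat, one with almost all energy in a single bin.
flat = spectral_flatness([1.0] * 8)
peaky = spectral_flatness([8.0] + [1e-6] * 7)
```

A flat spectrum scores close to 1 and a peaky one close to 0, which is why the feature can help a probability function tell noise-like data from voice-like data.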
Optionally, in this embodiment, determining the target noise reduction data of the speech data by using the first probability function and the second probability function may include:
S1, determining first noise data of the first voice data according to the first probability function;
S2, determining the first noise data as the target noise reduction data when voice activity detection performed based on the first probability function determines that the first voice data is voice data;
S3, determining second noise data of the second voice data according to the second probability function;
S4, determining the second noise data as the target noise reduction data when voice activity detection performed based on the first probability function determines that the second voice data is noise data.
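The selection in S1-S4 can be sketched as a per-item choice between the two branch noise estimates. This is a minimal sketch under stated assumptions: the function name `select_target_noise`, the `"voice"`/`"noise"` labels, and the list-of-lists layout are hypothetical, and the switch is driven by the voice activity decision made on the voice branch, as in the algorithm flow described later in this document.

```python
def select_target_noise(vad_labels, first_noise, second_noise):
    """Pick, item by item, which branch's noise estimate to use.

    vad_labels   -- "voice" or "noise" decisions from the voice branch (assumed format)
    first_noise  -- noise estimates from the first (voice-dominant) branch
    second_noise -- noise estimates from the second (noise-dominant) branch
    """
    return [
        first if label == "voice" else second
        for label, first, second in zip(vad_labels, first_noise, second_noise)
    ]


# Hypothetical two-item example with two frequency bins each.
target = select_target_noise(
    ["voice", "noise"],
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0, 2.0], [3.0, 4.0]],
)
```

Here the first item keeps the voice-branch estimate and the second switches to the noise-branch estimate.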
Optionally, in this embodiment, before determining the first noise data of the first speech data according to the first probability function, the method may further include:
when the probability value of the first voice data determined according to the first probability function is greater than a threshold value, determining the first voice data as voice data;
and when the probability value of the first voice data determined according to the first probability function is less than the threshold value, determining the first voice data as noise data.
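The threshold comparison above amounts to a one-line decision rule. In the sketch below, the name `vad_decision` and the default threshold of 0.5 are assumptions; the source does not state a threshold value, nor what happens when the probability equals the threshold, so treating equality as noise is also an assumption.

```python
def vad_decision(speech_probability, threshold=0.5):
    """Label the data "voice" if its probability exceeds the threshold, else "noise".

    The 0.5 default and the handling of equality are illustrative assumptions.
    """
    return "voice" if speech_probability > threshold else "noise"


# Hypothetical probability values from the first probability function.
labels = [vad_decision(p) for p in (0.9, 0.2, 0.7)]
```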
According to the embodiment provided by the application, voice separation is performed on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be denoised is separated into two branches, namely the first voice data and the second voice data, and the noise data separated out in the two branches is used to denoise the noisy voice data. This achieves the technical effect of denoising the voice according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of voice noise reduction is not high.
As an alternative embodiment, after performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data, the method may further include:
transforming the target voice data from the frequency domain to the time domain by using the inverse short-time Fourier transform to obtain reconstructed target voice data.
Optionally, in this embodiment, the target speech data is transformed from the frequency domain to the time domain to obtain reconstructed target speech data, and the speech recognition is performed on the reconstructed target speech data, so that the problem of low speech recognition rate caused by poor speech noise reduction performance at present is solved.
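The frequency-to-time reconstruction can be illustrated with a small analysis/synthesis round trip. This is a sketch only, assuming a frame length of 8, a 50% hop, and a square-root Hann window applied at both analysis and synthesis; none of these values come from the source. A naive DFT is used so the example needs only the standard library; a real system would use an FFT.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def sqrt_hann(n):
    # Squared values of this window sum to 1 across frames at 50% overlap.
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]


def stft(signal, frame_len=8):
    """Time domain -> list of windowed frame spectra (hop = frame_len // 2)."""
    hop, win = frame_len // 2, sqrt_hann(frame_len)
    return [dft([signal[s + i] * win[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def istft(frames, frame_len=8):
    """List of frame spectra -> time domain, by windowed overlap-add."""
    hop, win = frame_len // 2, sqrt_hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, spectrum in enumerate(frames):
        for i, value in enumerate(idft(spectrum)):
            out[idx * hop + i] += value * win[i]
    return out


# Hypothetical test signal; with no filtering in between, the round trip
# should reproduce the interior samples.
signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = istft(stft(signal))
```

Only the samples covered by two overlapping frames are reconstructed exactly; the first and last half-frame are attenuated by the window, which is why practical implementations pad the signal.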
As an alternative embodiment, the present application further provides a voice noise reduction method based on sound source separation.
In order to improve the estimation of the signal and the noise and thereby improve the noise reduction effect, this embodiment proposes a noise reduction method based on sound source separation, namely: first performing sound source separation on the noisy input signal to obtain a signal component with a small amount of noise and a noise component that is almost pure noise; then estimating the noise and the signal in each of the two branches; and finally selecting the noise estimate required by the Wiener filter according to the voice activity detection (VAD) result of the signal branch. If the VAD judges noise, Wiener filtering is performed using the noise estimate of the noise branch; if the VAD judges voice, Wiener filtering is performed using the noise estimate of the signal branch.
FIG. 3 shows the algorithm flow chart of the voice noise reduction method based on sound source separation. The specific algorithm flow is as follows:
1. Sound source separation. The noisy input voice is separated into a voice branch and a noise branch by a sound source separation module. In the voice branch (corresponding to the first voice data), the voice signal is the main component, with a small amount of noise; in the noise branch (corresponding to the second voice data), noise is the main component, with a small amount of voice signal.
2. Time-frequency transformation. The voice branch signal and the noise branch signal are each transformed to the frequency domain (corresponding to performing time-frequency transformation on the first voice data and the second voice data respectively).
3. Feature extraction. The spectral flatness feature, the log-likelihood ratio feature and the spectral difference feature are calculated for each branch, and the voice/noise probability function is updated according to these three features.
4. VAD calculation. For the voice branch, the probability function is compared with a threshold to perform voice activity detection (VAD): the signal is judged to be voice if the probability is greater than the threshold, and noise if the probability is less than the threshold. Each of the two branches obtains its own noise estimate from its own probability function.
5. Wiener filtering. According to the VAD result obtained in step 4, if the voice branch VAD judges voice, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the voice branch; if the voice branch VAD judges noise, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the noise branch.
6. Signal reconstruction. The signal is transformed from the frequency domain to the time domain using the inverse short-time Fourier transform.
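Step 5 turns the selected noise estimate into frequency-domain filter coefficients. The sketch below uses a simple textbook form of the Wiener gain, one minus the ratio of noise power to noisy-signal power in each bin, clamped to a floor. It is an illustration under assumptions, not the WebRTC computation or the patent's exact formula: the name `wiener_gain`, the `floor` value of 0.1, and the example power values are all hypothetical.

```python
def wiener_gain(noisy_psd, noise_psd, floor=0.1):
    """Per-bin gain in [floor, 1] from noisy-signal power and estimated noise power.

    `noise_psd` is whichever branch estimate step 5 selected. The floor
    (an assumed value) limits attenuation and avoids negative gains.
    """
    gains = []
    for noisy, noise in zip(noisy_psd, noise_psd):
        gain = 1.0 - noise / noisy if noisy > 0.0 else floor
        gains.append(min(1.0, max(floor, gain)))
    return gains


# Hypothetical three-bin frame: strong signal, noise-only bin, noise-free bin.
gains = wiener_gain([4.0, 1.0, 2.0], [1.0, 1.0, 0.0])
```

Multiplying each frequency bin of the spectrum by its gain attenuates the bins where the noise estimate dominates; the filtered spectrum is then passed to step 6.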
The embodiment provided by the application has the following advantages. Better noise reduction performance: because the invention separates the input audio into a voice branch and a noise branch and estimates the noise and the signal in each branch, the noise and voice estimates are more accurate and the noise reduction performance is better. Low algorithm complexity: the invention can be obtained by adding sound source separation directly on top of the open-source code, so the algorithm is simple to implement and of low complexity.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided a voice noise reduction apparatus for implementing the voice noise reduction method. As shown in fig. 4, the voice noise reduction apparatus includes: a separation unit 41, a first determination unit 43, a second determination unit 45, and a noise reduction unit 47.
A separation unit 41, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, where a proportion of the voice data in the first voice data is greater than a first threshold, and a proportion of the noise data in the second voice data is greater than a second threshold;
a first determining unit 43, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function;
a second determining unit 45, configured to determine target noise reduction data of the voice data through the first probability function and the second probability function;
and the noise reduction unit 47 is configured to perform noise reduction processing on the speech data through the target noise reduction data to obtain target speech data.
Optionally, the second determining unit 45 may include:
a first determining module, configured to determine first noise data of the first voice data according to the first probability function;
a second determining module, configured to determine the first noise data as the target noise reduction data when voice activity detection performed according to the first probability function determines that the first voice data is voice data;
a third determining module, configured to determine second noise data of the second voice data according to the second probability function;
and a fourth determining module, configured to determine the second noise data as the target noise reduction data when voice activity detection performed according to the first probability function determines that the second voice data is noise data.
According to the embodiment provided by the application, the separation unit 41 performs voice separation on the voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value; the first determining unit 43 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and performs time-frequency transformation on the second voice data to determine a corresponding second probability function; the second determining unit 45 determines target noise reduction data of the voice data by the first probability function and the second probability function; the noise reduction unit 47 performs noise reduction processing on the voice data by using the target noise reduction data to obtain target voice data.
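The data flow between the four units can be sketched as follows. This is only an illustrative wiring, not the patented implementation: the function name `reduce_noise`, its callable parameters, and all `toy_*` stand-ins are hypothetical, and the separation, probability, selection, and suppression algorithms are deliberately left to the caller because this passage does not fix them.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def reduce_noise(
    noisy: Signal,
    separate: Callable[[Signal], Tuple[Signal, Signal]],
    probability: Callable[[Signal], float],
    select_noise: Callable[[Signal, Signal, float, float], Signal],
    suppress: Callable[[Signal, Signal], Signal],
) -> Signal:
    """Chain the four stages; every stage is supplied by the caller."""
    # Separation unit 41: speech-dominant and noise-dominant streams.
    first, second = separate(noisy)
    # First determining unit 43: one probability per stream.
    p_first = probability(first)
    p_second = probability(second)
    # Second determining unit 45: choose the target noise reduction data.
    target_noise = select_noise(first, second, p_first, p_second)
    # Noise reduction unit 47: apply the target noise reduction data.
    return suppress(noisy, target_noise)


# Toy stand-ins (assumptions for illustration only).
def toy_separate(x: Signal) -> Tuple[Signal, Signal]:
    noise = [0.5] * len(x)  # pretend the noise is a constant offset
    return [v - n for v, n in zip(x, noise)], noise


def toy_probability(x: Signal) -> float:
    # Fraction of samples whose magnitude exceeds 1.0.
    return sum(1 for v in x if abs(v) > 1.0) / len(x)


def toy_select(first: Signal, second: Signal, p1: float, p2: float) -> Signal:
    # Use the second stream as noise when it looks noise-like.
    return second if p2 < 0.5 else first


def toy_suppress(x: Signal, noise: Signal) -> Signal:
    return [v - n for v, n in zip(x, noise)]


cleaned = reduce_noise(
    [2.5, -1.5, 3.5, 0.5], toy_separate, toy_probability, toy_select, toy_suppress
)
```

Passing the stages as callables keeps the sketch neutral about which separation or suppression technique is actually used.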
As an alternative embodiment, the apparatus may further include:
a first calculating unit, configured to calculate a first feature parameter of the first voice data after time-frequency transformation has been performed on the first voice data and the second voice data respectively, wherein the first feature parameter comprises a spectral flatness feature parameter, a log-likelihood ratio feature parameter and a spectral difference feature parameter;
a second calculating unit, configured to calculate a second feature parameter of the second voice data, wherein the second feature parameter comprises a spectral flatness feature parameter, a log-likelihood ratio feature parameter and a spectral difference feature parameter;
wherein the first probability function and the second probability function are determined from the first feature parameter and the second feature parameter.
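Of the feature parameters listed, spectral flatness has a widely used definition: the ratio of the geometric mean to the arithmetic mean of a frame's magnitude spectrum. The sketch below illustrates that definition only; the function name `spectral_flatness` and the `floor` parameter are assumptions, and how the patent maps this feature into a probability function is not shown here.

```python
import math
from typing import Sequence


def spectral_flatness(magnitudes: Sequence[float], floor: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of a magnitude spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky spectrum such as voiced speech with strong harmonics.
    """
    if not magnitudes:
        raise ValueError("magnitudes must not be empty")
    # Clip to a small positive floor so that log() is always defined.
    clipped = [max(m, floor) for m in magnitudes]
    log_mean = sum(math.log(m) for m in clipped) / len(clipped)
    arithmetic_mean = sum(clipped) / len(clipped)
    return math.exp(log_mean) / arithmetic_mean
```

A perfectly flat spectrum yields a flatness of 1, while a spectrum dominated by a single bin yields a value close to 0.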
As an alternative embodiment, the apparatus may further include:
a third determining unit, configured to determine that the first voice data is voice data when a probability value of the first voice data determined according to the first probability function is greater than a threshold value before determining the first noise data of the first voice data according to the first probability function;
and the fourth determining unit is used for determining the first voice data as noise data under the condition that the probability value of the first voice data determined according to the first probability function is smaller than the threshold value.
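The threshold decision made by the third and fourth determining units can be sketched as below. The function name `classify_frame` and the default threshold of 0.5 are assumptions. The text does not say what happens when the probability equals the threshold, so this sketch arbitrarily treats that case as noise.

```python
def classify_frame(speech_probability: float, threshold: float = 0.5) -> str:
    """Label data as "speech" or "noise" from its speech probability."""
    if not 0.0 <= speech_probability <= 1.0:
        raise ValueError("speech_probability must lie in [0, 1]")
    # Strictly greater than the threshold means speech; equality is an
    # unspecified case in the source and is treated as noise here.
    return "speech" if speech_probability > threshold else "noise"
```

For example, `classify_frame(0.8)` labels the data as speech and `classify_frame(0.2)` labels it as noise.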
As an alternative embodiment, the apparatus may further include:
a third obtaining unit, configured to, after the noise reduction processing is performed on the voice data through the target noise reduction data to obtain the target voice data, transform the target voice data from the frequency domain to the time domain by using an inverse short-time Fourier transform to obtain reconstructed target voice data.
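The frequency-to-time reconstruction can be illustrated with an inverse short-time Fourier transform using weighted overlap-add. This is a minimal pure-Python sketch under stated assumptions: a periodic Hann window, 50% overlap, a naive DFT, and the hypothetical function names `stft` and `istft`. A practical implementation would use an FFT library.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def hann(n: int) -> List[float]:
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal: List[float], frame_len: int = 8, hop: int = 4) -> List[List[complex]]:
    """Time domain to frequency domain: windowed frames, one DFT each."""
    window = hann(frame_len)
    return [dft([signal[s + i] * window[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def istft(frames: List[List[complex]], frame_len: int = 8, hop: int = 4) -> List[float]:
    """Frequency domain to time domain via weighted overlap-add."""
    window = hann(frame_len)
    length = hop * (len(frames) - 1) + frame_len
    out = [0.0] * length
    norm = [0.0] * length
    for idx, spectrum in enumerate(frames):
        frame = idft(spectrum)
        for i in range(frame_len):
            out[idx * hop + i] += frame[i] * window[i]
            norm[idx * hop + i] += window[i] ** 2
    # Samples where the squared windows sum to zero cannot be recovered.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]


signal = [math.sin(0.3 * i) for i in range(16)]
reconstructed = istft(stft(signal))
```

With these settings the very first sample falls on a zero of the window and is not recoverable; all other samples are reconstructed up to floating-point error when the spectra are left unmodified.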
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above-mentioned voice noise reduction method, as shown in fig. 5, the electronic device includes a memory 502 and a processor 504, the memory 502 stores a computer program therein, and the processor 504 is configured to execute the steps in any one of the above-mentioned method embodiments through the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;
S2, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;
S3, determining target noise reduction data of the voice data through the first probability function and the second probability function;
S4, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
Optionally, it can be understood by those skilled in the art that the structure shown in fig. 5 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 5 is a diagram illustrating a structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 5, or have a different configuration than shown in fig. 5.
The memory 502 may be used to store software programs and modules, such as program instructions/modules corresponding to the voice noise reduction method and apparatus in the embodiment of the present invention, and the processor 504 executes various functional applications and data processing by running the software programs and modules stored in the memory 502, that is, implements the voice noise reduction method. The memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 502 may further include memory located remotely from the processor 504, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 502 may be used, without limitation, to store the voice data to be denoised, the separated first voice data and second voice data, and other information. As an example, as shown in fig. 5, the memory 502 may include, but is not limited to, the separation unit 41, the first determination unit 43, the second determination unit 45, and the noise reduction unit 47 of the voice noise reduction apparatus. The memory may also include other module units of the voice noise reduction apparatus, which are not described again in this example.
Optionally, the transmission device 506 is used for receiving or sending data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 506 includes a network interface controller (NIC), which can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 506 is a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, performing voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;
S2, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;
S3, determining target noise reduction data of the voice data through the first probability function and the second probability function;
S4, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for speech noise reduction, comprising:
performing voice separation on voice data to be denoised to obtain first voice data and second voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;
performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;
determining target noise reduction data for the speech data by the first probability function and the second probability function;
and carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
2. The method of claim 1, wherein determining target noise reduction data for the speech data by the first probability function and the second probability function comprises:
determining first noise data for the first speech data according to the first probability function;
determining the first noise data as the target noise reduction data when voice activity detection is performed according to the first probability function and the first voice data is determined to be voice data;
determining second noise data of the second voice data according to the second probability function;
and determining the second noise data as the target noise reduction data when the second voice data is determined to be noise data by voice activity detection according to the first probability function.
3. The method of claim 2, wherein prior to determining first noise data for the first speech data according to the first probability function, the method comprises:
determining the first voice data as the voice data under the condition that the probability value of the first voice data determined according to the first probability function is larger than a threshold value;
determining the first voice data as the noise data in case the first voice data probability value determined according to the first probability function is smaller than a threshold value.
4. The method of claim 1, wherein after performing time-frequency transformation on the first speech data and the second speech data, respectively, the method comprises:
calculating a first feature parameter in the first voice data, wherein the first feature parameter comprises a spectral flatness feature parameter, a log-likelihood feature parameter and a spectral difference feature parameter;
calculating second feature parameters in the second voice data, wherein the second feature parameters comprise a spectral flatness feature parameter, a log-likelihood feature parameter and a spectral difference feature parameter;
determining the first probability function and the second probability function from the first characteristic parameter and the second characteristic parameter.
5. The method according to claim 1, wherein after the speech data is subjected to noise reduction processing by the noise reduction data to obtain target speech data, the method comprises:
and transforming the target voice data from a frequency domain to a time domain by using short-time Fourier transform to obtain reconstructed target voice data.
6. A speech noise reduction apparatus, comprising:
a separation unit, configured to perform voice separation on voice data to be denoised to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is greater than a first threshold value, and the proportion of the noise data in the second voice data is greater than a second threshold value;
a first determining unit, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function;
a second determining unit configured to determine target noise reduction data of the speech data by the first probability function and the second probability function;
and the noise reduction unit is used for carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.
7. The apparatus of claim 6, wherein the first determining unit comprises:
a first determining module for determining first noise data of the first speech data according to the first probability function;
a second determining module, configured to determine the first noise data as the target noise reduction data when performing voice activity detection according to the first probability function and determining that the first voice data is voice data;
a third determining module, configured to determine second noise data of the second voice data according to the second probability function;
and a fourth determining module, configured to determine the second noise data as the target noise reduction data when performing voice activity detection according to the first probability function and determining that the second voice data is noise data.
8. The apparatus of claim 7, wherein the apparatus comprises:
a first calculating unit, configured to calculate a first feature parameter in the first voice data after performing time-frequency transformation on the first voice data and the second voice data, where the first feature parameter includes a spectral flatness feature parameter, a log-likelihood feature parameter, and a spectral difference feature parameter;
a second calculating unit, configured to calculate a second feature parameter in the second voice data, where the second feature parameter includes a spectral flatness feature parameter, a log-likelihood feature parameter, and a spectral difference feature parameter;
wherein the first probability function and the second probability function are determined from the first feature parameter and the second feature parameter.
9. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 5.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 5 by means of the computer program.
CN202010713823.4A 2020-07-22 2020-07-22 Speech noise reduction method and device, computer-readable storage medium and electronic device Active CN112002339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010713823.4A CN112002339B (en) 2020-07-22 2020-07-22 Speech noise reduction method and device, computer-readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010713823.4A CN112002339B (en) 2020-07-22 2020-07-22 Speech noise reduction method and device, computer-readable storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112002339A (en) 2020-11-27
CN112002339B (en) 2024-01-26

Family

ID=73467756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010713823.4A Active CN112002339B (en) 2020-07-22 2020-07-22 Speech noise reduction method and device, computer-readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112002339B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112652324A (en) * 2020-12-28 2021-04-13 深圳万兴软件有限公司 Speech enhancement optimization method, speech enhancement optimization system and readable storage medium

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0738454A (en) * 1993-05-19 1995-02-07 N T T Idou Tsuushinmou Kk Noise reduction method
US20060053002A1 (en) * 2002-12-11 2006-03-09 Erik Visser System and method for speech processing using independent component analysis under stability restraints
CN1809105A (en) * 2006-01-13 2006-07-26 北京中星微电子有限公司 Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices
US20090089054A1 (en) * 2007-09-28 2009-04-02 Qualcomm Incorporated Apparatus and method of noise and echo reduction in multiple microphone audio systems
KR20100072751A (en) * 2008-12-22 2010-07-01 한국전자통신연구원 Method and apparatus for reduction of noise
US20120179458A1 (en) * 2011-01-07 2012-07-12 Oh Kwang-Cheol Apparatus and method for estimating noise by noise region discrimination
CN103295580A (en) * 2013-05-13 2013-09-11 北京百度网讯科技有限公司 Method and device for suppressing noise of voice signals
CN103650040A (en) * 2011-05-16 2014-03-19 谷歌公司 Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
CN103813251A (en) * 2014-03-03 2014-05-21 深圳市微纳集成电路与系统应用研究院 Hearing-aid denoising device and method allowable for adjusting denoising degree
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
JP2015219316A (en) * 2014-05-15 2015-12-07 株式会社リコー Device, method, and program
CN106486131A (en) * 2016-10-14 2017-03-08 上海谦问万答吧云计算科技有限公司 A kind of method and device of speech de-noising
KR101874946B1 (en) * 2017-02-02 2018-07-05 인성 엔프라 주식회사 home network system
US20180350381A1 (en) * 2017-05-31 2018-12-06 Apple Inc. System and method of noise reduction for a mobile device
US20190080710A1 (en) * 2017-09-12 2019-03-14 Board Of Trustees Of Michigan State University System and apparatus for real-time speech enhancement in noisy environments
CN109817234A (en) * 2019-03-06 2019-05-28 哈尔滨工业大学(深圳) Targeted voice signal Enhancement Method, system and storage medium based on continuing noise tracking
CN109859768A (en) * 2019-03-12 2019-06-07 上海力声特医学科技有限公司 Artificial cochlea's sound enhancement method
CN110379440A (en) * 2019-07-19 2019-10-25 宁波奥克斯电气股份有限公司 Voice de-noising method, device, voice air conditioner and computer readable storage medium

Also Published As

Publication number Publication date
CN112002339B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN109685202B (en) Data processing method and device, storage medium and electronic device
JP6393730B2 (en) Voice identification method and apparatus
CN110956957B (en) Training method and system of speech enhancement model
CN106486130B (en) Noise elimination and voice recognition method and device
CN107305774B (en) Voice detection method and device
CN108038546B (en) Method and apparatus for compressing neural networks
CN109658953A (en) A kind of vagitus recognition methods, device and equipment
CN111863014A (en) Audio processing method and device, electronic equipment and readable storage medium
CN111564161B (en) Sound processing device and method for intelligently suppressing noise, terminal equipment and readable medium
JP5018120B2 (en) Mobile terminal, program, and display screen control method for mobile terminal
JP2020068973A (en) Emotion estimation and integration device, and emotion estimation and integration method and program
CN112002339A (en) Voice noise reduction method and device, computer-readable storage medium and electronic device
CN111339511A (en) Identity validity verification method and device and terminal equipment
CN113823313A (en) Voice processing method, device, equipment and storage medium
CN115670397B (en) PPG artifact identification method and device, storage medium and electronic equipment
US20230186933A1 (en) Voice noise reduction method, electronic device, non-transitory computer-readable storage medium
CN111144347A (en) Data processing method, device, platform and storage medium
CN112331187B (en) Multi-task speech recognition model training method and multi-task speech recognition method
CN108958699A (en) Voice pick-up method and Related product
CN113178204B (en) Single-channel noise reduction low-power consumption method, device and storage medium
CN114759904A (en) Data processing method, device, equipment, readable storage medium and program product
CN110232393B (en) Data processing method and device, storage medium and electronic device
CN113160850A (en) Audio feature extraction method and device based on re-parameterization decoupling mode
CN113314147A (en) Training method and device of audio processing model and audio processing method and device
CN112489678A (en) Scene recognition method and device based on channel characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant