CN111968662A - Audio signal processing method and device and storage medium - Google Patents


Info

Publication number
CN111968662A
Authority
CN
China
Prior art keywords
noise
frequency
signal
power spectrum
determining
Legal status
Pending
Application number
CN202010796977.4A
Other languages
Chinese (zh)
Inventor
何梦楠
王林章
Current Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202010796977.4A
Publication of CN111968662A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Noise Elimination (AREA)

Abstract

The disclosure relates to a method and a device for processing an audio signal, and a storage medium. The method includes: acquiring a noisy audio power spectrum to be processed and a noise power spectrum; determining a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin, the first noise component being a stationary noise component contained in both the noisy audio power spectrum and the noise power spectrum; subtracting the first noise component from the noisy audio power spectrum to obtain a noisy audio component; subtracting the first noise component from the noise power spectrum to obtain a second noise component; determining a frequency-domain estimation signal according to the noisy audio component and the second noise component; and performing an inverse time-frequency conversion based on the frequency-domain estimation signal to obtain a noise-reduced audio signal. In the technical solution of the embodiments of the disclosure, the stationary noise component common to the noisy audio power spectrum and the noise power spectrum is removed before noise reduction is performed on the audio signal, which reduces the processing bias caused by stationary noise and improves the noise reduction effect.

Description

Audio signal processing method and device and storage medium
Technical Field
The present disclosure relates to signal processing technologies, and in particular, to a method and an apparatus for processing an audio signal, and a storage medium.
Background
With the continuous development of communication and Internet technologies, the processing of multimedia information has become an important research direction in information communication. To achieve clearer, higher-quality communication or data transmission, noise reduction needs to be performed on audio signals. Noise reduction generally removes the noise components in a signal by filtering, thereby improving signal quality. However, during noise reduction it is often difficult to separate the noise components from the clean audio signal, so the noise reduction effect struggles to meet increasingly demanding user requirements.
Disclosure of Invention
The disclosure provides a method and a device for processing an audio signal and a storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for processing an audio signal, including:
acquiring a noisy audio power spectrum to be processed and a noise power spectrum;
determining a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin, wherein the first noise component is a stationary noise component contained in both the noisy audio power spectrum and the noise power spectrum;
subtracting the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
subtracting the first noise component from the noise power spectrum to obtain a second noise component;
determining a frequency-domain estimation signal according to the noisy audio component and the second noise component;
and performing an inverse time-frequency conversion based on the frequency-domain estimation signal to obtain a noise-reduced audio signal.
In some embodiments, determining the first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin includes:
performing band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
smoothing the frequency-smoothed power spectrum over time according to a time smoothing parameter to obtain a time-smoothed power spectrum;
and determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
In some embodiments, determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy the power spectral density condition includes:
for each frame in the time domain, determining, from the filtered noisy audio power spectrum at each frequency bin, the frequency bin with the minimum power spectral density in that frame, to obtain the minimum frequency bin of each frame;
and selecting, from the minimum frequency bins of the frames in the time domain, the frequency bin with the minimum power spectral density as the first noise component.
In some embodiments, determining the frequency-domain estimation signal according to the noisy audio component and the second noise component includes:
determining an a posteriori signal-to-noise ratio (SNR) according to the noisy audio component and the second noise component;
determining an a priori SNR according to the a posteriori SNR;
and determining the frequency-domain estimation signal according to the a priori SNR and the noisy audio power spectrum to be processed.
In some embodiments, said determining an a priori signal-to-noise ratio from said a posteriori signal-to-noise ratio comprises:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
In some embodiments, determining the a priori SNR of the current frame according to the a posteriori SNR of the current frame and the a posteriori SNR of the previous frame includes:
determining the a priori SNR of the current frame from the a posteriori SNR of the current frame weighted by a first weight and the a posteriori SNR of the previous frame weighted by a second weight, wherein the first weight is determined according to the a priori SNR and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
In some embodiments, determining the frequency-domain estimation signal according to the a priori SNR and the noisy audio power spectrum to be processed includes:
determining a gain function according to the a priori SNR;
and multiplying the noisy audio power spectrum to be processed by the gain function to obtain the frequency-domain estimation signal.
In some embodiments, the method further comprises:
acquiring an audio signal to be processed;
performing a time-frequency conversion on the audio signal to obtain a noisy frequency-domain signal;
and determining the noisy audio power spectrum to be processed according to the noisy frequency-domain signal.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for processing an audio signal, including:
a first acquisition module, configured to acquire a noisy audio power spectrum to be processed and a noise power spectrum;
a first determining module, configured to determine a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin, wherein the first noise component is a stationary noise component contained in both the noisy audio power spectrum and the noise power spectrum;
a first calculation module, configured to subtract the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
a second calculation module, configured to subtract the first noise component from the noise power spectrum to obtain a second noise component;
a second determining module, configured to determine a frequency-domain estimation signal according to the noisy audio component and the second noise component;
and a first conversion module, configured to perform an inverse time-frequency conversion based on the frequency-domain estimation signal to obtain a noise-reduced audio signal.
In some embodiments, the first determining module comprises:
a filtering submodule, configured to perform band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
a smoothing submodule, configured to smooth the frequency-smoothed power spectrum over time according to a time smoothing parameter to obtain a time-smoothed power spectrum;
and a first determining submodule, configured to determine the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
In some embodiments, the first determining submodule is specifically configured to:
for each frame in the time domain, determining, from the filtered noisy audio power spectrum at each frequency bin, the frequency bin with the minimum power spectral density in that frame, to obtain the minimum frequency bin of each frame;
and selecting, from the minimum frequency bins of the frames in the time domain, the frequency bin with the minimum power spectral density as the first noise component.
In some embodiments, the second determining module comprises:
a second determining submodule, configured to determine an a posteriori SNR according to the noisy audio component and the second noise component;
a third determining submodule, configured to determine an a priori SNR according to the a posteriori SNR;
and a fourth determining submodule, configured to determine the frequency-domain estimation signal according to the a priori SNR and the noisy frequency-domain signal to be processed.
In some embodiments, the third determining submodule is specifically configured to:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
In some embodiments, the third determining submodule is specifically configured to:
determining the a priori SNR of the current frame from the a posteriori SNR of the current frame weighted by a first weight and the a posteriori SNR of the previous frame weighted by a second weight, wherein the first weight is determined according to the a priori SNR and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
In some embodiments, the fourth determining submodule is specifically configured to:
determining a gain function according to the prior signal-to-noise ratio;
and multiplying the noisy audio power spectrum to be processed by the gain function to obtain the frequency-domain estimation signal.
In some embodiments, the apparatus further comprises:
a second acquisition module, configured to acquire the audio signal to be processed;
a second conversion module, configured to perform a time-frequency conversion on the audio signal to obtain a noisy frequency-domain signal;
and a third determining module, configured to determine the noisy audio power spectrum to be processed according to the noisy frequency-domain signal.
According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for processing an audio signal, including at least: a processor and a memory for storing executable instructions operable on the processor, wherein:
the processor is configured to execute the executable instructions, which, when executed, perform the steps of any one of the audio signal processing methods described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the steps in any of the methods of processing an audio signal described above.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. The stationary noise component common to the noisy audio power spectrum and the noise power spectrum is removed before noise reduction is performed on the audio signal. Compared with directly estimating the noise from the a posteriori SNR and then reducing it, this approach accounts for the processing bias caused by the stationary noise contained in both the audio signal and the noise. The clean, noise-reduced audio power spectrum is determined from the noisy audio power spectrum and the noise power spectrum after the stationary noise has been removed from each, which reduces the deviation between the noise-reduced audio signal and an ideal clean speech signal and improves the noise reduction effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a first flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 2 is a second flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 3 is a third flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating the structure of an audio signal processing apparatus according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating the physical structure of an audio signal processing apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flowchart illustrating a method of processing an audio signal according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
Step S101, acquiring a noisy audio power spectrum to be processed and a noise power spectrum;
Step S102, determining a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin, wherein the first noise component is a stationary noise component contained in both the noisy audio power spectrum and the noise power spectrum;
Step S103, subtracting the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
Step S104, subtracting the first noise component from the noise power spectrum to obtain a second noise component;
Step S105, determining a frequency-domain estimation signal according to the noisy audio component and the second noise component;
Step S106, performing an inverse time-frequency conversion based on the frequency-domain estimation signal to obtain a noise-reduced audio signal.
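Steps S103 and S104 subtract the same first noise component from both spectra. A minimal Python sketch of that subtraction, assuming each spectrum is a list of per-bin power values for one frame and that negative differences are clamped to a small positive floor (the text does not say how negative differences are handled); the function name `remove_stationary_component` and the `floor` parameter are hypothetical:

```python
def remove_stationary_component(noisy_psd, noise_psd, first_noise, floor=1e-12):
    """Sketch of steps S103/S104 for one frame.

    noisy_psd:   noisy audio power spectrum, one value per frequency bin
    noise_psd:   noise power spectrum, one value per frequency bin
    first_noise: estimated stationary (first) noise component per bin
    floor:       assumed lower bound that keeps the results positive
    """
    if not (len(noisy_psd) == len(noise_psd) == len(first_noise)):
        raise ValueError("spectra must have the same number of bins")
    # S103: noisy audio component = noisy spectrum minus stationary noise.
    noisy_component = [max(y - f, floor) for y, f in zip(noisy_psd, first_noise)]
    # S104: second noise component = noise spectrum minus stationary noise.
    second_noise = [max(n - f, floor) for n, f in zip(noise_psd, first_noise)]
    return noisy_component, second_noise


y_clean, n2 = remove_stationary_component(
    [4.0, 2.5, 1.0], [1.5, 1.0, 1.25], [1.0, 0.5, 1.25]
)
print(y_clean)  # [3.0, 2.0, 1e-12]
print(n2)       # [0.5, 0.5, 1e-12]
```

Clamping keeps both results positive, so the later ratio of the noisy audio component to the second noise component stays defined.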
The method of the embodiments of the present disclosure may be applied to a terminal. The terminal may be an electronic device with an audio capture component (e.g., a microphone), such as a mobile phone, a notebook computer, a video camera, a wearable electronic device, or any other electronic device capable of human-computer interaction. The terminal may also be an electronic device with an audio file processing function, such as a computer or a sound device that cannot capture audio but can process audio files.
The terminal processes the audio signal to obtain a noise-reduced audio signal of higher quality, with most of the noise filtered out. The terminal may then transmit the noise-reduced audio signal to a predetermined device, such as a multimedia playback device like a speaker or a television, or play it using its own playback function. The noise-reduced audio signal may also be encoded or otherwise processed to form an audio file that is convenient to transmit or store.
In the embodiments of the present disclosure, the noisy audio power spectrum is the power spectral density of the original noisy audio signal at each frequency bin and represents the energy of the audio signal at each frequency bin. The noise power spectrum can be obtained by noise detection: a noise signal is obtained by detecting the ambient noise in the absence of speech, from which the power spectral density of the noise at each frequency bin is determined.
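To illustrate these two quantities, the sketch below computes a one-sided power spectrum of a single frame with a direct DFT and averages the spectra of speech-free frames to form a noise power spectrum. The Hann window, the 1/N normalization, and the helper names are assumptions for illustration; which frames count as speech-free would come from a voice activity detector that is not shown.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum |X(k)|^2 / N of one frame, for k = 0 .. N/2."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage (an illustrative choice).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    psd = []
    for k in range(n // 2 + 1):
        # Direct DFT for clarity; an FFT would be used in practice.
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        psd.append(abs(acc) ** 2 / n)
    return psd


def noise_power_spectrum(noise_frames):
    """Average the power spectra of frames assumed to contain no speech."""
    spectra = [power_spectrum(f) for f in noise_frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


# A cosine exactly on bin 4 of a 32-sample frame peaks at bin 4.
frame = [math.cos(2.0 * math.pi * 4 * i / 32) for i in range(32)]
psd = power_spectrum(frame)
print(max(range(len(psd)), key=psd.__getitem__))  # 4
```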
Noise includes stationary noise and non-stationary noise. Stationary noise refers to continuous noise whose fluctuation range is smaller than a predetermined level, or impulse noise whose repetition frequency is greater than a predetermined frequency, for example, continuous noise that fluctuates within 5 dB (decibels), or impulse noise with a repetition frequency greater than 10 Hz. The original noisy audio signal and the detected noise contain approximately the same stationary noise components, and stationary noise is usually concentrated in a certain frequency range. Consequently, when the a posteriori SNR is estimated from the frequency-domain signal, the frequency-domain components of the stationary noise make the estimate inaccurate, which in turn degrades the accuracy of the noise reduction.
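The two example criteria can be expressed as simple checks. This sketch assumes the level of continuous noise is given as a list of dB readings and impulse noise as a list of impulse times in seconds; the thresholds default to the 5 dB and 10 Hz examples above, and the function names are hypothetical.

```python
def is_stationary_continuous(levels_db, max_fluctuation_db=5.0):
    """Continuous noise: stationary if its level stays within max_fluctuation_db."""
    return max(levels_db) - min(levels_db) <= max_fluctuation_db


def is_stationary_impulsive(impulse_times_s, min_rate_hz=10.0):
    """Impulse noise: stationary if it repeats faster than min_rate_hz on average."""
    if len(impulse_times_s) < 2:
        return False  # a single impulse has no repetition frequency
    span = impulse_times_s[-1] - impulse_times_s[0]
    if span <= 0:
        raise ValueError("impulse times must be increasing")
    # Average repetition frequency = number of intervals / total duration.
    rate_hz = (len(impulse_times_s) - 1) / span
    return rate_hz > min_rate_hz


print(is_stationary_continuous([60.0, 62.5, 61.0]))        # True
print(is_stationary_impulsive([0.0, 0.05, 0.10, 0.15]))    # True (about 20 Hz)
```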
Therefore, in the embodiments of the present disclosure, the stationary noise may be estimated from the noisy audio power spectrum to obtain the first noise component, and the first noise component representing the stationary noise is then subtracted from both the noisy audio power spectrum and the noise power spectrum. For example, the frequency bins corresponding to the stationary noise can be estimated from the frequency bins in the noisy audio power spectrum that match the characteristics of stationary noise. The clean audio part of the original audio signal is then estimated from the noisy audio component and the second noise component, both with the first noise component removed. This performs noise reduction on the original noisy audio signal and yields the noise-reduced audio signal.
Thus, in the technical solution of the embodiments of the present disclosure, the stationary noise is estimated from the original noisy audio signal, and the noise-reduced audio signal is then determined from the signal component and the noise component with the stationary noise removed. This reduces the bias that the stationary noise introduces into the noise estimate, improves the noise reduction effect, and gives the noise-reduced audio signal higher quality.
In some embodiments, as shown in FIG. 2, determining the first noise component in step S102 according to the power characteristics of the noisy audio power spectrum at each frequency bin includes:
Step S201, performing band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
Step S202, smoothing the frequency-smoothed power spectrum over time according to a time smoothing parameter to obtain a time-smoothed power spectrum;
Step S203, determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
In the embodiments of the present disclosure, determining the first noise component amounts to estimating the stationary noise from the original noisy audio signal.
Stationary noise varies little in amplitude, and its energy is low and concentrated in a certain frequency range, whereas speech does not stay at a constant frequency. Therefore, the frequency bins of the noisy audio power spectrum that satisfy a power spectral density condition can be taken as the frequency bins of the estimated stationary noise. Here, the power spectral density condition may be, for example: the power is below a predetermined threshold (such as the power corresponding to the normal volume of a speech signal); the bin has the minimum power within a predetermined frequency band; or the bin has the minimum power over the full band of the audio signal. In practice, the condition may be chosen according to the actual characteristics of the stationary noise in the environment.
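The three alternative power spectral density conditions can be sketched as follows for one frame of a power spectrum given as a list indexed by frequency bin. The threshold value and band limits are placeholders that would be tuned to the environment, and the function names are hypothetical.

```python
def bins_below_threshold(psd, threshold):
    """Condition 1: bins whose power is below a predetermined threshold."""
    return [k for k, p in enumerate(psd) if p < threshold]


def min_bin_in_band(psd, lo, hi):
    """Condition 2: the bin with minimum power within the band [lo, hi)."""
    if not 0 <= lo < hi <= len(psd):
        raise ValueError("invalid band")
    return min(range(lo, hi), key=lambda k: psd[k])


def min_bin_full_band(psd):
    """Condition 3: the bin with minimum power over the full band."""
    return min_bin_in_band(psd, 0, len(psd))


psd = [5.0, 0.5, 3.0, 0.25, 4.0]
print(bins_below_threshold(psd, 1.0))  # [1, 3]
print(min_bin_in_band(psd, 0, 3))      # 1
print(min_bin_full_band(psd))          # 3
```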
In the embodiments of the present disclosure, when estimating the stationary noise, the noisy audio power spectrum may be band-pass filtered within a predetermined frequency band, that is, windowed with a window function of a preset length, so that the noisy audio power spectrum is smoothed along the frequency dimension to obtain the frequency-smoothed power spectrum. The frequency-smoothed power spectrum is then smoothed along the time dimension to remove abrupt jumps from the noisy audio power spectrum.
Identifying the frequency bins that satisfy the power spectral density condition on the time-smoothed power spectrum reduces the estimation bias caused by abrupt changes and similar effects in the audio signal, so the first noise component is closer to the actual stationary noise.
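Steps S201 and S202 can be sketched as a short normalized window applied across neighboring bins, followed by first-order recursive averaging across frames, S(l) = alpha*S(l-1) + (1-alpha)*S_f(l). The window weights and the value of alpha are assumptions; the text only specifies a window of preset length and a time smoothing parameter.

```python
def frequency_smooth(psd, weights=(0.25, 0.5, 0.25)):
    """Step S201 (sketch): weighted average over neighbouring bins of one frame.

    `weights` is an assumed odd-length window centred on each bin.
    """
    half = len(weights) // 2
    n = len(psd)
    smoothed = []
    for k in range(n):
        acc = 0.0
        norm = 0.0
        for i, w in enumerate(weights):
            j = k + i - half
            if 0 <= j < n:  # skip taps that fall outside the spectrum
                acc += w * psd[j]
                norm += w
        smoothed.append(acc / norm)  # renormalize so edge bins are not biased low
    return smoothed


def time_smooth(previous, current, alpha=0.8):
    """Step S202 (sketch): S(l) = alpha * S(l-1) + (1 - alpha) * S_f(l)."""
    if previous is None:  # first frame: nothing to average with yet
        return list(current)
    return [alpha * s + (1.0 - alpha) * f for s, f in zip(previous, current)]


state = None
for frame_psd in ([0.0, 4.0, 0.0], [0.0, 2.0, 0.0]):
    state = time_smooth(state, frequency_smooth(frame_psd))
print(state)  # approximately [1.2, 1.8, 1.2]
```

A larger alpha follows slow changes more sluggishly but suppresses short jumps more strongly, which is the trade-off the time smoothing parameter controls.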
In some embodiments, determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy the power spectral density condition includes:
for each frame in the time domain, determining, from the filtered noisy audio power spectrum at each frequency bin, the frequency bin with the minimum power spectral density in that frame, to obtain the minimum frequency bin of each frame;
and selecting, from the minimum frequency bins of the frames in the time domain, the frequency bin with the minimum power spectral density as the first noise component.
In the embodiments of the present disclosure, the frequency bin with the minimum power spectral density may be found in the noisy audio spectrum by a minimum search, and this bin is used as the first noise component.
During the search, each frame in the time domain may be searched separately, in time order, and the frequency bin with the minimum power spectral density may then be selected from the per-frame minimum bins at intervals, so as to find the bin with the minimum power spectral density over the whole audio signal. Tracking the minimum over time in this way estimates the stationary noise more accurately and reduces the amount of non-stationary noise and speech contained in the first noise component.
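One reading of this minimum search, in the spirit of minimum-statistics noise estimation, keeps a per-bin minimum of the time-smoothed power over a sliding window of recent frames. The sketch below makes that assumption (the translated text could also be read as selecting a single global minimum), uses a plain sliding window instead of the interval-based selection described above, and omits any bias compensation; the class name and window length are hypothetical.

```python
from collections import deque


class MinimumTracker:
    """Per-bin minimum of the time-smoothed power over recent frames (sketch)."""

    def __init__(self, window_frames=50):
        # Older frames drop out automatically once the window is full.
        self.history = deque(maxlen=window_frames)

    def update(self, smoothed_psd):
        """Add one frame; return the per-bin minimum used as the first noise component."""
        self.history.append(list(smoothed_psd))
        return [min(column) for column in zip(*self.history)]


tracker = MinimumTracker(window_frames=3)
for frame in ([5.0, 1.0], [2.0, 4.0], [3.0, 3.0]):
    first_noise = tracker.update(frame)
print(first_noise)                 # [2.0, 1.0]
print(tracker.update([6.0, 6.0]))  # [2.0, 3.0], the frame [5.0, 1.0] has left the window
```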
In some embodiments, determining the frequency-domain estimation signal according to the noisy audio component and the second noise component includes:
determining an a posteriori signal-to-noise ratio (SNR) according to the noisy audio component and the second noise component;
determining an a priori SNR according to the a posteriori SNR;
and determining the frequency-domain estimation signal according to the a priori SNR and the noisy audio power spectrum to be processed.
The a posteriori SNR is the ratio of the power spectrum of the original noisy audio signal to the power spectrum of the noise. In the embodiments of the present disclosure, the power spectrum of the original noisy audio signal in this ratio is replaced by the noisy audio component from which the stationary noise has been removed. The power at each frequency bin can be computed from the frequency-domain signal of the original noisy audio signal to obtain its power spectrum, and the power at each frequency bin can be computed from the frequency-domain signal of the noise obtained by noise detection to obtain the noise power spectrum. The first noise component is then subtracted from each to remove the stationary noise, and the ratio of the resulting noisy audio component to the second noise component is the a posteriori SNR.
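Computed bin by bin, this a posteriori SNR is a simple ratio. A minimal sketch, assuming both inputs are lists of per-bin powers for the same frame with the first noise component already removed; the small floor on the denominator is an assumption to avoid division by zero, and the function name is hypothetical.

```python
def posterior_snr(noisy_component, second_noise, floor=1e-12):
    """A posteriori SNR per bin: noisy audio component / second noise component."""
    if len(noisy_component) != len(second_noise):
        raise ValueError("inputs must have the same number of bins")
    # Guard the denominator so a bin with no residual noise does not divide by zero.
    return [y / max(n, floor) for y, n in zip(noisy_component, second_noise)]


print(posterior_snr([3.0, 2.0], [0.5, 0.5]))  # [6.0, 4.0]
```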
The a priori SNR is the ratio of the clean speech signal contained in the original audio signal to the noise component. Because every audio signal captured by the terminal is noisy, an ideal clean speech signal cannot be obtained directly, so the a priori SNR is estimated from the a posteriori SNR. The original noisy signal is then processed according to the estimated a priori SNR to reduce the noise in the audio signal: the product of the noisy audio power spectrum to be processed and a gain determined from the a priori SNR is used as the noise-reduced frequency-domain signal. Because the a priori SNR is an estimated parameter, the noise-reduced frequency-domain signal is also an estimate, namely the frequency-domain estimation signal.
Therefore, the noise reduction processing of the original signal with noise can be realized according to the processing of the original signal with noise and noise, and the audio signal which is as close to pure voice as possible can be obtained.
In some embodiments, said determining an a priori signal-to-noise ratio from said a posteriori signal-to-noise ratio comprises:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
In the embodiments of the present disclosure, the prior SNR can be estimated from the posterior SNRs of multiple frames of the audio signal. Processing the posterior SNR as a time sequence exploits the stationarity or regularity of the audio signal to estimate the expected value of the prior SNR.
In some embodiments, determining the prior SNR of the current frame according to the posterior SNR of the current frame and the posterior SNR of the previous frame includes:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame weighted by a first weight and the posterior signal-to-noise ratio of the previous frame weighted by a second weight; wherein the first weight is determined according to the prior signal-to-noise ratio and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
In the embodiments of the present disclosure, the prior SNR of the current frame is estimated as a weighted average of the posterior SNRs of adjacent frames. The first weight may be determined from a gain function that contains the prior SNR together with the smoothing coefficient, and the second weight from the smoothing coefficient alone. Estimating the prior SNR is therefore equivalent to estimating the gain function that contains it. The estimation may follow the DD (Decision Directed) algorithm:
priorSNR(k,l)=w*H(k,l)*postSNR(k,l-1)+(1-w)*MAX(postSNR(k,l)-1,0)
where k is the frequency bin index and l is the frame index; priorSNR(k,l) is the prior SNR of the k-th frequency bin in the l-th frame, and postSNR(k,l) is the posterior SNR of the k-th frequency bin in the l-th frame; w is the smoothing coefficient, which takes a value between 0 and 1 (for example, 0.8); and H(k,l) is the gain function of the Wiener filter, which is determined by the prior SNR. With this algorithm, the prior SNR can be estimated from the posterior SNR.
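The DD update above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `dd_prior_snr` and its arguments are hypothetical, all frequency bins of one frame are processed at once, and because H(k,l) of the current frame itself depends on the prior SNR being estimated, the sketch assumes the gain of the previous frame is passed in. (The classical decision-directed estimator of Ephraim and Malah additionally squares that gain; the sketch follows the formula as written.) The default w = 0.8 is the example value from the text.

```python
import numpy as np


def dd_prior_snr(post_snr_curr, post_snr_prev, gain_prev, w=0.8):
    """Decision-directed (DD) prior SNR for one frame, vectorized over bins.

    post_snr_curr: postSNR(k, l) for every bin k of the current frame l
    post_snr_prev: postSNR(k, l-1) for the previous frame
    gain_prev:     Wiener gain of the previous frame (assumption, see lead-in)
    w:             smoothing coefficient in (0, 1)
    """
    post_snr_curr = np.asarray(post_snr_curr, dtype=float)
    post_snr_prev = np.asarray(post_snr_prev, dtype=float)
    gain_prev = np.asarray(gain_prev, dtype=float)
    # History term: w * H * postSNR(k, l-1).
    history = w * gain_prev * post_snr_prev
    # Instantaneous term: (1 - w) * max(postSNR(k, l) - 1, 0).
    instantaneous = (1.0 - w) * np.maximum(post_snr_curr - 1.0, 0.0)
    return history + instantaneous
```

Frame by frame, the returned prior SNR would feed the gain H = priorSNR / (1 + priorSNR), which then serves as `gain_prev` for the next frame.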
In some embodiments, determining the frequency-domain estimation signal according to the prior SNR and the to-be-processed noisy audio power spectrum includes:
determining a gain function according to the prior signal-to-noise ratio;
and multiplying the to-be-processed noisy audio power spectrum by the gain function to obtain the frequency-domain estimation signal.
Here, the gain function is the ratio of the clean-speech power spectrum to the sum of the clean-speech power spectrum and the noise power spectrum. Since the noisy audio signal can be regarded as the sum of clean speech and noise, multiplying its power spectrum by the gain function gives an estimate of the clean-speech power spectrum, i.e., the frequency-domain estimation signal.
The gain function can be computed from the prior SNR as the ratio of the prior SNR to the prior SNR plus 1.
Therefore, once the prior SNR has been obtained with the DD algorithm, the gain function follows from it. The gain function is used to denoise the noisy audio power spectrum, and an inverse Fourier transform then converts the frequency-domain signal back to the time domain to obtain the enhanced speech signal.
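A minimal sketch of the gain computation and its application, assuming NumPy arrays that hold one value per frequency bin; `wiener_gain` and `estimate_clean_power` are hypothetical names. It follows the text literally by multiplying the noisy audio power spectrum by the gain.

```python
import numpy as np


def wiener_gain(prior_snr):
    """Wiener gain H = xi / (1 + xi) from the prior SNR xi (per frequency bin)."""
    prior_snr = np.asarray(prior_snr, dtype=float)
    return prior_snr / (1.0 + prior_snr)


def estimate_clean_power(noisy_psd, prior_snr):
    """Frequency-domain estimation signal: gain times the noisy audio power spectrum."""
    return wiener_gain(prior_snr) * np.asarray(noisy_psd, dtype=float)
```

Because the prior SNR is non-negative, the gain stays in [0, 1): bins with a high prior SNR pass almost unchanged, while bins with a prior SNR near 0 are strongly attenuated.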
In some embodiments, the method further comprises:
acquiring an audio signal to be processed;
performing time-frequency conversion on the audio signal to obtain a noisy frequency-domain signal;
and determining the to-be-processed noisy audio power spectrum according to the noisy frequency-domain signal.
In the embodiments of the present disclosure, the audio signal may be collected by an audio collection component or read directly from a file in an audio format; it may also be the time-domain signal played by the audio playback component. For noise reduction, the audio signal is converted from the time domain to the frequency domain by a Fourier transform, and the noisy audio power spectrum is then determined from the power of the noisy frequency-domain signal at each frequency bin. Only simple operations such as time-frequency conversion are required, so the acquired audio signal can be denoised in the noise reduction mode.
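For illustration, the time-frequency conversion might look like the short-time Fourier transform below. The frame length of 256 samples, the hop of 128 samples, the periodic Hann window and the function name `noisy_power_spectrum` are all assumptions; the disclosure does not specify them. The function returns the complex spectrum alongside the power spectrum because the phase is needed later to return to the time domain.

```python
import numpy as np


def noisy_power_spectrum(x, frame_len=256, hop=128):
    """Short-time Fourier transform of a time-domain signal and its power spectrum.

    Returns (spectrum, power): the complex STFT of shape
    (n_frames, frame_len // 2 + 1) and the per-bin power |Y(k, l)|^2.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    # Periodic Hann window (hypothetical choice; sums to one at 50 % overlap).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[l * hop : l * hop + frame_len] * window for l in range(n_frames)])
    # Real-input FFT per frame: one row per frame l, one column per bin k.
    spectrum = np.fft.rfft(frames, axis=1)
    return spectrum, np.abs(spectrum) ** 2
```

For a 1024-sample input, this yields 7 frames of 129 bins each.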
The disclosed embodiments also provide the following examples:
When a mobile phone is used for a call or for audio processing, the gain function of the Wiener filter can be estimated from the posterior signal-to-noise ratio, and the noisy speech signal is then denoised. The Wiener filter minimizes the mean square of the estimation error and provides good smoothing and noise reduction. The key to improving the filtering is to improve the accuracy of the Wiener filter gain function.
For an audio signal in the frequency domain, the gain function H(k,l) of the k-th frequency bin in the l-th frame is
$$H(k,l)=\frac{P_x(k,l)}{P_x(k,l)+P_d(k,l)}$$
where $P_x(k,l)$ is the clean-speech power spectrum of the k-th frequency bin in the l-th frame and $P_d(k,l)$ is the noise power spectrum of the k-th frequency bin in the l-th frame. Dividing both the numerator and the denominator of the gain function by $P_d(k,l)$ gives the gain function in terms of the prior signal-to-noise ratio:
$$H(k,l)=\frac{\xi(k,l)}{1+\xi(k,l)}$$
where
$$\xi(k,l)=\frac{P_x(k,l)}{P_d(k,l)}$$
is the prior SNR of the k-th frequency bin in the l-th frame, i.e., H(k,l) = priorSNR(k,l)/(1 + priorSNR(k,l)) = gain. Based on the DD algorithm, the prior SNR can be estimated as a weighted average of the posterior SNRs:
priorSNR(k,l)=w*H(k,l)*postSNR(k,l-1)+(1-w)*MAX(postSNR(k,l)-1,0)
where w is the smoothing factor that weights the posterior SNRs of two adjacent frames, priorSNR(k,l) is the prior SNR $\xi(k,l)$, and postSNR(k,l) is the posterior SNR:
$$\mathrm{postSNR}(k,l)=\frac{Y(k,l)}{D(k,l)}$$
where $Y(k,l)$ is the power spectrum of the original noisy audio signal at the k-th frequency bin of the l-th frame and $D(k,l)$ is the noise power spectrum obtained by detection.
Noise in real environments consists of steady-state (stationary) noise and non-stationary noise. Both the noisy audio power spectrum and the noise power spectrum of the noisy audio signal contain the steady-state noise, which makes the posterior SNR estimate inaccurate; this in turn degrades the prior SNR estimate and makes the Wiener filter gain function inaccurate.
Therefore, in the embodiments of the present disclosure, the steady-state noise is removed from both the noisy audio power spectrum and the noise power spectrum, giving an improved posterior SNR:
$$\mathrm{postSNR\_new}(k,l)=\frac{Y(k,l)-P_s(k,l)}{D(k,l)-P_s(k,l)}$$
where $P_s(k,l)$ is the steady-state noise, which can be estimated from the noisy audio power spectrum or from the noise power spectrum:
$$P_s(k,l)=\min\big(P_{s\_Y}(k,l),\,P_{s\_D}(k,l)\big)$$
where $P_{s\_Y}(k,l)$ is the steady-state noise estimated from the noisy audio power spectrum and $P_{s\_D}(k,l)$ is the steady-state noise estimated from the noise power spectrum.
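The improved posterior SNR can be sketched per frame as below, assuming NumPy arrays indexed by frequency bin. `improved_posterior_snr` is a hypothetical name, and `ps_y` and `ps_d` stand for the two steady-state estimates. Clipping the differences at zero and the small `eps` in the denominator are safeguards added for this sketch, not steps stated in the text.

```python
import numpy as np


def improved_posterior_snr(noisy_psd, noise_psd, ps_y, ps_d, eps=1e-12):
    """postSNR_new = (Y - Ps) / (D - Ps) with Ps = min(Ps_Y, Ps_D), per bin."""
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    # Steady-state noise: the smaller of the two independent estimates.
    p_s = np.minimum(ps_y, ps_d)
    # Remove the steady-state component from both spectra; clipping at zero
    # is an assumption of this sketch, since powers cannot be negative.
    numerator = np.maximum(noisy_psd - p_s, 0.0)
    denominator = np.maximum(noise_psd - p_s, 0.0)
    # eps avoids division by zero when the detected noise is entirely steady-state.
    return numerator / (denominator + eps)
```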
In the embodiments of the present disclosure, the steady-state noise may be estimated from the noisy audio power spectrum with the Minima Controlled Recursive Averaging (MCRA) algorithm:
Step S301, perform two-dimensional smoothing over time and frequency on the power spectrum of the noisy speech:
Frequency smoothing:
$$S_f(k,l)=\sum_{i=-(w-1)}^{w-1} b(i)\,Y(k-i,l)$$
Time smoothing:
$$S(k,l)=\alpha_s\,S(k,l-1)+(1-\alpha_s)\,S_f(k,l)$$
where $\alpha_s$ ($0<\alpha_s<1$) is the smoothing factor and b(i) is a window of length 2w-1; the window length can be used to control the search speed of the minimum.
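Step S301 might be sketched as follows for a spectrum stored as an array of shape (frames, bins). Assumptions of this sketch: the name `smooth_power_spectrum`, the default $\alpha_s = 0.8$, treating bins beyond the band edges as zero, and initializing the recursion with the first frame's frequency-smoothed spectrum. `np.convolve` computes exactly the sum of b(i) Y(k-i, l), and `mode="same"` keeps the output centered when b has odd length 2w-1.

```python
import numpy as np


def smooth_power_spectrum(psd, b, alpha_s=0.8):
    """Two-dimensional smoothing of a (frames x bins) noisy power spectrum.

    psd:     Y(k, l) arranged as psd[l, k]
    b:       frequency window of odd length 2w - 1 (ideally summing to 1)
    alpha_s: time smoothing factor, 0 < alpha_s < 1
    """
    psd = np.asarray(psd, dtype=float)
    b = np.asarray(b, dtype=float)
    smoothed = np.empty_like(psd)
    for l in range(psd.shape[0]):
        # Frequency smoothing: S_f(k, l) = sum_i b(i) * Y(k - i, l);
        # np.convolve treats bins outside the spectrum as zero.
        s_f = np.convolve(psd[l], b, mode="same")
        if l == 0:
            # Initialize the recursion with the first frame (assumption).
            smoothed[l] = s_f
        else:
            # Time smoothing: S(k, l) = alpha_s * S(k, l-1) + (1 - alpha_s) * S_f(k, l).
            smoothed[l] = alpha_s * smoothed[l - 1] + (1.0 - alpha_s) * s_f
    return smoothed
```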
Step S302, search for the minimum of the smoothed noisy speech power spectrum to obtain the estimated steady-state noise:
Smin(k,l)=min{Smin(k,l-1),S(k,l)}
Stmp(k,l)=min{Stmp(k,l-1),S(k,l)}
where Stmp is the temporary minimum searched within each interval and Smin is the overall minimum. Both start from the same value, and once every L frames the statistics are refreshed:
Smin(k,l)=min{Stmp(k,l-1),S(k,l)}
Stmp(k,l)=S(k,l)
Finally, the tracked minimum at each frequency bin is taken as the estimated steady-state noise $P_s(k,l)$.
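The minimum search of step S302 can be sketched as follows. The name `track_minimum`, the default window of L = 50 frames and the initialization of both minima from the first frame are assumptions for illustration; the update rules mirror the four equations above.

```python
import numpy as np


def track_minimum(smoothed, L=50):
    """Minimum tracking over a (frames x bins) smoothed power spectrum.

    Returns an array of the same shape holding Smin(k, l) for every frame,
    used here as the steady-state noise estimate Ps(k, l).
    """
    smoothed = np.asarray(smoothed, dtype=float)
    out = np.empty_like(smoothed)
    # Both minima start from the first frame (assumption of this sketch).
    s_min = smoothed[0].copy()
    s_tmp = smoothed[0].copy()
    out[0] = s_min
    for l in range(1, smoothed.shape[0]):
        s = smoothed[l]
        if l % L == 0:
            # Every L frames: take the overall minimum from the temporary
            # minimum of the elapsed window, then restart the temporary search.
            s_min = np.minimum(s_tmp, s)
            s_tmp = s.copy()
        else:
            # Ordinary frame: update both running minima.
            s_min = np.minimum(s_min, s)
            s_tmp = np.minimum(s_tmp, s)
        out[l] = s_min
    return out
```

For example, a single bin whose smoothed power runs 5, 3, 4, 6, 7, 8, 9 with L = 3 yields the estimates 5, 3, 3, 3, 3, 3, 6: the minimum holds at 3 until the window restarts, after which it can rise to follow a higher noise floor.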
Step S303, update the posterior SNR according to the estimated steady-state noise:
$$\mathrm{postSNR\_new}(k,l)=\frac{Y(k,l)-P_s(k,l)}{D(k,l)-P_s(k,l)}$$
Step S304, estimate the prior SNR through the DD algorithm from the updated posterior SNR, and obtain the gain function of the Wiener filter:
priorSNR(k,l)=w*gain*postSNR_new(k,l-1)+(1-w)*MAX(postSNR_new(k,l)-1,0)
Here, gain is the more accurate gain function estimated from the updated posterior SNR.
Step S305, denoise the speech signal with the obtained gain function:
$$P_x(k,l)=\mathrm{gain}(k,l)\,Y(k,l)$$
and perform an inverse FFT on $P_x(k,l)$ to obtain the enhanced time-domain speech signal.
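Because a power spectrum carries no phase, this sketch of the final step assumes the gain is applied to the complex STFT coefficients of the noisy signal (so the noisy phase is reused) before the inverse FFT and overlap-add. This is a common way to realize the step, not necessarily the one in the disclosure; the name `overlap_add` and the frame parameters are hypothetical. With a periodic Hann analysis window at 50 % overlap the shifted windows sum to one, so no synthesis window is applied here.

```python
import numpy as np


def overlap_add(spectrum, gain, frame_len=256, hop=128):
    """Apply a per-bin gain to complex STFT frames and resynthesize by overlap-add.

    spectrum: complex array of shape (n_frames, frame_len // 2 + 1)
    gain:     real gain broadcastable to spectrum (e.g. gain[l, k] or a scalar)
    """
    # Estimated clean spectrum: gain times the noisy complex coefficients.
    frames = np.fft.irfft(np.asarray(spectrum) * gain, n=frame_len, axis=1)
    n_frames = frames.shape[0]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for l in range(n_frames):
        # Each inverse FFT gives one time-domain frame; add it at its hop offset.
        out[l * hop : l * hop + frame_len] += frames[l]
    return out
```

With a gain of 1 in every bin, the sketch returns the analyzed signal, which is a quick sanity check before plugging in the estimated gain.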
With the technical solution of the embodiments of the present disclosure, more noise can be suppressed and a better smoothing effect is achieved. In addition, the method is a dynamic estimation method that does not rely on a fixed threshold, so it is applicable to more scenarios.
Fig. 4 is a block diagram of an apparatus for processing an audio signal according to an exemplary embodiment. As shown in Fig. 4, the apparatus 400 includes:
a first obtaining module 401, configured to obtain a to-be-processed noisy audio power spectrum and a noise power spectrum;
a first determining module 402, configured to determine a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin, wherein the first noise component is a steady-state noise component contained in both the noisy audio power spectrum and the noise power spectrum;
a first calculating module 403, configured to subtract the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
a second calculating module 404, configured to subtract the first noise component from the noise power spectrum to obtain a second noise component;
a second determining module 405, configured to determine a frequency-domain estimation signal according to the noisy audio component and the second noise component;
a first conversion module 406, configured to perform time-frequency conversion based on the frequency domain estimation signal to obtain a noise reduction audio signal.
In some embodiments, the first determining module comprises:
the filtering submodule, configured to perform band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
the smoothing submodule, configured to smooth the frequency-smoothed power spectrum according to a time smoothing parameter to obtain a time-smoothed power spectrum;
and the first determining submodule, configured to determine the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
In some embodiments, the first determining submodule is specifically configured to:
for each frame in the time domain, determine the frequency bin with the minimum power spectral density in that frame according to the filtered noisy audio power spectrum at each frequency bin, to obtain the minimum bin of each frame;
and select, from the minimum bins of all frames in the time domain, the bin with the minimum power spectral density as the first noise component.
In some embodiments, the second determining module comprises:
a second determining submodule, configured to determine an a posteriori signal-to-noise ratio according to the noisy frequency component and the second noise component;
the third determining submodule is used for determining the prior signal-to-noise ratio according to the posterior signal-to-noise ratio;
and the fourth determining submodule is used for determining the frequency domain estimation signal according to the prior signal-to-noise ratio and the frequency domain noisy signal to be processed.
In some embodiments, the third determining submodule is specifically configured to:
estimating the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
In some embodiments, the third determining submodule is specifically configured to:
determine the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame weighted by a first weight and the posterior signal-to-noise ratio of the previous frame weighted by a second weight; wherein the first weight is determined according to the prior signal-to-noise ratio and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
In some embodiments, the fourth determining submodule is specifically configured to:
determining a gain function according to the prior signal-to-noise ratio;
and multiply the to-be-processed noisy audio power spectrum by the gain function to obtain the frequency-domain estimation signal.
In some embodiments, the apparatus further comprises:
the second acquisition module, configured to acquire the audio signal to be processed;
the second conversion module, configured to perform time-frequency conversion on the audio signal to obtain a noisy frequency-domain signal;
and the third determining module, configured to determine the to-be-processed noisy audio power spectrum according to the noisy frequency-domain signal.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating an apparatus 500 according to an example embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 5, the apparatus 500 may include one or more of the following components: processing component 501, memory 502, power component 503, multimedia component 504, audio component 505, input/output (I/O) interface 506, sensor component 507, and communication component 508.
The processing component 501 generally controls the overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 501 may include one or more processors 510 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 501 may also include one or more modules that facilitate interaction between the processing component 501 and other components. For example, the processing component 501 may include a multimedia module to facilitate interaction between the multimedia component 504 and the processing component 501.
The memory 502 is configured to store various types of data to support operations at the apparatus 500. Examples of such data include instructions for any application or method operating on the apparatus 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 502 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 503 provides power to the various components of the device 500. The power supply component 503 may include: a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 504 includes a screen that provides an output interface between the device 500 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 504 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and/or rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 505 is configured to output and/or input audio signals. For example, the audio component 505 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 502 or transmitted via the communication component 508. In some embodiments, the audio component 505 further comprises a speaker for outputting audio signals.
The I/O interface 506 provides an interface between the processing component 501 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 507 includes one or more sensors for providing status assessments of various aspects of the device 500. For example, the sensor assembly 507 may detect the open/closed status of the device 500 and the relative positioning of components, such as the display and keypad of the device 500; it may also detect a change in the position of the device 500 or of a component of the device 500, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in the temperature of the device 500. The sensor assembly 507 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 507 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 507 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 508 is configured to facilitate wired or wireless communication between the apparatus 500 and other devices. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 508 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 508 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 502 comprising instructions, executable by the processor 510 of the apparatus 500 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the method provided in any of the embodiments.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (18)

1. A method of processing an audio signal, comprising:
acquiring a to-be-processed noisy audio power spectrum and a noise power spectrum;
determining a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin; wherein the first noise component is a steady-state noise component contained in both the noisy audio power spectrum and the noise power spectrum;
subtracting the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
subtracting the first noise component from the noise power spectrum to obtain a second noise component;
determining a frequency-domain estimation signal according to the noisy audio component and the second noise component;
and performing time-frequency conversion based on the frequency domain estimation signal to obtain a noise reduction audio signal.
2. The method according to claim 1, wherein determining the first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin comprises:
performing band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
smoothing the frequency-smoothed power spectrum according to a time smoothing parameter to obtain a time-smoothed power spectrum;
and determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
3. The method according to claim 2, wherein determining the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy the power spectral density condition comprises:
for each frame in the time domain, determining the frequency bin with the minimum power spectral density in that frame according to the filtered noisy audio power spectrum at each frequency bin, to obtain the minimum bin of each frame;
and selecting, from the minimum bins of all frames in the time domain, the bin with the minimum power spectral density as the first noise component.
4. The method according to any one of claims 1 to 3, wherein determining the frequency-domain estimation signal according to the noisy audio component and the second noise component comprises:
determining a posterior signal-to-noise ratio according to the noisy audio component and the second noise component;
determining a prior signal-to-noise ratio according to the posterior signal-to-noise ratio;
and determining the frequency-domain estimation signal according to the prior signal-to-noise ratio and the to-be-processed noisy audio power spectrum.
5. The method of claim 4, wherein said determining an a priori signal-to-noise ratio based on said a posteriori signal-to-noise ratio comprises:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
6. The method of claim 5, wherein determining the prior SNR for the current frame based on the posterior SNR for the current frame and the posterior SNR for the previous frame comprises:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame weighted by a first weight and the posterior signal-to-noise ratio of the previous frame weighted by a second weight; wherein the first weight is determined according to the prior signal-to-noise ratio and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
7. The method of claim 4, wherein determining the frequency-domain estimation signal according to the prior SNR and the to-be-processed noisy audio power spectrum comprises:
determining a gain function according to the prior signal-to-noise ratio;
and multiplying the to-be-processed noisy audio power spectrum by the gain function to obtain the frequency-domain estimation signal.
8. The method of claim 1, further comprising:
acquiring an audio signal to be processed;
performing time-frequency conversion on the audio signal to obtain a noisy frequency-domain signal;
and determining the to-be-processed noisy audio power spectrum according to the noisy frequency-domain signal.
9. An apparatus for processing an audio signal, comprising:
the first acquisition module, configured to acquire a to-be-processed noisy audio power spectrum and a noise power spectrum;
the first determining module, configured to determine a first noise component according to the power characteristics of the noisy audio power spectrum at each frequency bin; wherein the first noise component is a steady-state noise component contained in both the noisy audio power spectrum and the noise power spectrum;
the first calculation module, configured to subtract the first noise component from the noisy audio power spectrum to obtain a noisy audio component;
the second calculation module, configured to subtract the first noise component from the noise power spectrum to obtain a second noise component;
a second determining module, configured to determine a frequency-domain estimation signal according to the noisy audio component and the second noise component;
and the first conversion module is used for carrying out time-frequency conversion on the basis of the frequency domain estimation signal to obtain a noise reduction audio signal.
10. The apparatus of claim 9, wherein the first determining module comprises:
the filtering submodule, configured to perform band-pass filtering on the noisy audio power spectrum within a preset frequency band to obtain a frequency-smoothed power spectrum;
the smoothing submodule, configured to smooth the frequency-smoothed power spectrum according to a time smoothing parameter to obtain a time-smoothed power spectrum;
and the first determining submodule, configured to determine the first noise component according to the frequency bins of the time-smoothed power spectrum that satisfy a power spectral density condition.
11. The apparatus of claim 10, wherein the first determining submodule is specifically configured to:
for each frame in the time domain, determine the frequency bin with the minimum power spectral density in that frame according to the filtered noisy audio power spectrum at each frequency bin, to obtain the minimum bin of each frame;
and select, from the minimum bins of all frames in the time domain, the bin with the minimum power spectral density as the first noise component.
12. The apparatus of any of claims 9 to 11, wherein the second determining module comprises:
a second determining submodule, configured to determine a posterior signal-to-noise ratio according to the noisy audio component and the second noise component;
the third determining submodule is used for determining the prior signal-to-noise ratio according to the posterior signal-to-noise ratio;
and the fourth determining submodule is used for determining the frequency domain estimation signal according to the prior signal-to-noise ratio and the frequency domain noisy signal to be processed.
13. The apparatus according to claim 12, wherein the third determining submodule is specifically configured to:
determining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame and the posterior signal-to-noise ratio of the previous frame.
14. The apparatus according to claim 13, wherein the third determining submodule is specifically configured to:
determine the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratio of the current frame weighted by a first weight and the posterior signal-to-noise ratio of the previous frame weighted by a second weight; wherein the first weight is determined according to the prior signal-to-noise ratio and a predetermined smoothing coefficient, and the second weight is determined according to the smoothing coefficient.
15. The apparatus according to claim 12, wherein the fourth determination submodule is specifically configured to:
determining a gain function according to the a priori signal-to-noise ratio;
and multiplying the to-be-processed noisy frequency power spectrum by the gain function to obtain the frequency-domain estimated signal.
16. The apparatus of claim 9, further comprising:
a second acquisition module, configured to acquire the audio signal to be processed;
a second conversion module, configured to perform time-frequency conversion on the audio signal to obtain a frequency-domain noisy signal;
and a third determining module, configured to determine the to-be-processed noisy frequency power spectrum according to the frequency-domain noisy signal.
17. An apparatus for processing an audio signal, the apparatus comprising at least: a processor and a memory for storing executable instructions operable on the processor, wherein:
the processor is configured to execute the executable instructions, which when executed perform the steps of the method of processing an audio signal as provided in any of the preceding claims 1 to 8.
18. A non-transitory computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when executed by a processor, implement the steps in the method for processing an audio signal as provided in any one of claims 1 to 8.
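Claim 16 states only that the audio signal undergoes "time-frequency conversion" before the noisy power spectrum is formed; it does not name the transform. The sketch below assumes a short-time Fourier transform with a periodic Hann window, and the function name `noisy_power_spectrum` and the `frame_len`/`hop` values are hypothetical illustrations, not parameters from the application.

```python
import cmath
import math
from typing import List


def noisy_power_spectrum(signal: List[float], frame_len: int = 8, hop: int = 4) -> List[List[float]]:
    """Frame the signal, apply a periodic Hann window, and return |DFT|^2 per frame.

    Only bins 0..frame_len//2 are kept because the input is real-valued.
    frame_len, hop and the Hann window are assumed, illustrative choices.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len) for n in range(frame_len)]
    frames: List[List[float]] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct O(N^2) DFT for clarity; a practical implementation would use an FFT.
        spectrum = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append([abs(c) ** 2 for c in spectrum])
    return frames
```

For a constant input such as `noisy_power_spectrum([1.0] * 16)`, each of the three frames has power 16 in bin 0 and 4 in bin 1 (leakage from the Hann window) and essentially zero elsewhere.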
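The two-step minimum search at the end of claim 11 (first the minimum bin within each frame, then the minimum across those per-frame minima) can be sketched as follows. The filtering step described earlier in claim 11 is not part of this section, so the sketch assumes first-order recursive smoothing over time with a hypothetical coefficient `alpha`; the function names are also hypothetical. The code follows a literal reading of the claim text.

```python
from typing import List, Tuple


def smooth_power(noisy_power: List[List[float]], alpha: float = 0.8) -> List[List[float]]:
    """First-order recursive smoothing of each frequency bin over time (assumed filter).

    P_s[t][k] = alpha * P_s[t-1][k] + (1 - alpha) * P[t][k]; the first frame is copied.
    """
    smoothed: List[List[float]] = []
    for t, frame in enumerate(noisy_power):
        if t == 0:
            smoothed.append(list(frame))
        else:
            prev = smoothed[-1]
            smoothed.append([alpha * p + (1.0 - alpha) * x for p, x in zip(prev, frame)])
    return smoothed


def first_noise_component(smoothed: List[List[float]]) -> Tuple[int, int, float]:
    """Return (frame index, bin index, PSD) of the selected minimum.

    Step 1: in every frame, find the bin with the smallest PSD (that frame's minimum bin).
    Step 2: among the per-frame minima, keep the one with the smallest PSD.
    """
    per_frame_minima = []
    for t, frame in enumerate(smoothed):
        k = min(range(len(frame)), key=frame.__getitem__)
        per_frame_minima.append((t, k, frame[k]))
    return min(per_frame_minima, key=lambda item: item[2])
```

Read literally, this yields a single minimum over the whole time-frequency window. Classic minimum-statistics trackers instead take the minimum over frames separately for each bin; which variant the application intends cannot be settled from this section alone.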
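Claims 12 to 15 chain an a posteriori SNR, an a priori SNR built from the current and previous frames' a posteriori SNRs, a gain function, and a multiplication with the noisy power spectrum. The claims give neither the weight formulas nor the gain function, so the sketch substitutes two standard choices: the decision-directed a priori SNR estimator of Ephraim and Malah and a Wiener-type gain G = ξ/(1 + ξ). Neither is confirmed as the application's method. The per-bin `noise_power` input stands in for the "second noise component", whose derivation is outside this section. All function names, `alpha` and `XI_MIN` are hypothetical.

```python
from typing import List, Optional

XI_MIN = 1e-3  # floor on the a priori SNR (illustrative value)


def posterior_snr(noisy_power: List[float], noise_power: List[float], eps: float = 1e-12) -> List[float]:
    """gamma(k) = |Y(k)|^2 / lambda(k) for each frequency bin."""
    return [y / max(n, eps) for y, n in zip(noisy_power, noise_power)]


def prior_snr_dd(gamma_cur: List[float], gamma_prev: List[float], gain_prev: List[float],
                 alpha: float = 0.98, xi_min: float = XI_MIN) -> List[float]:
    """Decision-directed estimate per bin:
    xi = alpha * G_prev^2 * gamma_prev + (1 - alpha) * max(gamma_cur - 1, 0)
    """
    xi = []
    for g_cur, g_prev, h_prev in zip(gamma_cur, gamma_prev, gain_prev):
        value = alpha * (h_prev ** 2) * g_prev + (1.0 - alpha) * max(g_cur - 1.0, 0.0)
        xi.append(max(value, xi_min))
    return xi


def wiener_gain(xi: List[float]) -> List[float]:
    """Wiener-type gain G = xi / (1 + xi), always in [0, 1)."""
    return [x / (1.0 + x) for x in xi]


def enhance_frames(noisy_frames: List[List[float]], noise_power: List[float],
                   alpha: float = 0.98) -> List[List[float]]:
    """Multiply each noisy power-spectrum frame by its gain, as claim 15 describes."""
    gamma_prev: Optional[List[float]] = None
    gain_prev: Optional[List[float]] = None
    enhanced: List[List[float]] = []
    for frame in noisy_frames:
        gamma = posterior_snr(frame, noise_power)
        if gamma_prev is None or gain_prev is None:
            # No history on the first frame: fall back to the maximum-likelihood estimate.
            xi = [max(g - 1.0, XI_MIN) for g in gamma]
        else:
            xi = prior_snr_dd(gamma, gamma_prev, gain_prev, alpha)
        gain = wiener_gain(xi)
        enhanced.append([h * p for h, p in zip(gain, frame)])
        gamma_prev, gain_prev = gamma, gain
    return enhanced
```

In the decision-directed form, the previous frame's a posteriori SNR is weighted by a term that depends on both the smoothing coefficient and the previous a priori SNR (through G_prev), while the current frame's term is weighted by the smoothing coefficient alone. Claim 14 may assign the weights the other way round; the sketch does not try to reproduce that detail.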
CN202010796977.4A 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium Pending CN111968662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010796977.4A CN111968662A (en) 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium

Publications (1)

Publication Number Publication Date
CN111968662A (en) 2020-11-20

Family

ID=73365047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010796977.4A Pending CN111968662A (en) 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111968662A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112382277A (en) * 2021-01-07 2021-02-19 博智安全科技股份有限公司 Smart device wake-up method, smart device and computer-readable storage medium
CN112735458A (en) * 2020-12-28 2021-04-30 苏州科达科技股份有限公司 Noise estimation method, noise reduction method and electronic equipment
CN113488067A (en) * 2021-06-30 2021-10-08 北京小米移动软件有限公司 Echo cancellation method, echo cancellation device, electronic equipment and storage medium
CN113539285A (en) * 2021-06-04 2021-10-22 浙江华创视讯科技有限公司 Audio signal noise reduction method, electronic device, and storage medium
CN113613112A (en) * 2021-09-23 2021-11-05 三星半导体(中国)研究开发有限公司 Method and electronic device for suppressing wind noise of microphone
CN113808608A (en) * 2021-09-17 2021-12-17 随锐科技集团股份有限公司 Single sound channel noise suppression method and device based on time-frequency masking smoothing strategy
CN115426582A (en) * 2022-11-06 2022-12-02 江苏米笛声学科技有限公司 Earphone audio processing method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
US20070260454A1 (en) * 2004-05-14 2007-11-08 Roberto Gemello Noise reduction for automatic speech recognition
KR20120059431A (en) * 2010-11-30 2012-06-08 (주)트란소노 Apparatus and method for adaptive noise estimation
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
CN106098077A (en) * 2016-07-28 2016-11-09 浙江诺尔康神经电子科技股份有限公司 Cochlear implant speech processing system with noise reduction, and method
CN106161751A (en) * 2015-04-14 2016-11-23 电信科学技术研究院 Noise suppression method and device
CN110164467A (en) * 2018-12-18 2019-08-23 腾讯科技(深圳)有限公司 Voice denoising method and apparatus, computing device, and computer-readable storage medium
US20200251090A1 (en) * 2019-01-31 2020-08-06 Harman Becker Automotive Systems Gmbh Detection of fricatives in speech signals

Non-Patent Citations (1)

Title
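Pang Liang (庞亮); Liu Shuangdong (刘双东): "Improved noise power spectrum estimation algorithm based on speech presence probability", Audio Engineering (电声技术), no. 02 *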

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN112735458A (en) * 2020-12-28 2021-04-30 苏州科达科技股份有限公司 Noise estimation method, noise reduction method and electronic equipment
CN112382277A (en) * 2021-01-07 2021-02-19 博智安全科技股份有限公司 Smart device wake-up method, smart device and computer-readable storage medium
CN113539285A (en) * 2021-06-04 2021-10-22 浙江华创视讯科技有限公司 Audio signal noise reduction method, electronic device, and storage medium
CN113539285B (en) * 2021-06-04 2023-10-31 浙江华创视讯科技有限公司 Audio signal noise reduction method, electronic device and storage medium
CN113488067A (en) * 2021-06-30 2021-10-08 北京小米移动软件有限公司 Echo cancellation method, echo cancellation device, electronic equipment and storage medium
CN113808608A (en) * 2021-09-17 2021-12-17 随锐科技集团股份有限公司 Single sound channel noise suppression method and device based on time-frequency masking smoothing strategy
CN113808608B (en) * 2021-09-17 2023-07-25 随锐科技集团股份有限公司 Method and device for suppressing mono noise based on time-frequency masking smoothing strategy
CN113613112A (en) * 2021-09-23 2021-11-05 三星半导体(中国)研究开发有限公司 Method and electronic device for suppressing wind noise of microphone
CN113613112B (en) * 2021-09-23 2024-03-29 三星半导体(中国)研究开发有限公司 Method for suppressing wind noise of microphone and electronic device
CN115426582A (en) * 2022-11-06 2022-12-02 江苏米笛声学科技有限公司 Earphone audio processing method and device

Similar Documents

Publication Publication Date Title
CN111968662A (en) Audio signal processing method and device and storage medium
CN107833579B (en) Noise elimination method, device and computer readable storage medium
CN111128221B (en) Audio signal processing method and device, terminal and storage medium
CN109361828B (en) Echo cancellation method and device, electronic equipment and storage medium
CN111986693A (en) Audio signal processing method and device, terminal equipment and storage medium
CN111009257B (en) Audio signal processing method, device, terminal and storage medium
CN111883164B (en) Model training method and device, electronic equipment and storage medium
CN110853664A (en) Method and device for evaluating performance of speech enhancement algorithm and electronic equipment
CN112037825B (en) Audio signal processing method and device and storage medium
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN111179960A (en) Audio signal processing method and device and storage medium
CN109256145B (en) Terminal-based audio processing method and device, terminal and readable storage medium
CN112201267A (en) Audio processing method and device, electronic equipment and storage medium
CN113241089A (en) Voice signal enhancement method and device and electronic equipment
CN111292761B (en) Voice enhancement method and device
CN114040309B (en) Wind noise detection method and device, electronic equipment and storage medium
CN112951262B (en) Audio recording method and device, electronic equipment and storage medium
CN110580910A (en) Audio processing method, device and equipment and readable storage medium
CN113077808B (en) Voice processing method and device for voice processing
CN111667842A (en) Audio signal processing method and device
CN113810828A (en) Audio signal processing method and device, readable storage medium and earphone
CN113345461A (en) Voice processing method and device for voice processing
CN113190207A (en) Information processing method, information processing device, electronic equipment and storage medium
CN111613239A (en) Audio denoising method and device, server and storage medium
CN112863537B (en) Audio signal processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination