CN107833579B - Noise elimination method, device and computer readable storage medium - Google Patents


Info

Publication number
CN107833579B
Authority
CN
China
Prior art keywords
noise
signal
short
time
amplitude
Prior art date
Legal status
Active
Application number
CN201711042223.4A
Other languages
Chinese (zh)
Other versions
CN107833579A (en)
Inventor
肖纯智 (Xiao Chunzhi)
Current Assignee
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201711042223.4A
Publication of CN107833579A
Application granted
Publication of CN107833579B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses a noise elimination method, a noise elimination device and a computer-readable storage medium, and belongs to the technical field of speech processing. The method comprises the following steps:

- Pre-denoise an audio signal to obtain a noise signal and a denoised short-time spectrum signal.
- Based on the short-time spectrum signal and the noise signal, determine a short-time spectrum signal amplitude and a corresponding noise signal amplitude for each time-frequency point in the short-time spectrum signal.
- Based on those two amplitudes at each time-frequency point, determine at least one noise point from the plurality of time-frequency points included in the short-time spectrum signal.
- Attenuate the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.

By performing this further noise reduction step after the pre-noise reduction of the audio signal, the invention improves the efficiency of noise elimination.

Description

Noise elimination method, device and computer readable storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a noise cancellation method and apparatus, and a computer-readable storage medium.
Background
With the development of science and technology, voice technology is used in more and more everyday settings, such as video conferencing and voice communication. However, when acquiring an audio signal, a terminal may also pick up noise, which can make the audio unclear when it is played back. To ensure that the played audio is clear, the terminal therefore needs to eliminate the noise contained in the audio signal after acquiring it.
Currently, noise contained in an audio signal may be eliminated as follows:

1. Sequentially apply framing, windowing and a short-time Fourier transform to the audio signal to obtain a noisy short-time spectrum signal.
2. Determine an estimated noise signal from the noisy short-time spectrum signal using a noise estimation algorithm.
3. Remove the estimated noise from the noisy short-time spectrum signal using a noise reduction algorithm such as spectral subtraction.

However, this approach removes only part of the noise. Non-stationary noise such as breath sounds still remains in the audio signal, so the noise is not completely removed and noise elimination efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a noise elimination method, a noise elimination device and a computer readable storage medium, which are used for solving the problem of low noise elimination efficiency in the prior art. The technical scheme is as follows:
in a first aspect, a method for noise cancellation is provided, the method comprising:
pre-denoising the audio signal to obtain a noise signal and a denoised short-time spectrum signal;
determining a short-time spectrum signal amplitude and a corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal based on the short-time spectrum signal and the noise signal;
determining at least one noise point from a plurality of time-frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time-frequency point;
and carrying out attenuation processing on the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.
Optionally, the determining, based on the short-time spectrum signal and the noise signal, a short-time spectrum signal amplitude and a corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal includes:
performing region smoothing on the short-time spectrum signal and the noise signal to obtain the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point in the short-time spectrum signal.
Optionally, the determining at least one noise point from a plurality of time-frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time-frequency point includes:
determining an amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each of the multiple time frequency points;
and determining the time frequency point of which the amplitude ratio is smaller than or equal to a preset amplitude ratio in the plurality of time frequency points as a noise point to obtain the at least one noise point.
Optionally, the attenuating the signal at the at least one noise point in the short-time spectrum signal includes:
attenuating the noise signal amplitude at the at least one noise point in the short-time spectrum signal according to a preset attenuation ratio.
Optionally, the attenuating the signal at the at least one noise point in the short-time spectrum signal includes:
attenuating the noise signal amplitude at the at least one noise point in the short-time spectrum signal to a preset attenuation amplitude.
In a second aspect, there is provided a noise cancellation device, the device comprising:
the preprocessing module is used for carrying out pre-denoising processing on the audio signal to obtain a noise signal and a denoised short-time spectrum signal;
a first determining module, configured to determine, based on the short-time spectrum signal and the noise signal, a short-time spectrum signal amplitude and a corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal;
the second determining module is used for determining at least one noise point from a plurality of time frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point;
and the attenuation processing module is used for carrying out attenuation processing on the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.
Optionally, the first determining module includes:
the processing submodule is used for performing region smoothing on the short-time spectrum signal and the noise signal to obtain the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point in the short-time spectrum signal.
Optionally, the second determining module includes:
the first determining submodule is used for determining the amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point in the plurality of time frequency points;
and the second determining submodule is used for determining the time frequency point of which the amplitude ratio is less than or equal to a preset amplitude ratio in the plurality of time frequency points as a noise point so as to obtain the at least one noise point.
Optionally, the attenuation processing module is configured to:
attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal according to a preset attenuation ratio.
Optionally, the attenuation processing module is configured to:
attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal to a preset attenuation amplitude.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of any of the methods provided in the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, pre-noise reduction of the audio signal yields a noise signal and a short-time spectrum signal. From these, the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point are determined. Together, these two amplitudes reflect how much noise is present at a time-frequency point, so at least one noise point can be identified among the time-frequency points of the short-time spectrum signal. The signal at each noise point is then attenuated. The noise in the audio signal is thereby further eliminated, which improves noise elimination efficiency and, in turn, the clarity of audio playback.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a noise cancellation method according to an embodiment of the present invention;
Fig. 2A is a schematic structural diagram of a noise cancellation apparatus according to an embodiment of the present invention;
Fig. 2B is a schematic structural diagram of a first determining module according to an embodiment of the present invention;
Fig. 2C is a schematic structural diagram of a second determining module according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of another noise cancellation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Before explaining the embodiments of the present invention in detail, an application scenario related to the embodiments of the present invention is explained.
Currently, voice processing technologies such as voice navigation, video conferencing and voice dialing are required in more and more everyday situations. Settings such as video conferences place high demands on audio playback: the played audio content must be clear. To ensure that the audio content is clear, the terminal therefore needs to perform noise reduction on the audio signal after acquiring it. However, current noise reduction methods can eliminate only part of the stationary noise and cannot eliminate non-stationary noise such as breath sounds. As a result, the audio signal remains unclear and noise reduction efficiency is low.
Based on such a scenario, the embodiment of the present invention provides a noise cancellation method capable of improving noise reduction efficiency of an audio signal.
After describing an application scenario of the embodiment of the present invention, a detailed description will be given of a noise cancellation method provided by the embodiment of the present invention with reference to the drawings.
Fig. 1 is a flow chart of a noise cancellation method according to an exemplary embodiment. Referring to Fig. 1, the method includes the following steps.
Step 101: The terminal performs pre-noise reduction on the audio signal to obtain a noise signal and a noise-reduced short-time spectrum signal.
While the terminal acquires the audio signal, noise such as background noise and breath sounds may be mixed in. The acquired audio signal may therefore contain noise in addition to normal speech, which makes playback unclear. To improve playback clarity and noise elimination efficiency, the terminal can perform pre-noise reduction on the audio signal after acquiring it, as follows:

1. Sequentially apply framing, windowing and a short-time Fourier transform to the audio signal to obtain a noisy short-time spectrum signal.
2. Determine an estimated noise signal from the noisy short-time spectrum signal using a noise estimation algorithm.
3. Remove the estimated noise from the noisy short-time spectrum signal using a noise reduction algorithm to obtain the short-time spectrum signal.
It should be noted that framing divides the audio signal into a plurality of audio signal units, one per frame. The frame length may be set in advance, for example to 10 ms (milliseconds), 25 ms or 35 ms. Windowing the framed audio signal means multiplying each frame by a window function, such as a Hamming window or a Hanning window.
In addition, a time-domain audio signal shows only how the signal varies over time. The audio signal also depends on frequency and phase, so time-domain analysis can determine only the shape (waveform) of the audio signal and cannot describe or analyze it accurately. Frequency-domain analysis decomposes a complex audio signal into a superposition of simple signals, which allows a more accurate description and analysis. The terminal therefore converts the audio signal from a time-domain signal into a frequency-domain signal.
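The NumPy sketch below is a minimal illustration of the frequency-domain decomposition described above; it is not part of the patent. It mixes two sinusoids and recovers their frequencies from the FFT magnitude. The 8 kHz sample rate and the 440 Hz and 1000 Hz tones are arbitrary illustrative choices. With a one-second signal, each tone falls exactly on an FFT bin, so the two largest peaks sit at those frequencies.

```python
import numpy as np


def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` frequencies (Hz) with the largest FFT magnitudes."""
    spectrum = np.abs(np.fft.rfft(signal))
    # Bin k of an N-point FFT corresponds to k * sample_rate / N Hz.
    freqs = np.arange(len(spectrum)) * sample_rate / len(signal)
    top = np.argsort(spectrum)[-count:]  # indices of the largest magnitudes
    return sorted(float(freqs[k]) for k in top)


sample_rate = 8000  # Hz, illustrative
t = np.arange(sample_rate) / sample_rate  # one second of samples
# One waveform in the time domain, two separate peaks in the frequency domain.
mixture = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
print(dominant_frequencies(mixture, sample_rate, 2))  # [440.0, 1000.0]
```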
For example, let the audio signal collected by the terminal be $y(i)$, where $i = 1, 2, \ldots$ indexes the discrete samples of the time-domain signal. Framing the audio signal yields the framed signal $y_\lambda(n)$, where $\lambda = 1, 2, \ldots$ is the frame index and $n = 1, 2, \ldots$ is the sample index within a frame. Applying a Hamming window to $y_\lambda(n)$ gives the windowed signal $y_\lambda(n)' = y_\lambda(n) \times \mathrm{ham}(256)$, where $\mathrm{ham}(256)$ denotes a 256-point Hamming window. The terminal may then perform a short-time Fourier transform on the windowed signal to obtain its frequency-domain signal $Y_\lambda(\omega) = \mathrm{STFT}(y_\lambda(n)')$.
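The framing, Hamming-windowing and STFT steps above might be sketched as follows. The 256-sample frame matches $\mathrm{ham}(256)$ in the text. The 128-sample hop (50% overlap) is an assumption, since the patent does not state the frame shift. `np.fft.rfft` keeps only the non-negative-frequency bins of each real frame.

```python
import numpy as np


def stft_frames(y, frame_len=256, hop=128):
    """Frame y, apply a Hamming window to each frame, then FFT each frame.

    Returns a complex array Y of shape (num_frames, frame_len // 2 + 1),
    where Y[lam, k] is frequency bin k of frame lam.
    The hop size is an assumption; the text does not specify it.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(frame_len)  # ham(256) in the text when frame_len == 256
    num_frames = 1 + (len(y) - frame_len) // hop
    # Framing: frame lam is the block of frame_len samples starting at lam * hop.
    frames = np.stack([y[lam * hop: lam * hop + frame_len] for lam in range(num_frames)])
    windowed = frames * window  # y_lambda(n)' = y_lambda(n) * ham(256)
    return np.fft.rfft(windowed, axis=1)  # Y_lambda(omega) for every frame


Y = stft_frames(np.random.default_rng(0).standard_normal(16000))
print(Y.shape)  # (124, 129)
```

For a 16 000-sample input this yields 124 frames of 129 frequency bins each.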
Furthermore, the noise estimation algorithm may be MCRA (Minima-Controlled Recursive Averaging). The noise reduction algorithm may be spectral subtraction, Wiener filtering, MMSE (Minimum Mean Square Error) estimation, a subspace method, or the like.
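MCRA and the listed noise reduction algorithms are too involved to show here. The following sketch therefore substitutes two deliberately simple stand-ins so that the data flow of step 101 is visible:

- a noise estimate obtained by averaging the magnitudes of the first few frames, which are assumed to be speech-free;
- plain magnitude spectral subtraction with a spectral floor.

The function name `pre_denoise` and the parameters `noise_frames` and `floor` are hypothetical.

```python
import numpy as np


def pre_denoise(Y, noise_frames=10, floor=0.02):
    """Simplified pre-denoising: noise estimation plus spectral subtraction.

    Y is a complex STFT of shape (num_frames, num_bins). The first
    `noise_frames` frames are ASSUMED to contain noise only; this stands in
    for a real estimator such as MCRA.
    Returns (noise_mag, S): the estimated noise magnitude at every
    time-frequency point and the denoised complex short-time spectrum.
    """
    mag = np.abs(Y)
    phase = np.angle(Y)
    # Per-bin average over the leading noise-only frames, repeated for all frames.
    noise_profile = mag[:noise_frames].mean(axis=0)
    noise_mag = np.tile(noise_profile, (mag.shape[0], 1))
    # Magnitude spectral subtraction; the floor keeps magnitudes non-negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    S = clean_mag * np.exp(1j * phase)  # reuse the noisy phase
    return noise_mag, S
```

The returned `noise_mag` plays the role of the noise signal in steps 102 to 104, and `S` plays the role of the denoised short-time spectrum signal. This noise estimate is constant over time, so it cannot track non-stationary noise such as breath sounds. That is the kind of residual the later steps are meant to handle.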
It should be noted that, after pre-noise reduction, the short-time spectrum signal may still contain residual noise such as breath sounds. The terminal can therefore further eliminate the noise in the short-time spectrum signal through the following steps 102 to 104.
Step 102: The terminal determines, based on the short-time spectrum signal and the noise signal, the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point in the short-time spectrum signal.
As described in step 101, part of the noise may remain in the short-time spectrum signal after pre-noise reduction of the noisy audio signal. The signal at each time-frequency point may contain:

- both normal speech and residual noise;
- only normal speech; or
- only noise.

Which of these a point contains is reflected in its signal amplitude. The terminal may determine the short-time spectrum signal amplitude and the noise signal amplitude of each time-frequency point by performing region smoothing on the short-time spectrum signal and the noise signal.
The terminal may perform region smoothing on the short-time spectrum signal and the noise signal through a window function, where the window function may be a rectangular window, an elliptical window, or the like.
Region smoothing of the short-time spectrum signal and the noise signal with a window function is similar to region smoothing of an image. For the short-time spectrum signal, the terminal centers the window function on each time-frequency point in turn and applies a smoothing algorithm over the window, which gives the short-time spectrum amplitude of that time-frequency point. Likewise, for the noise signal, the terminal centers the window function on each time-frequency point of the noise signal and applies the smoothing algorithm, which gives the noise amplitude of that time-frequency point.
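A rectangular-window version of this region smoothing can be sketched as a 2-D box average over the (frame, frequency-bin) magnitude map, much like box-blurring an image. Several details are assumptions, since the patent leaves the window size, the smoothing algorithm and the boundary handling open:

- a 3 × 3 neighbourhood (`time_radius=1`, `freq_radius=1`);
- a plain mean as the smoothing algorithm;
- edge replication at the borders.

```python
import numpy as np


def region_smooth(mag, time_radius=1, freq_radius=1):
    """Rectangular-window smoothing of a time-frequency magnitude map.

    Each output value is the mean of the (2*time_radius+1) x (2*freq_radius+1)
    neighbourhood centred on that point; borders are padded by replicating
    edge values. Window size and padding mode are assumptions.
    """
    mag = np.asarray(mag, dtype=float)
    padded = np.pad(mag, ((time_radius, time_radius), (freq_radius, freq_radius)), mode="edge")
    rows, cols = mag.shape
    out = np.zeros_like(mag)
    # Sum the shifted copies of the map that make up the window, then average.
    for dt in range(2 * time_radius + 1):
        for df in range(2 * freq_radius + 1):
            out += padded[dt:dt + rows, df:df + cols]
    return out / ((2 * time_radius + 1) * (2 * freq_radius + 1))


demo = np.zeros((5, 5))
demo[2, 2] = 9.0
print(region_smooth(demo)[1:4, 1:4])  # every entry is 1.0
```

The same function would be applied to both the short-time spectrum magnitude and the noise magnitude, so the two amplitudes of each time-frequency point are smoothed identically.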
It should be noted that the noise signal is estimated from the audio signal, so its time-frequency points correspond one-to-one to those of the short-time spectrum signal. Determining the amplitude at each time-frequency point of the noise signal is therefore equivalent to determining the noise signal amplitude corresponding to each time-frequency point of the short-time spectrum signal.
For example, suppose the short-time spectrum signal contains time-frequency points A, B and C, and the corresponding time-frequency points in the noise signal are $A_1$, $B_1$ and $C_1$. If the terminal determines that the noise amplitudes at $A_1$, $B_1$ and $C_1$ are 10, 15 and 20, respectively, then the noise signal amplitudes at time-frequency points A, B and C of the short-time spectrum signal are 10, 15 and 20, respectively.
Step 103: The terminal determines at least one noise point from the plurality of time-frequency points included in the short-time spectrum signal, based on the short-time spectrum signal amplitude and the noise signal amplitude of each time-frequency point.
The signal at each time-frequency point may contain only normal speech, both normal speech and residual noise, or only residual noise. The amplitude of normal speech in the short-time spectrum signal is usually much larger than that of the noise. At a time-frequency point containing normal speech, the short-time spectrum signal amplitude is therefore greater than or equal to the noise signal amplitude, and the amplitude ratio between them is large. When a time-frequency point contains only residual noise, the amplitude ratio is small, because the residual noise amplitude is smaller than the estimated noise signal amplitude. Hence, how much noise a time-frequency point contains can be judged from the ratio between its short-time spectrum signal amplitude and its noise signal amplitude. The terminal may therefore:

1. Determine the amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each of the plurality of time-frequency points.
2. Determine each time-frequency point whose amplitude ratio is less than or equal to a preset amplitude ratio as a noise point, thereby obtaining the at least one noise point.
Suppose the amplitude ratio of a time-frequency point is less than or equal to the preset amplitude ratio. This indicates that the signal at that point is only residual noise, which would significantly degrade the clarity of the audio signal there. The terminal therefore determines every time-frequency point whose amplitude ratio is less than or equal to the preset amplitude ratio as a noise point.
It should be noted that the preset amplitude ratio can be set in advance, for example, the preset amplitude ratio can be 1.5, 1.3, 1, and so on.
Conversely, when the amplitude ratio of a time-frequency point is greater than the preset amplitude ratio, the signal at that point contains only normal speech, or both normal speech and residual noise.
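The amplitude-ratio test of step 103 reduces to an element-wise comparison. In this sketch the threshold 1.5 is one of the example values given above. The small `eps` guarding against a zero noise estimate is an added assumption. The example call reuses the noise amplitudes 10, 15 and 20 of time-frequency points A, B and C from step 102, paired with hypothetical short-time spectrum amplitudes 12, 60 and 20.

```python
import numpy as np


def find_noise_points(signal_amp, noise_amp, ratio_threshold=1.5, eps=1e-12):
    """Return a boolean mask that is True at noise points.

    A time-frequency point is a noise point when
    signal_amp / noise_amp <= ratio_threshold.
    `eps` (an assumption) avoids division by zero where the noise amplitude is 0.
    """
    signal_amp = np.asarray(signal_amp, dtype=float)
    noise_amp = np.asarray(noise_amp, dtype=float)
    ratio = signal_amp / np.maximum(noise_amp, eps)
    return ratio <= ratio_threshold


# Points A, B, C: noise amplitudes from step 102, hypothetical signal amplitudes.
print(find_noise_points([12.0, 60.0, 20.0], [10.0, 15.0, 20.0]).tolist())
# [True, False, True]
```

A and C (ratios 1.2 and 1.0) are classified as noise points, while B (ratio 4.0) is kept.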
Step 104: The terminal attenuates the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.
After the pre-noise reduction, most of the noise in the audio signal has already been eliminated, so the terminal only needs to attenuate the signal at the noise points to complete the noise elimination. The terminal may do so in either of two ways:

- attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal according to a preset attenuation ratio; or
- attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal to a preset attenuation amplitude.
It should be noted that the preset attenuation ratio is the ratio of the noise signal amplitude before attenuation to the noise signal amplitude after attenuation. It may be set in advance, for example to 2:1, 3:1 or 5:1. The preset attenuation amplitude may also be set in advance; for example, for 16-bit fixed-point data it may be 50, 40, or the like.
It should be noted that the terminal estimates the noise signal through a noise estimation algorithm, so the estimate may not exactly match the actual noise in the short-time spectrum signal. A detected noise point may therefore not be a true noise point. Attenuating by a preset attenuation ratio limits how much normal speech is attenuated in that case, which preserves the continuity of the audio content when the audio signal is later played.
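Both attenuation options of step 104 can be expressed as a per-point gain applied only where the noise mask is set. This sketch makes three assumptions:

- the ratio is passed as a number (2.0 for 2:1);
- the phase of each point is left unchanged;
- in the fixed-amplitude mode, a point already quieter than the target is left as-is rather than boosted, a choice the patent does not specify.

```python
import numpy as np


def attenuate(S, noise_mask, ratio=None, target_amplitude=None):
    """Attenuate the short-time spectrum S at points where noise_mask is True.

    ratio: before:after amplitude ratio, e.g. 2.0 for 2:1 (halves amplitude).
    target_amplitude: instead reduce each noise point's magnitude to this
    value; points already at or below it are left unchanged (an assumption).
    Exactly one option must be given. Phase is preserved because each point
    is multiplied by a real, non-negative gain.
    """
    if (ratio is None) == (target_amplitude is None):
        raise ValueError("give exactly one of ratio or target_amplitude")
    out = np.array(S, dtype=complex)  # work on a copy
    mag = np.abs(out)
    if ratio is not None:
        gain = np.full(mag.shape, 1.0 / ratio)
    else:
        gain = np.minimum(1.0, target_amplitude / np.maximum(mag, 1e-12))
    mask = np.asarray(noise_mask, dtype=bool)
    out[mask] *= gain[mask]
    return out


S = np.array([10.0, 15.0, 20.0])
mask = np.array([True, False, True])
print(np.abs(attenuate(S, mask, ratio=2.0)).tolist())             # [5.0, 15.0, 10.0]
print(np.abs(attenuate(S, mask, target_amplitude=4.0)).tolist())  # [4.0, 15.0, 4.0]
```

With only the first and last points marked as noise, a 2:1 ratio halves those two amplitudes. A target of 4 clamps them to 4. The middle point is untouched in both cases.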
Furthermore, after attenuating the signal at the at least one noise point in the short-time spectrum signal, the terminal may convert the noise-removed audio signal from a frequency-domain signal back into a time-domain signal so that it can be played smoothly.
In step 101, the terminal converted the audio signal from the time domain to the frequency domain by windowing and a short-time Fourier transform. It can therefore convert the noise-removed audio signal back to the time domain by an inverse short-time Fourier transform followed by removal of the window.
For example, let the frequency-domain signal of the noise-removed audio signal be $Y'_\lambda(\omega)$. The terminal applies an inverse short-time Fourier transform and removes the Hamming window to obtain the time-domain signal $y'_\lambda(n)$ of the noise-removed audio signal, i.e. $y'_\lambda(n) = \mathrm{ISTFT}(Y'_\lambda(\omega)) / \mathrm{ham}(256)$.
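The inverse step can be sketched in three parts: invert each frame with `np.fft.irfft`, divide by the Hamming window as in the formula above, and average samples where frames overlap. The 256/128 frame and hop sizes and the averaging of overlapping frames are assumptions, since the patent does not describe how overlapping frames are recombined.

```python
import numpy as np


def istft_frames(Y, frame_len=256, hop=128):
    """Convert per-frame spectra back into one time-domain signal.

    Each frame is inverse-FFT'd and divided by the Hamming window, following
    y'(n) = ISTFT(Y'(w)) / ham(256); samples covered by several overlapping
    frames are averaged (an assumption, not stated in the text).
    """
    window = np.hamming(frame_len)  # strictly positive (ends at 0.08), safe to divide
    num_frames = Y.shape[0]
    length = (num_frames - 1) * hop + frame_len
    out = np.zeros(length)
    counts = np.zeros(length)
    for lam in range(num_frames):
        frame = np.fft.irfft(Y[lam], n=frame_len) / window  # undo the analysis window
        start = lam * hop
        out[start:start + frame_len] += frame
        counts[start:start + frame_len] += 1
    return out / counts


# Round trip: frame, window and FFT a signal, then invert it.
rng = np.random.default_rng(1)
y = rng.standard_normal(256 + 5 * 128)
Y = np.stack([np.fft.rfft(y[i * 128: i * 128 + 256] * np.hamming(256)) for i in range(6)])
print(np.allclose(istft_frames(Y), y))  # True
```

Dividing by the window is safe because `np.hamming` never reaches zero. However, it amplifies any change made near the frame edges by up to 12.5 times (1 / 0.08). Standard weighted overlap-add synthesis is a common alternative when the spectra have been modified.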
In the embodiment of the invention, noise is mixed into the audio signal when the terminal collects it. The terminal can therefore pre-denoise the collected audio signal to obtain the estimated noise signal and a short-time spectrum signal that still contains some noise. It then determines the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point. The ratio between these two amplitudes reflects how much noise is present at the time-frequency point, so the terminal computes this ratio for each point. When the ratio is less than or equal to the preset amplitude ratio, the point is taken to contain only noise and is determined to be a noise point; in this way, at least one noise point is determined from the short-time spectrum signal. The terminal then attenuates the noise signal amplitude at the at least one noise point to eliminate the noise in the audio signal, which improves noise elimination efficiency and the clarity of audio playback.
Having described the noise cancellation method provided by the embodiments of the present invention, the noise cancellation device provided by the present invention is described next.
Fig. 2A is a block diagram of a noise cancellation apparatus according to an embodiment of the present invention. Referring to Fig. 2A, the apparatus may be implemented in software, hardware, or a combination of the two, and includes a preprocessing module 201, a first determining module 202, a second determining module 203, and an attenuation processing module 204.
The preprocessing module 201 is configured to pre-denoise the audio signal to obtain a noise signal and a denoised short-time spectrum signal.
The first determining module 202 is configured to determine, based on the short-time spectrum signal and the noise signal, the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point in the short-time spectrum signal.
The second determining module 203 is configured to determine at least one noise point from the multiple time-frequency points included in the short-time spectrum signal, based on the short-time spectrum signal amplitude and the noise signal amplitude of each time-frequency point.
The attenuation processing module 204 is configured to attenuate the signal at the at least one noise point in the short-time spectrum signal to eliminate noise in the audio signal.
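As a rough illustration of how modules 201 to 204 could be chained, the following NumPy sketch operates on a complex short-time spectrum (frames by frequency bins). The class name, its parameters, and their default values are hypothetical. In particular, the pre-denoising step is a stand-in, plain spectral subtraction with the noise magnitude estimated from the first few frames, because this section does not specify how pre-denoising is done.

```python
import numpy as np


class NoiseCancellationDevice:
    """Hypothetical sketch of modules 201-204; names and defaults are illustrative."""

    def __init__(self, ratio_threshold=2.0, attenuation=0.1, noise_frames=5):
        self.ratio_threshold = ratio_threshold  # preset amplitude ratio (assumed value)
        self.attenuation = attenuation          # preset attenuation proportion (assumed value)
        self.noise_frames = noise_frames        # leading frames assumed to be noise only

    def preprocess(self, X):
        """Module 201 stand-in: spectral subtraction with a noise estimate taken
        as the mean magnitude of the first few frames."""
        mag = np.abs(X)
        noise_mag = mag[: self.noise_frames].mean(axis=0, keepdims=True)
        noise = np.broadcast_to(noise_mag, mag.shape)
        denoised_mag = np.maximum(mag - noise, 0.0)
        S = denoised_mag * np.exp(1j * np.angle(X))  # keep the original phase
        return S, noise

    def determine_amplitudes(self, S, noise):
        """Module 202: amplitude of each time-frequency point (no smoothing here)."""
        return np.abs(S), np.abs(noise)

    def determine_noise_points(self, s_amp, n_amp):
        """Module 203: points whose amplitude ratio <= preset ratio are noise points."""
        ratio = s_amp / np.maximum(n_amp, 1e-12)
        return ratio <= self.ratio_threshold

    def attenuate(self, S, mask):
        """Module 204: scale noise points by the preset attenuation proportion."""
        out = S.copy()
        out[mask] *= self.attenuation
        return out

    def process(self, X):
        S, noise = self.preprocess(X)
        s_amp, n_amp = self.determine_amplitudes(S, noise)
        mask = self.determine_noise_points(s_amp, n_amp)
        return self.attenuate(S, mask), mask


if __name__ == "__main__":
    X = np.ones((10, 4), dtype=complex)  # flat "noise" in every bin
    X[5:, 2] = 10                        # a strong tone in bin 2 after frame 5
    out, mask = NoiseCancellationDevice().process(X)
    print(mask[5, 2], mask[0, 0])        # False True
```

Each method could later be swapped for the region-smoothing or alternative attenuation variants described below without changing the overall flow.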
Optionally, referring to fig. 2B, the first determining module 202 includes:
a processing submodule 2021, configured to perform region smoothing on the short-time spectrum signal and the noise signal to obtain the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time-frequency point in the short-time spectrum signal.
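This section does not define the smoothing region. One plausible reading, assumed here, is a moving average of magnitudes over a small rectangle of neighboring frames and frequency bins, applied separately to the short-time spectrum and to the noise estimate. The function `region_smooth` and its half-width parameters are hypothetical.

```python
import numpy as np


def region_smooth(spec, half_t=1, half_f=1):
    """Mean of |spec| over a (2*half_t+1) x (2*half_f+1) region around each point.

    spec has shape (frames, bins); edges are handled by repeating border values.
    """
    mag = np.abs(spec).astype(float)
    padded = np.pad(mag, ((half_t, half_t), (half_f, half_f)), mode="edge")
    n_t, n_f = mag.shape
    out = np.zeros_like(mag)
    for dt in range(2 * half_t + 1):        # offsets along time (frames)
        for df in range(2 * half_f + 1):    # offsets along frequency (bins)
            out += padded[dt:dt + n_t, df:df + n_f]
    return out / ((2 * half_t + 1) * (2 * half_f + 1))
```

Used as `s_amp = region_smooth(S)` and `n_amp = region_smooth(noise)`, the smoothing suppresses isolated spikes so that single outlier bins are less likely to be misclassified in the ratio test.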
Optionally, referring to fig. 2C, the second determining module 203 includes:
a first determining submodule 2031, configured to determine the amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each of the multiple time-frequency points; and
a second determining submodule 2032, configured to determine, as noise points, those of the multiple time-frequency points whose amplitude ratio is less than or equal to a preset amplitude ratio, so as to obtain the at least one noise point.
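The two submodules of the second determining module amount to an element-wise ratio followed by a threshold. A minimal NumPy sketch, with a hypothetical function name and a small `eps` guard (an added assumption) against division by zero where the noise amplitude is zero:

```python
import numpy as np


def find_noise_points(s_amp, n_amp, preset_ratio, eps=1e-12):
    """Return a boolean noise-point mask and the (frame, bin) indices of noise points."""
    s_amp = np.asarray(s_amp, dtype=float)
    n_amp = np.asarray(n_amp, dtype=float)
    ratio = s_amp / np.maximum(n_amp, eps)  # submodule 2031: amplitude ratio per point
    mask = ratio <= preset_ratio            # submodule 2032: ratio <= preset -> noise point
    return mask, np.argwhere(mask)


if __name__ == "__main__":
    s = np.array([[1.0, 4.0], [2.0, 0.5]])
    n = np.ones((2, 2))
    mask, idx = find_noise_points(s, n, preset_ratio=2.0)
    print(mask.tolist())  # [[True, False], [True, True]]
```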
Optionally, the attenuation processing module 204 is configured to:
attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal by a preset attenuation proportion.
Optionally, the attenuation processing module 204 is configured to:
attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal to a preset attenuation amplitude.
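The two optional attenuation modes can be sketched as follows, assuming a complex short-time spectrum and a boolean noise-point mask such as the one produced by the ratio test. The function names are hypothetical, and the phase at each noise point is kept unchanged (an assumption). In the fixed-amplitude variant, points already quieter than the target are left alone so that attenuation never amplifies; this is a design choice not stated in the source.

```python
import numpy as np


def attenuate_by_proportion(S, mask, proportion):
    """Multiply the spectrum at noise points by a preset proportion (e.g. 0.1)."""
    out = np.array(S, dtype=complex)
    out[mask] *= proportion
    return out


def attenuate_to_amplitude(S, mask, target_amp):
    """Reduce the magnitude at noise points to a preset amplitude, keeping phase."""
    out = np.array(S, dtype=complex)
    sel = out[mask]
    new_mag = np.minimum(np.abs(sel), target_amp)  # never amplify quiet points
    out[mask] = new_mag * np.exp(1j * np.angle(sel))
    return out


if __name__ == "__main__":
    S = np.array([[3 + 4j, 1 + 0j]])
    mask = np.array([[True, False]])
    print(attenuate_by_proportion(S, mask, 0.5))  # first point halved, second unchanged
    print(attenuate_to_amplitude(S, mask, 1.0))   # first point scaled to magnitude 1
```

The proportional mode preserves relative level differences among noise points, while the fixed-amplitude mode flattens them to a common floor.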
In summary, in this embodiment of the present invention, the terminal pre-denoises the collected audio signal, which is mixed with noise, to obtain a short-time spectrum signal that still contains part of the noise together with an estimated noise signal; determines the short-time spectrum signal amplitude and the corresponding noise signal amplitude at each time-frequency point; regards every time-frequency point whose amplitude ratio is less than or equal to the preset amplitude ratio as a noise point; and attenuates the noise signal amplitude at the at least one noise point. This eliminates the noise in the audio signal, improves noise elimination efficiency, and improves the clarity of audio playback.
It should be noted that, when the noise cancellation device provided in the above embodiment cancels noise, the division into the functional modules described above is only an example. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the noise cancellation device and the noise cancellation method provided by the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Fig. 3 is a block diagram illustrating a noise cancellation device 300 according to an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the apparatus 300 may include one or more of the following components: processing component 302, memory 304, power component 306, multimedia component 308, audio component 310, input/output (I/O) interface 312, sensor component 314, and communication component 316.
The processing component 302 generally controls overall operation of the device 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 302 may include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the apparatus 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 306 provides power to the various components of the device 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 300.
The multimedia component 308 includes a screen that provides an output interface between the device 300 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 300 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing status assessments of various aspects of the device 300. For example, the sensor assembly 314 may detect an open/closed state of the device 300; the relative positioning of components, such as the display and keypad of the device 300; a change in position of the device 300 or of a component of the device 300; the presence or absence of user contact with the device 300; the orientation or acceleration/deceleration of the device 300; and a change in temperature of the device 300. The sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The device 300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the methods provided by the embodiments shown in fig. 1 and described above.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of noise cancellation, the method comprising:
pre-denoising the audio signal to obtain a noise signal and a denoised short-time spectrum signal;
determining a short-time spectrum signal amplitude and a corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal based on the short-time spectrum signal and the noise signal, wherein the time frequency points included in the short-time spectrum signal correspond to the time frequency points included in the noise signal one to one;
determining at least one noise point from a plurality of time frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point;
and carrying out attenuation processing on the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.
2. The method of claim 1, wherein determining the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal based on the short-time spectrum signal and the noise signal comprises:
performing region smoothing processing on the short-time spectrum signal and the noise signal to obtain the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal.
3. The method according to claim 1 or 2, wherein the determining at least one noise point from a plurality of time frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point comprises:
determining an amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each of the multiple time frequency points;
and determining the time frequency point of which the amplitude ratio is smaller than or equal to a preset amplitude ratio in the plurality of time frequency points as a noise point to obtain the at least one noise point.
4. The method of claim 1, wherein the attenuating the signal at the at least one noise point in the short-time spectrum signal comprises:
attenuating the noise signal amplitude at the at least one noise point in the short-time spectrum signal according to a preset attenuation proportion.
5. The method of claim 1, wherein the attenuating the signal at the at least one noise point in the short-time spectrum signal comprises:
attenuating the noise signal amplitude at the at least one noise point in the short-time spectrum signal to a preset attenuation amplitude.
6. A noise cancellation apparatus, characterized in that the apparatus comprises:
the preprocessing module is used for carrying out pre-denoising processing on the audio signal to obtain a noise signal and a denoised short-time spectrum signal;
a first determining module, configured to determine, based on the short-time spectrum signal and the noise signal, a short-time spectrum signal amplitude and a corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal, where the time frequency points included in the short-time spectrum signal correspond to the time frequency points included in the noise signal one to one;
the second determining module is used for determining at least one noise point from a plurality of time frequency points included in the short-time spectrum signal based on the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point;
and the attenuation processing module is used for carrying out attenuation processing on the signal at the at least one noise point in the short-time spectrum signal so as to eliminate the noise in the audio signal.
7. The apparatus of claim 6, wherein the first determining module comprises:
the processing submodule, configured to perform region smoothing processing on the short-time spectrum signal and the noise signal to obtain the short-time spectrum signal amplitude and the corresponding noise signal amplitude of each time frequency point in the short-time spectrum signal.
8. The apparatus of claim 6 or 7, wherein the second determining module comprises:
the first determining submodule is used for determining the amplitude ratio between the short-time spectrum signal amplitude and the noise signal amplitude of each time frequency point in the plurality of time frequency points;
and the second determining submodule is used for determining the time frequency point of which the amplitude ratio is less than or equal to a preset amplitude ratio in the plurality of time frequency points as a noise point so as to obtain the at least one noise point.
9. The apparatus of claim 6, wherein the attenuation processing module is configured to:
attenuate the noise signal amplitude at the at least one noise point in the short-time spectrum signal according to a preset attenuation proportion.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN201711042223.4A 2017-10-30 2017-10-30 Noise elimination method, device and computer readable storage medium Active CN107833579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711042223.4A CN107833579B (en) 2017-10-30 2017-10-30 Noise elimination method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711042223.4A CN107833579B (en) 2017-10-30 2017-10-30 Noise elimination method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107833579A CN107833579A (en) 2018-03-23
CN107833579B CN107833579B (en) 2021-06-11

Family

ID=61650199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711042223.4A Active CN107833579B (en) 2017-10-30 2017-10-30 Noise elimination method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107833579B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804072A (en) * 2018-06-13 2018-11-13 广州酷狗计算机科技有限公司 Audio-frequency processing method, device, storage medium and terminal
CN110021305B (en) * 2019-01-16 2021-08-20 上海惠芽信息技术有限公司 Audio filtering method, audio filtering device and wearable equipment
CN109817241B (en) * 2019-02-18 2021-06-01 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium
CN111009251B (en) * 2019-10-31 2023-04-18 惠州华阳通用电子有限公司 Vehicle-mounted sound mixing method and device
CN110931035B (en) * 2019-12-09 2023-10-10 广州酷狗计算机科技有限公司 Audio processing method, device, equipment and storage medium
CN111710213A (en) * 2020-06-05 2020-09-25 河南艺树教育科技有限公司 Quantifiable music teaching system
CN112201269B (en) * 2020-10-19 2021-09-07 深圳市车宝信息科技有限公司 MMSE-LSA speech enhancement method based on improved noise estimation

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101197130A (en) * 2006-12-07 2008-06-11 华为技术有限公司 Sound activity detecting method and detector thereof
CN101339766A (en) * 2008-03-20 2009-01-07 华为技术有限公司 Audio signal processing method and device
CN101477800A (en) * 2008-12-31 2009-07-08 瑞声声学科技(深圳)有限公司 Voice enhancing process
CN103761974A (en) * 2014-01-28 2014-04-30 上海力声特医学科技有限公司 Cochlear implant
CN105575405A (en) * 2014-10-08 2016-05-11 展讯通信(上海)有限公司 Double-microphone voice active detection method and voice acquisition device
CN106098076A (en) * 2016-06-06 2016-11-09 成都启英泰伦科技有限公司 A kind of based on dynamic noise estimation time-frequency domain adaptive voice detection method

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
CN1841500B (en) * 2005-03-30 2010-04-14 松下电器产业株式会社 Method and apparatus for resisting noise based on adaptive nonlinear spectral subtraction
JP5141542B2 (en) * 2008-12-24 2013-02-13 富士通株式会社 Noise detection apparatus and noise detection method
CN103295582B (en) * 2012-03-02 2016-04-20 联芯科技有限公司 Noise suppressing method and system thereof
CN103021420B (en) * 2012-12-04 2015-02-25 中国科学院自动化研究所 Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation
JP6300464B2 (en) * 2013-08-09 2018-03-28 キヤノン株式会社 Audio processing device
JP6337519B2 (en) * 2014-03-03 2018-06-06 富士通株式会社 Speech processing apparatus, noise suppression method, and program
CN106328151B (en) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 ring noise eliminating system and application method thereof
CN105280195B (en) * 2015-11-04 2018-12-28 腾讯科技(深圳)有限公司 The processing method and processing device of voice signal

Also Published As

Publication number Publication date
CN107833579A (en) 2018-03-23

Similar Documents

Publication Publication Date Title
CN107833579B (en) Noise elimination method, device and computer readable storage medium
CN108198569B (en) Audio processing method, device and equipment and readable storage medium
US20210327448A1 (en) Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
CN111951819A (en) Echo cancellation method, device and storage medium
CN111968662A (en) Audio signal processing method and device and storage medium
CN111883164B (en) Model training method and device, electronic equipment and storage medium
CN111009257B (en) Audio signal processing method, device, terminal and storage medium
CN111128221A (en) Audio signal processing method and device, terminal and storage medium
CN109308905B (en) Audio data processing method and device, electronic equipment and storage medium
EP4254408A1 (en) Speech processing method and apparatus, and apparatus for processing speech
CN111179960A (en) Audio signal processing method and device and storage medium
CN111986693A (en) Audio signal processing method and device, terminal equipment and storage medium
CN113763977A (en) Method, apparatus, computing device and storage medium for eliminating echo signal
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN109256145B (en) Terminal-based audio processing method and device, terminal and readable storage medium
CN109119097B (en) Pitch detection method, device, storage medium and mobile terminal
CN112201267A (en) Audio processing method and device, electronic equipment and storage medium
CN113674752B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
CN111292761B (en) Voice enhancement method and device
CN111667842B (en) Audio signal processing method and device
CN107564534B (en) Audio quality identification method and device
CN114040285B (en) Method and device for generating feedforward filter parameters of earphone, earphone and storage medium
CN111294473A (en) Signal processing method and device
CN112951262B (en) Audio recording method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant