CN111128213B - Noise suppression method and system for processing in different frequency bands - Google Patents

Publication number
CN111128213B
Authority
CN
China
Prior art keywords
frequency
power spectrum
value
frequency point
noise power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911278646.5A
Other languages
Chinese (zh)
Other versions
CN111128213A (en)
Inventor
于伟维
纪伟
潘思伟
董斐
雍雅琴
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN201911278646.5A (CN111128213B)
Publication of CN111128213A
Priority to PCT/CN2020/111672 (WO2021114733A1)
Application granted
Publication of CN111128213B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a noise suppression method for processing in different frequency bands, which is used for suppressing noise in call sound and comprises the following steps: converting the signal type of the collected sound data from a time domain signal into a frequency domain signal; dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal, calculating a low-frequency noise power spectrum estimation value according to the amplitude of the low-frequency signal, and calculating a medium-high frequency noise power spectrum estimation value according to the amplitude of the medium-high frequency signal; calculating a frequency domain signal after gain according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal; and transforming the frequency domain signal after gain into a time domain signal after gain. The invention also discloses a noise suppression system for processing in different frequency bands. The invention suppresses the transmission of noise during a call and thereby improves the quality of voice communication.

Description

Noise suppression method and system for processing in different frequency bands
Technical Field
The invention relates to the technical field of voice communication, in particular to a noise suppression method and a system for processing in different frequency bands.
Background
With the development of science and technology, handwritten letters are gradually being replaced by voice calls. Compared with written communication, voice calls transmit information faster and more accurately; during a call, each party can sense the other party's emotion through tone, expression and the like, which improves communication efficiency. In present-day voice communication, a microphone is generally used to collect voice data as an analog signal; the analog signal is converted into a digital signal and transmitted to the other party in a wired or wireless manner, and the other party converts the received digital signal back into an analog signal for playback.
At present, when an ordinary microphone collects voice data, some noise data is unavoidably collected as well. The noise data is generally the environmental noise around the user during the call, and it is transmitted to the other end of the call together with the user's voice data. In general, noise degrades call quality; in severe cases, it obstructs communication, so that the transmitted speech is distorted or misunderstood. How to accurately distinguish voice from noise in a call, effectively suppress the propagation of noise, and improve call quality has become a difficult problem.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a noise suppression method and a system thereof for processing in different frequency bands, which are used for accurately distinguishing voice and noise in a call, effectively suppressing the propagation of the noise and improving the call quality.
In order to solve the above technical problem, the present invention provides a noise suppression method for processing in different frequency bands, which is used for suppressing noise in a call sound, and comprises the following steps: step S1, converting the signal type of the collected sound data from a time domain signal into a frequency domain signal; step S2, dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal, calculating a low-frequency noise power spectrum estimation value according to the amplitude of the low-frequency signal, and calculating a medium-high frequency noise power spectrum estimation value according to the amplitude of the medium-high frequency signal; step S3, calculating a frequency domain signal after gain according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal; step S4, transforming the frequency domain signal after gain into a time domain signal after gain.
The present invention also provides a noise suppression system for processing in different frequency bands, wherein the noise suppression system for processing in different frequency bands comprises: the signal transformation module is used for transforming the signal type of the collected sound data from a time domain signal into a frequency domain signal; the noise power spectrum estimation module is used for dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal, calculating a low-frequency noise power spectrum estimation value according to the amplitude value of the low-frequency signal, and calculating a medium-high frequency noise power spectrum estimation value according to the amplitude value of the medium-high frequency signal; the gain frequency domain calculation module is used for calculating according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal to obtain a frequency domain signal after gain; and the signal inverse transformation module is used for transforming the frequency domain signal after the gain into a time domain signal after the gain.
The invention provides a noise suppression method and a system for processing in different frequency bands. A frequency domain signal is divided into a low-frequency signal and a medium-high frequency signal; the low-frequency signal is processed to obtain a low-frequency noise power spectrum estimation value, the medium-high frequency signal is processed to obtain a medium-high frequency noise power spectrum estimation value, and the frequency domain signal is processed according to the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value to obtain a frequency domain signal after gain. Different noise processing modes are adopted according to the signal characteristics of the different frequency bands, so that noise and voice are separated more accurately, the propagation of noise is effectively suppressed, the efficiency of the voice call is improved, and the voice in the call is enhanced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a noise suppression method for processing in frequency bands according to an embodiment of the present invention.
Fig. 2 is a sub-flowchart of step S1 in fig. 1.
Fig. 3 is a sub-flowchart of step S2 in fig. 1.
Fig. 4 is a sub-flowchart of step S22 in fig. 3.
Fig. 5 is a sub-flowchart of step S223 in fig. 4.
Fig. 6 is a sub-flowchart of step S23 in fig. 3.
Fig. 7 is a sub-flowchart of step S3 in fig. 1.
Fig. 8 is a sub-flowchart of step S33 in fig. 7.
Fig. 9 is a structural diagram of a noise suppression system performing processing in different frequency bands according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of the structure of the signal conversion module in fig. 9.
Fig. 11 is a schematic diagram of the structure of the noise power spectrum estimation module in fig. 9.
Fig. 12 is a schematic diagram of the structure of the low-frequency power spectrum estimation module in fig. 11.
Fig. 13 is a schematic structural diagram of the non-fundamental frequency point power spectrum estimation module in fig. 12.
Fig. 14 is a schematic diagram of a structure of the medium-high frequency power spectrum estimation module in fig. 11.
Fig. 15 is a schematic diagram of the structure of the gain frequency domain calculating module in fig. 9.
Fig. 16 is a schematic diagram of the structure of the gain signal calculation module in fig. 15.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the embodiments of the present invention, it should be understood that the terms "first" and "second" are only used for convenience in describing the present invention and simplifying the description, and thus, should not be construed as limiting the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a noise suppression method for processing in different frequency bands according to an embodiment of the present invention.
As shown in fig. 1, the present invention provides a noise suppression method for processing in different frequency bands, which is used for suppressing noise in a call sound, and the method includes the following steps: step S1, converting the signal type of the collected sound data from a time domain signal into a frequency domain signal; step S2, dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal, calculating a low-frequency noise power spectrum estimation value according to the amplitude of the low-frequency signal, and calculating a medium-high frequency noise power spectrum estimation value according to the amplitude of the medium-high frequency signal; step S3, calculating a frequency domain signal after gain according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal; step S4, transforming the frequency domain signal after gain into a time domain signal after gain.
Therefore, according to the noise suppression method for processing in different frequency bands, provided by the invention, a frequency domain signal is divided into a low-frequency signal and a medium-high frequency signal, the low-frequency signal is processed to obtain a low-frequency noise power spectrum estimation value, the medium-high frequency signal is processed to obtain a medium-high frequency noise power spectrum estimation value, and the frequency domain signal is calculated according to the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value to obtain a gained frequency domain signal; according to the signal characteristics of different frequencies, different noise processing modes are adopted, noise and voice are divided more accurately, the propagation of noise is effectively inhibited, the voice call efficiency is improved, and the effect of enhancing the voice call is achieved.
Referring to fig. 2, fig. 2 is a sub-flowchart of step S1 in fig. 1.
As shown in fig. 2, in some embodiments, the step S1 includes: step S11, acquiring sound data of which the signal type is a time domain signal; step S12, framing the time domain signal according to a preset time interval to obtain a multi-frame time domain signal; step S13, a frequency domain signal is obtained by performing time-frequency transformation on each frame of time domain signal.
The sound data comprises the user's voice and the environmental sound collected by the communication equipment while the user is using it. The sound data is generally acquired and converted through a microphone: after the microphone captures the analog signals of the user's voice and the environmental sound, the analog signals are converted into digital signals through analog-to-digital conversion. Here, the sound data is the digital signal obtained after analog-to-digital conversion.
The time domain signal is a signal obtained by arranging the sound data according to the sequence of the collected time; the time domain is used to describe the variation of physical quantity with time. And the time domain waveform of the time domain signal is used for expressing the change situation of the signal along with time.
The frequency domain signal is a signal obtained by sorting the sound data in order of frequency magnitude. The frequency domain is used to describe the variation of the physical quantity with frequency. The frequency domain waveform of the frequency domain signal is used for expressing the change condition of the amplitude of the signal along with the frequency, the frequency domain signal is composed of a plurality of frequency points, and each frequency point comprises corresponding amplitude information and phase information.
Framing refers to grouping the collected time domain signal according to a preset time interval: the time domain signal within each preset time interval forms one frame, so that after framing the time domain signal consists of multiple frames, and the subsequent transformation of the time domain signal is performed frame by frame.
The time-frequency transformation in this embodiment is to perform fourier transformation on each frame in a time-domain signal, obtain amplitude and phase information of a plurality of corresponding frequency points after each frame is subjected to fourier transformation, and obtain a frequency-domain signal after all frequency points are summarized; in other embodiments, other transformation methods may be used to convert the time domain signal into the frequency domain signal.
Specifically, the sound data collected by the microphone is arranged in order of collection time to generate a time domain signal; the time domain signal is divided according to the preset time interval, the portion within each interval forming one frame; each frame is then Fourier transformed, and the frequency points obtained from the frames are collected to obtain the frequency domain signal.
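The framing and time-frequency transformation of step S1 can be sketched as follows. This is a minimal illustration only: the function names, the 16-sample frame length, the 8000 Hz sampling rate and the choice to drop an incomplete trailing frame are assumptions made for the example and do not come from the patent, and a naive discrete Fourier transform is used for clarity where a practical implementation would presumably use a fast Fourier transform.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """Group time-domain samples into consecutive frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch).
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), for clarity only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def to_frequency_domain(samples, frame_len):
    """Return, per frame, one (amplitude, phase) pair per frequency point."""
    spectrum = []
    for frame in split_into_frames(samples, frame_len):
        spectrum.append([(abs(c), cmath.phase(c)) for c in dft(frame)])
    return spectrum


# Example: a 1000 Hz tone sampled at 8000 Hz, framed into 16-sample frames.
SAMPLE_RATE = 8000  # hypothetical sampling rate
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]
frames = to_frequency_domain(tone, frame_len=16)

# Each bin is 8000 / 16 = 500 Hz wide, so the tone falls in bin 2.
peak_bin = max(range(8), key=lambda k: frames[0][k][0])
print(len(frames), peak_bin)  # 4 2
```

Each entry of `frames` corresponds to one frame and holds the amplitude and phase of every frequency point, matching the description of the frequency domain signal above.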
Referring to fig. 3, fig. 3 is a sub-flowchart of step S2 in fig. 1.
As shown in fig. 3, in an embodiment, the step S2 includes: step S21, dividing the frequency domain signal into a low frequency signal and a medium-high frequency signal according to a preset frequency division standard; step S22, performing noise power spectrum density estimation on the single-path low-frequency signal acquired by the single-path microphone in the frequency domain signal to obtain noise power spectrum estimation values of all low-frequency points in the single-path low-frequency signal; and step S23, performing noise power spectrum density estimation on the two-way medium-high frequency signals collected by the two-way microphone in the frequency domain signals to obtain noise power spectrum estimation values of all medium-high frequency points in the two-way medium-high frequency signals.
The preset frequency division standard refers to a preset standard for dividing frequency domain signals; in this embodiment, the frequency domain signal may be divided into a low frequency signal and a medium-high frequency signal according to the preset frequency division standard; in this embodiment, the preset frequency division standard may be a fixed frequency; for example, the predetermined frequency division standard may be a standard for dividing low-frequency signals and medium-high frequency signals with a predetermined frequency, such as 1000 Hz.
The low-frequency signal refers to a signal segment lower than the preset frequency in the frequency domain signal. For example, a signal segment of the frequency domain signal below the preset frequency belongs to the low frequency signal. And the frequency points in the low-frequency signals are low-frequency points.
The medium-high frequency signal refers to a signal segment which is not lower than the preset frequency in the frequency domain signal. For example, a signal segment equal to or higher than the preset frequency in the frequency domain signal belongs to the medium-high frequency signal. And the frequency points in the medium-high frequency signals are medium-high frequency points.
The microphones are parts used for collecting sound data in the communication equipment, and the number of the microphones is generally two, and the two microphones are respectively arranged at different parts of the communication equipment; when a user communicates, the two microphones simultaneously collect sound data, and the collected sound data are respectively subjected to time-frequency transformation to obtain two paths of frequency domain signals; when the noise suppression processing is carried out on the signals of different frequency bands, the frequency domain signals of the microphones of different paths are adopted. In this embodiment, when performing noise suppression processing on a low-frequency signal, a frequency-domain signal of one microphone with a high signal-to-noise ratio is selected to perform noise suppression processing. And when the middle-high frequency signals are subjected to noise suppression processing, two paths of frequency domain signals of two paths of microphones are simultaneously selected for noise suppression processing.
The Power Spectral Density (PSD), sometimes also referred to as Spectral Power Distribution (SPD), is the Fourier transform of the autocorrelation function of the signal (noise), i.e., the power carried by the signal (noise) per unit frequency. The power spectral density is a statistical quantity: it measures the mean square value of a random variable.
The noise power spectrum estimation value, namely the estimation value of the power spectral density of the noise, is also called power spectrum estimation. The power spectral density of noise is used to describe the energy characteristics of noise as a function of frequency.
Specifically, after the preset frequency division standard is determined, the signal segment of the frequency domain signal whose frequency is lower than the preset frequency is used as the low-frequency segment, and the signal segment whose frequency is equal to or higher than the preset frequency is used as the medium-high frequency segment.
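As an illustration of this division, the sketch below maps each frequency point to its frequency and assigns it to the low band or the medium-high band using the 1000 Hz example value given above. The function name, the 8000 Hz sampling rate and the 16-point frame length are assumptions made for the example.

```python
def split_bands(num_bins, sample_rate, frame_len, cutoff_hz=1000.0):
    """Split frequency-point indices into low and medium-high bands.

    Bin k of a frame_len-point transform sits at k * sample_rate / frame_len Hz.
    Points strictly below cutoff_hz are low-frequency points; all others are
    medium-high frequency points.
    """
    low, mid_high = [], []
    for k in range(num_bins):
        freq = k * sample_rate / frame_len
        (low if freq < cutoff_hz else mid_high).append(k)
    return low, mid_high


# 16-point frames at 8000 Hz give bins 0..8 (0 Hz to 4000 Hz in 500 Hz steps).
low_bins, mid_high_bins = split_bands(num_bins=9, sample_rate=8000, frame_len=16)
print(low_bins)       # [0, 1]
print(mid_high_bins)  # [2, 3, 4, 5, 6, 7, 8]
```

The bin at exactly 1000 Hz lands in the medium-high band, consistent with the "equal to or higher than" wording of the definition.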
And performing time-frequency transformation on sound data acquired by two microphones to obtain two paths of frequency domain signals, selecting one path of frequency domain signal with high signal-to-noise ratio, selecting a low frequency band in the path of frequency domain signal as a low frequency signal according to the preset frequency division standard, and performing noise power spectral density estimation on the low frequency signal to obtain noise power spectral estimation values of all low frequency points in the low frequency signal.
And simultaneously selecting the middle-high frequency bands of the two paths of frequency domain signals as middle-high frequency signals according to the preset frequency division standard, and estimating the noise power spectral density of the middle-high frequency signals to obtain the noise power spectral estimated values of all the middle-high frequency points in the middle-high frequency signals.
In other embodiments, the preset frequency division standard may be a fixed frequency segment; two paths of frequency domain signals are divided more finely according to the frequency segments; for example, 0-1000Hz is the low frequency band, 1000-3000Hz is the middle frequency band, and above 3000Hz is the high frequency band.
Therefore, two channels of sound data are collected through the two microphones and are each time-frequency transformed to obtain two frequency domain signals; the low-frequency signal and the medium-high frequency signals are selected from the two frequency domain signals, and noise power spectral density estimation is performed on each. By making use of the accuracy of noise estimation in the low-frequency signal and the correlation of noise estimation in the medium-high frequency signals, the frequency domain signals are processed band by band, the two channels of collected sound data are fully utilized, the accuracy of the noise power spectrum estimation value is improved, and residual noise is removed as far as possible.
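The per-band choice of microphone inputs described above might be organised as in the following sketch. How the signal-to-noise ratio of each microphone is measured is not specified in this passage, so the two SNR values are treated as externally supplied, hypothetical inputs; the function name and the dictionary layout are likewise assumptions of the example.

```python
def select_band_inputs(spec_a, spec_b, snr_a_db, snr_b_db, low_bins, mid_high_bins):
    """Pick per-band inputs from the spectra of two microphones for one frame.

    spec_a, spec_b: lists of complex frequency points, one list per microphone.
    snr_a_db, snr_b_db: SNR estimates supplied by the caller (hypothetical).
    Low band: frequency points of the higher-SNR microphone only.
    Medium-high band: frequency points of both microphones, kept as pairs.
    """
    best = spec_a if snr_a_db >= snr_b_db else spec_b
    low_signal = {k: best[k] for k in low_bins}
    mid_high_signal = {k: (spec_a[k], spec_b[k]) for k in mid_high_bins}
    return low_signal, mid_high_signal


spec_a = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
spec_b = [5 + 0j, 6 + 0j, 7 + 0j, 8 + 0j]
low_signal, mid_high_signal = select_band_inputs(
    spec_a, spec_b, snr_a_db=12.0, snr_b_db=7.5,
    low_bins=[0, 1], mid_high_bins=[2, 3],
)
print(low_signal)       # {0: (1+0j), 1: (2+0j)}
print(mid_high_signal)  # {2: ((3+0j), (7+0j)), 3: ((4+0j), (8+0j))}
```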
Referring to fig. 4, fig. 4 is a sub-flowchart of step S22 in fig. 3.
In some embodiments, the step S22 includes: step S221, squaring the amplitude of each low-frequency point in the single-path low-frequency signal to obtain a square value of the amplitude of each low-frequency point; step S222, dividing the low-frequency points into fundamental frequency points and non-fundamental frequency points by a fundamental tone detection method; step S223, calculating to obtain a noise power spectrum estimation value of the non-fundamental frequency point according to the square value of the amplitude of the non-fundamental frequency point; step S224, calculating to obtain a noise power spectrum estimation value of the base frequency point according to the square value of the amplitude of the base frequency point and the noise power spectrum estimation value of the non-base frequency point; and step S225, combining the noise power spectrum estimation value of the non-fundamental frequency point with the noise power spectrum estimation value of the fundamental frequency point to obtain the noise power spectrum estimation values of all low-frequency points.
The pitch corresponds to the period of vocal cord vibration when a person utters a voiced sound. Estimating the pitch period is called pitch detection; its purpose is to extract the pitch frequency, i.e. a contour of pitch period variation that matches, or matches as closely as possible, the vibration frequency of the vocal cords. The pitch frequency is one of the most important characteristic parameters in speech signal processing.
The fundamental tone detection method is a method for detecting a fundamental tone signal, and since a speech signal can be regarded as a dynamic non-stationary random process, and the frequency variation range of speech waveform and vocal cord vibration is large and very complex, the fundamental tone detection method includes many algorithms, such as an overtone inner product spectrum method, a cepstrum analysis method, a maximum likelihood estimation method, and the like; in this embodiment, the low-frequency point is detected by using a cepstrum method, and the pitch frequency in the pitch detection may be determined by the following calculation method:
When the time-domain signal sequence is $x(n)$, the inverse Fourier transform of the logarithm of its amplitude spectrum, $\hat{x}(n)$, is the cepstral sequence of $x(n)$, i.e.:
$X(\omega) = \mathrm{FT}[x(n)]$
$\hat{x}(n) = \mathrm{FT}^{-1}[\ln |X(\omega)|]$
where $\mathrm{FT}$ and $\mathrm{FT}^{-1}$ denote the Fourier transform and the inverse Fourier transform, respectively. The time-domain signal $x(n)$ is obtained by filtering the glottal pulse excitation $u(n)$ with the vocal tract response $v(n)$, i.e.
$x(n) = u(n) * v(n)$
Let the cepstra of the three quantities be $\hat{x}(n)$, $\hat{u}(n)$ and $\hat{v}(n)$, respectively; then
$\hat{x}(n) = \hat{u}(n) + \hat{v}(n)$
In the cepstral domain, $\hat{u}(n)$ and $\hat{v}(n)$ are relatively separated, which means that the cepstrum of the glottal pulse, which contains the pitch information, can be separated from the cepstrum of the vocal tract response. $\hat{u}(n)$ is therefore separated out in the cepstral domain and $u(n)$ is recovered from it; the pitch period is calculated from $u(n)$, and the reciprocal of the pitch period gives the pitch frequency.
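A minimal sketch of cepstrum-based pitch detection is given below. It takes a common shortcut rather than the exact procedure described above: instead of recovering u(n) explicitly, it looks for the largest cepstral peak inside a plausible pitch-period range and reads the period from the peak position. The function names, the 80 Hz to 400 Hz search range, the 8000 Hz sampling rate and the synthetic test frame are all assumptions of the example.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive (inverse) discrete Fourier transform, O(N^2), for illustration."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [c / n for c in out] if inverse else out


def real_cepstrum(frame, eps=1e-10):
    """Inverse transform of the log amplitude spectrum of one frame."""
    # eps avoids log(0) at frequency points with no energy.
    log_amplitude = [math.log(abs(c) + eps) for c in dft(frame)]
    return [c.real for c in dft(log_amplitude, inverse=True)]


def estimate_pitch(frame, sample_rate, min_hz=80.0, max_hz=400.0):
    """Return (pitch period in samples, pitch frequency in Hz)."""
    cep = real_cepstrum(frame)
    shortest = int(sample_rate / max_hz)  # shortest period searched
    longest = int(sample_rate / min_hz)   # longest period searched
    period = max(range(shortest, longest + 1), key=lambda q: cep[q])
    return period, sample_rate / period


# Synthetic voiced frame: a decaying pulse repeated every 64 samples.
SAMPLE_RATE = 8000  # hypothetical sampling rate
frame = [0.6 ** (n % 64) for n in range(512)]
period, pitch_hz = estimate_pitch(frame, SAMPLE_RATE)
print(period, pitch_hz)  # 64 125.0
```

The decaying pulse plays the role of the vocal tract response and its 64-sample repetition plays the role of the glottal excitation; the cepstral peak at 64 samples reflects the excitation, while the vocal tract contribution stays concentrated near the origin.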
The fundamental frequency point refers to a frequency point in the low-frequency signal whose squared amplitude is identified by pitch detection as matching the pitch frequency.
The non-fundamental frequency point refers to a frequency point in the low-frequency signal whose squared amplitude is not identified as matching the pitch frequency.
The noise power spectrum estimation value of the fundamental frequency point is an estimation value of the power spectral density of the noise at the fundamental frequency point. The noise power spectrum estimation value of the non-fundamental frequency point is an estimation value of the power spectral density of the noise at the non-fundamental frequency point. The noise power spectrum estimation value can be calculated by the following formula, written here from the term definitions that follow (the left-hand symbol $\hat{\sigma}_N^2$ is notation assumed for this text):
$\hat{\sigma}_N^2(\lambda,\mu) = \begin{cases} |X(\lambda,\mu)|^2, & \mu \notin M \\ |X_{inter}(\lambda,\mu)|^2, & \mu \in M \end{cases}$
where $M = \{f_0, 2f_0, 3f_0, \ldots\}$ is the set of fundamental frequency points, $f_0$ is the fundamental frequency, and $2f_0, 3f_0, \ldots$ are the harmonic frequencies. $|X(\lambda,\mu)|^2$ is the original input: the noise power spectrum estimation value of a non-fundamental frequency point is the square of the amplitude of that point. $|X_{inter}(\lambda,\mu)|^2$ is obtained by interpolation: the noise power spectrum estimation value of a fundamental frequency point is interpolated from the noise power spectrum estimation values of the two adjacent non-fundamental frequency points.
Specifically, the amplitude of each low-frequency point in the single-channel low-frequency signal is squared to obtain a square value of the amplitude of each low-frequency point.
All frequency points in the low-frequency signal are then screened by the pitch detection method: frequency points whose squared amplitudes match the pitch frequency are taken as fundamental frequency points, and those that do not are taken as non-fundamental frequency points.
The noise power spectrum estimation values of the fundamental frequency points and of the non-fundamental frequency points are then calculated according to the noise power spectrum estimation formula above.
Finally, the noise power spectrum estimation values of the non-fundamental frequency points and of the fundamental frequency points are combined to obtain the noise power spectrum estimation values of all low-frequency points.
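Steps S221 to S225 can be sketched as follows, under simplifying assumptions: the set of fundamental and harmonic points is passed in as a list of bin indices (as if produced by pitch detection), the non-fundamental estimate is taken directly as the squared amplitude, and linear interpolation between the nearest non-fundamental neighbours is used for the fundamental points. The function name, the choice of linear interpolation and the one-sided fallback at the band edge are assumptions of this sketch.

```python
def low_band_noise_estimate(amplitudes, harmonic_bins):
    """Noise power estimate for each low-frequency point of one frame.

    amplitudes: amplitude of each low-frequency point.
    harmonic_bins: indices flagged as fundamental/harmonic frequency points.
    """
    power = [a * a for a in amplitudes]  # squared amplitudes (step S221)
    harmonic = set(harmonic_bins)
    clean = [k for k in range(len(power)) if k not in harmonic]
    estimate = list(power)  # non-fundamental points keep |X|^2
    for k in sorted(harmonic):
        left = max((j for j in clean if j < k), default=None)
        right = min((j for j in clean if j > k), default=None)
        if left is not None and right is not None:
            # Linear interpolation between the two nearest clean neighbours.
            weight = (k - left) / (right - left)
            estimate[k] = power[left] + weight * (power[right] - power[left])
        elif left is not None or right is not None:
            # Band edge: fall back to the single available neighbour.
            estimate[k] = power[left if left is not None else right]
        else:
            estimate[k] = 0.0  # no clean point at all in this band
    return estimate


amplitudes = [1.0, 2.0, 9.0, 4.0, 8.0, 2.0]
noise = low_band_noise_estimate(amplitudes, harmonic_bins=[2, 4])
print(noise)  # [1.0, 4.0, 10.0, 16.0, 10.0, 4.0]
```

In the example, the strong peaks at bins 2 and 4 are treated as speech harmonics, so their noise estimates are replaced by values interpolated from the neighbouring bins instead of the large squared amplitudes 81 and 64.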
Referring to fig. 5, fig. 5 is a sub-flowchart of step S223 in fig. 4.
In some embodiments, the step S223 includes: step S2231, calculating the voice existence probability value of each low-frequency point according to the square value of the amplitude of each low-frequency point; step S2232, calculating to obtain a noise power spectrum preliminary estimation value of each non-fundamental frequency point according to the square value of the amplitude of each non-fundamental frequency point; step S2233, finding out the voice existence probability value of the non-fundamental frequency point corresponding to the non-fundamental frequency point from the voice existence probability values of all the low-frequency points; and step S2234, calculating to obtain a noise power spectrum estimation value of each non-fundamental frequency point according to the voice existence probability value of each non-fundamental frequency point and the noise power spectrum preliminary estimation value of the corresponding non-fundamental frequency point.
The voice is the user's speech contained in the sound data; the sound data comprises voice data and noise data, and the voice data is the data that needs to be transmitted in a call.
The voice existence probability value of the low-frequency point refers to the possibility of voice existence in the low-frequency point. The voice existence probability value can be calculated by the following formula:
Figure BDA0002311688780000081
In the formula, q(k, λ) denotes the voice nonexistence probability, whose value can be obtained by comparing the square of the amplitude spectrum of the corresponding frequency point with a preset threshold value; ξ(k, λ) is the a priori signal-to-noise ratio; and v(k, λ) can be calculated from the definitions of the a posteriori and a priori signal-to-noise ratios. The a priori signal-to-noise ratio is the power of the clean speech signal divided by the power of the noise signal, and the a posteriori signal-to-noise ratio is the power of the noisy speech signal divided by the power of the noise signal.
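The formula image itself is not reproduced in this text. For orientation only, one commonly used expression that is consistent with the symbols defined above is given below; it is an assumption, not necessarily the formula of this patent, and the symbol γ(k, λ) for the a posteriori signal-to-noise ratio is introduced here for illustration.

```latex
p(k,\lambda) = \left\{ 1 + \frac{q(k,\lambda)}{1 - q(k,\lambda)}
  \bigl(1 + \xi(k,\lambda)\bigr) \exp\bigl(-v(k,\lambda)\bigr) \right\}^{-1},
\qquad
v(k,\lambda) = \frac{\gamma(k,\lambda)\,\xi(k,\lambda)}{1 + \xi(k,\lambda)}
```

With this form, p(k, λ) approaches 0 as q(k, λ) approaches 1 and grows toward 1 as the a priori signal-to-noise ratio increases.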
The preliminary estimation value of the noise power spectrum of a non-fundamental frequency point is the value obtained by a first-pass calculation with the noise power spectrum estimation formula.
Specifically, the square value of the amplitude of each low-frequency point in the low-frequency signal is substituted into the voice existence probability formula to obtain the voice existence probability values of all low-frequency points in the low-frequency signal.
The square value of the amplitude of each non-fundamental frequency point is substituted into the noise power spectrum estimation formula to obtain the preliminary noise power spectrum estimation value of each non-fundamental frequency point.
Using the non-fundamental frequency points identified in the pitch detection result, the voice existence probability value of each non-fundamental frequency point is then looked up among the voice existence probability values of all low-frequency points.
The voice existence probability value of each non-fundamental frequency point is multiplied by the preliminary noise power spectrum estimation value of the corresponding non-fundamental frequency point to obtain the noise power spectrum estimation value of each non-fundamental frequency point.
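Steps S2233 and S2234 amount to a lookup followed by a per-point multiplication, which can be sketched as below. The function name and the data layout (a list of probabilities indexed by low-frequency point, a dictionary of preliminary estimates keyed by non-fundamental point) are hypothetical choices for illustration.

```python
def non_fundamental_noise_estimate(presence_prob, preliminary_noise, non_fundamental_bins):
    """Noise power spectrum estimate for each non-fundamental frequency point.

    presence_prob:        voice existence probability of every low-frequency point
    preliminary_noise:    preliminary noise estimate keyed by non-fundamental point
    non_fundamental_bins: indices classified as non-fundamental by pitch detection
    """
    estimates = {}
    for k in non_fundamental_bins:
        # Look up the probability for this point, then multiply it by the
        # preliminary estimate of the same point, as the text describes.
        estimates[k] = presence_prob[k] * preliminary_noise[k]
    return estimates


# Usage: points 0 and 3 are non-fundamental among four low-frequency points.
result = non_fundamental_noise_estimate([0.1, 0.5, 0.9, 0.2], {0: 2.0, 3: 4.0}, [0, 3])
```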
Therefore, the low-frequency points in the low-frequency signal are divided into fundamental frequency points and non-fundamental frequency points, and then different calculation modes are adopted for the fundamental frequency points and the non-fundamental frequency points respectively to calculate the noise power spectrum estimation value of the fundamental frequency points and the noise power spectrum estimation value of the non-fundamental frequency points in the low-frequency signal; by introducing the fundamental tone detection method, the quality of call voice is ensured, the voice call effect under stable and non-stable noise environments is improved, the noise is distinguished more accurately, and the noise suppression efficiency is improved.
Referring to fig. 6, fig. 6 is a sub-flowchart of step S23 in fig. 3.
In some embodiments, the step S23 includes: step S231, squaring the amplitude of each frequency point in the two-path medium-high frequency signal to obtain the square value of the amplitude of each medium-high frequency point; step S232, calculating the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point according to the square value of the amplitude of each medium-high frequency point; step S233, calculating the correlation value of each medium-high frequency point according to the square value of the amplitude, the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point; step S234, calculating a preliminary estimation value of the noise power spectrum of each medium-high frequency point according to the correlation value, the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point; step S235, calculating the voice existence probability of each medium-high frequency point according to the correlation value of each medium-high frequency point; and step S236, smoothing the preliminary estimation value of the noise power spectrum of each medium-high frequency point with the voice existence probability of the corresponding medium-high frequency point to obtain the noise power spectrum estimation value of each medium-high frequency point.
The self-power spectrum value and the cross-power spectrum value are the frequency-domain counterparts of the time-domain correlation functions: they reflect the internal relation between a random signal and itself or another signal at different moments, that is, the degree of waveform similarity between the signals at different moments. The calculation formulas of the self-power spectrum value and the cross-power spectrum value are as follows:
Figure BDA0002311688780000091
where λ denotes the frame number, μ denotes the frequency point, α_s is the smoothing coefficient,
Figure BDA0002311688780000092
are the self-power spectrum and cross-power spectrum values of the two microphones, X_i and X_j are the amplitudes of the frequency domain signals, and * denotes the complex conjugate.
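The formula image is not reproduced in this text. A first-order recursive average is one common way to compute smoothed power spectra from the quantities named above (frame λ, frequency point μ, smoothing coefficient α_s, spectra X_i and X_j, complex conjugate), and the sketch below assumes that form; the function name and argument layout are hypothetical.

```python
def update_power_spectra(prev, x_i, x_j, alpha_s):
    """One frame of recursive smoothing, per frequency point.

    Assumed form: phi(lambda, mu) = alpha_s * phi(lambda - 1, mu)
                                    + (1 - alpha_s) * X_i(lambda, mu) * conj(X_j(lambda, mu))
    Passing the same spectrum for x_i and x_j gives the self-power spectrum;
    passing the two microphone spectra gives the cross-power spectrum.
    """
    return [
        alpha_s * p + (1.0 - alpha_s) * (a * b.conjugate())
        for p, a, b in zip(prev, x_i, x_j)
    ]


# Usage: one frequency point, previous smoothed value 0, smoothing coefficient 0.5.
mic1 = [3 + 4j]
self_power = update_power_spectra([0j], mic1, mic1, 0.5)
```

A larger α_s gives a smoother but slower-reacting estimate.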
The correlation value comprises a correlation value of the speech signal in the two microphone signals and a correlation value of the noise signal in the two microphone signals. The correlation value may be calculated according to the following formulas:
Figure BDA0002311688780000101
Figure BDA0002311688780000102
Figure BDA0002311688780000103
Figure BDA0002311688780000104
where Γ_{x,λ} is the correlation of the signals received by the two microphones;
Figure BDA0002311688780000105
is the estimate of the noise correlation; Γ_{s,cor} is the preliminary estimate of the speech correlation;
Figure BDA0002311688780000106
is the smoothed estimate of the speech correlation; γ is the a posteriori signal-to-noise ratio; and α_Γ is the smoothing factor.
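The correlation formulas are images that are not reproduced in this text. As a point of reference, the usual definition of the coherence between two microphone signals, the cross-power spectrum normalized by the geometric mean of the two self-power spectra, can be sketched as follows. It is an assumption that the correlation of the received signals takes this form, and the function name and the guard value `eps` are hypothetical.

```python
import math


def coherence(phi_11, phi_22, phi_12, eps=1e-12):
    """Complex coherence per frequency point.

    phi_11, phi_22: self-power spectrum values of microphone 1 and 2 (real, >= 0)
    phi_12:         cross-power spectrum values (complex)
    eps:            small floor that avoids division by zero in silent bins
    """
    return [
        c / math.sqrt(max(a * b, eps))
        for a, b, c in zip(phi_11, phi_22, phi_12)
    ]


# Usage: a single frequency point.
gamma_x = coherence([4.0], [9.0], [3 + 0j])
```

The magnitude of this quantity lies between 0 and 1, with values near 1 indicating strongly correlated microphone signals.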
The voice existence probability of the medium-high frequency points refers to the possibility of existence of voice signals in each medium-high frequency point. The voice existence probability of the medium-high frequency point can be calculated by the following formula:
Figure BDA0002311688780000107
The smoothing processing refers to substituting the preliminary estimation value of the noise power spectrum of each medium-high frequency point and the voice existence probability of the corresponding frequency point into a formula to obtain the smoothed noise power spectrum estimation value of the medium-high frequency point.
The calculation formula of the smoothing processing is as follows:
Figure BDA0002311688780000108
The preliminary estimation value of the noise power spectrum of a medium-high frequency point is the noise power spectrum estimation value obtained by a first-pass calculation according to a formula. The calculation formula of the preliminary estimation value of the noise power spectrum of the medium-high frequency point is as follows:
Figure BDA0002311688780000109
Specifically, the medium-high frequency signals in the two paths of frequency domain signals are first obtained simultaneously, and the amplitude of each frequency point in the medium-high frequency signals is squared to obtain the square value of the amplitude of each medium-high frequency point.
The square value of the amplitude of each medium-high frequency point is substituted into the calculation formulas of the self-power spectrum value and the cross-power spectrum value to obtain the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point.
Then, the square value of the amplitude, the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point are substituted into the calculation formula of the correlation value to obtain the correlation value of each medium-high frequency point.
Next, the correlation value, the self-power spectrum value and the cross-power spectrum value of each medium-high frequency point are substituted into the calculation formula of the preliminary estimation value of the noise power spectrum to obtain the preliminary estimation value of the noise power spectrum of each medium-high frequency point.
Meanwhile, the correlation value of each medium-high frequency point is substituted into the calculation formula of the voice existence probability of the medium-high frequency points to obtain the voice existence probability of each medium-high frequency point.
Finally, the preliminary estimation value of the noise power spectrum of each medium-high frequency point and the voice existence probability of the corresponding medium-high frequency point are substituted into the smoothing formula to obtain the noise power spectrum estimation value of each medium-high frequency point.
Therefore, the amplitude values of the medium-high frequency points are substituted into a series of formulas to be calculated, the characteristic that the correlation difference of the voice in the medium-high frequency range is obvious is utilized, the noise suppression processing is carried out on the frequency points of the medium-high frequency signals, and the noise power spectrum estimation value of each frequency point in the medium-high frequency signals is obtained.
Referring to fig. 7, fig. 7 is a sub-flowchart of step S3 in fig. 1.
As shown in fig. 7, in some embodiments, the step S3 includes: step S31, combining the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value to obtain a full-frequency-band noise power spectrum estimation value; step S32, gain calculation is carried out according to the noise power spectrum estimation value of each frequency point in the full-band noise power spectrum estimation value, and the amplitude spectrum gain value of each frequency point is obtained; and step S33, calculating the frequency domain signal after the gain according to the magnitude spectrum gain value of each frequency point and the frequency domain signal.
The full-band noise power spectrum estimation value refers to noise power spectrum estimation values of all frequency points in a frequency domain signal.
The amplitude spectrum gain value refers to the gain proportion of the amplitude of each frequency point obtained by substituting the noise power spectrum estimation value of the frequency point into an amplitude gain function.
The frequency domain signal after the gain refers to the amplitude of each frequency point obtained after each frequency point in the original frequency domain signal is gained according to the amplitude spectrum gain value of the corresponding frequency point. The original frequency domain signal refers to a frequency domain signal obtained after the time-frequency transformation.
Specifically, the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value are combined to obtain the noise power spectrum estimation values of all frequency points in the frequency domain signal.
The noise power spectrum estimation value of each frequency point is then substituted into a gain function to obtain the amplitude spectrum gain value of that frequency point.
Finally, the gained amplitude of each frequency point is calculated from the amplitude spectrum gain value of the frequency point and the amplitude of the corresponding frequency point in the original frequency domain signal, and the gained amplitudes of all frequency points are combined to obtain the gained frequency domain signal.
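Steps S31 to S33 can be sketched as below. The text does not specify the gain function, so the Wiener-style rule with a lower floor used here is purely an assumed placeholder, and the function names, the `floor` parameter and the ordering of bins (low-frequency points first) are hypothetical.

```python
def merge_bands(low_noise, mid_high_noise):
    """Step S31: full-band noise estimate, low-frequency points first."""
    return list(low_noise) + list(mid_high_noise)


def wiener_like_gain(noisy_power, noise_power, floor=0.1):
    """Step S32 with an ASSUMED gain rule: 1 - noise/noisy, limited below by floor."""
    if noisy_power <= 0.0:
        return floor
    return max(floor, 1.0 - noise_power / noisy_power)


def apply_gain(spectrum, noise_psd, floor=0.1):
    """Step S33: multiply each frequency point by its amplitude spectrum gain.

    Scaling the complex value by a real gain scales its amplitude and
    leaves its phase unchanged.
    """
    gained = []
    for x, n in zip(spectrum, noise_psd):
        g = wiener_like_gain(abs(x) ** 2, n, floor)
        gained.append(g * x)
    return gained


# Usage: one low-frequency point and one medium-high frequency point.
noise = merge_bands([1.0], [4.0])
out = apply_gain([2 + 0j, 0 + 1j], noise)
```

The floor keeps a small amount of residual signal in noise-dominated points, which is a common design choice to limit audible artifacts.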
Referring to fig. 8, fig. 8 is a sub-flowchart of step S33 in fig. 7.
As shown in fig. 8, in some embodiments, the step S33 includes: step S331, obtaining the amplitude spectrum gain value of each frequency point and the amplitude value of the corresponding frequency point in the frequency domain signal; step S332, multiplying the amplitude spectrum gain value of each frequency point with the amplitude value of the corresponding frequency point to obtain the amplitude value of each frequency point after the gain, and combining the amplitude values of all frequency points in the frequency domain signal after the gain to obtain the frequency domain signal after the gain.
Specifically, for each frequency point that has an amplitude spectrum gain value, the amplitude of the corresponding frequency point is obtained from the original frequency domain signal.
Multiplying the amplitude spectrum gain value of each frequency point by the amplitude value of the corresponding frequency point to obtain the amplitude value of each frequency point after the gain; and combining the gained amplitudes of all the frequency points to obtain the gained frequency domain signal.
Therefore, the amplitude spectrum gain value of each frequency point is calculated according to the noise power spectrum estimation value of that frequency point in the frequency domain signal, and the amplitude of each frequency point in the original frequency domain signal is multiplied by the corresponding amplitude spectrum gain value to obtain the gained frequency domain signal. Performing noise suppression on each frequency point in this way improves speech intelligibility and noise suppression efficiency, makes the voice data in the call clearer, suppresses the transmission of noise data, and improves the quality of the call.
In some embodiments, the step S4 includes: and carrying out inverse time-frequency transformation on the gained amplitude of each frequency point in the gained frequency domain signal to obtain a gained time domain signal.
The inverse time-frequency transformation is to transform the frequency domain signal into a time domain signal, and each frequency point is subjected to inverse Fourier transformation, so that the whole frequency domain signal is transformed into the time domain signal.
Specifically, the frequency domain signals after the gain are subjected to inverse Fourier transform according to frames to obtain time domain signals of corresponding frames, and the time domain signals of all the frames are combined to obtain time domain signals after the gain; thereby, the gained frequency domain signal is transformed into a gained time domain signal.
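The frame-by-frame inverse transformation and the combination of frames can be sketched as below. The naive inverse DFT is used only to keep the example self-contained, and combining frames by overlap-add with a hop size is an assumption, since the text only says that the frames are combined; the function names are hypothetical.

```python
import cmath


def idft(spectrum):
    """Naive inverse discrete Fourier transform returning real samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def overlap_add(frames, hop):
    """Combine equal-length time-domain frames whose starts are hop samples apart."""
    length = hop * (len(frames) - 1) + len(frames[0])
    out = [0.0] * length
    for i, frame in enumerate(frames):
        for t, value in enumerate(frame):
            out[i * hop + t] += value
    return out


# Usage: two gained spectra of length 4, frames overlapping by half.
frames = [idft([4 + 0j, 0j, 0j, 0j]), idft([4 + 0j, 0j, 0j, 0j])]
signal = overlap_add(frames, 2)
```

When the hop equals the frame length, `overlap_add` reduces to plain concatenation of the frames.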
The noise suppression method for processing in frequency bands provided by the present invention may be implemented in hardware, firmware, or as software or computer code that may be stored in a computer readable storage medium such as a CD, ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code that is originally stored on a remote recording medium or a non-transitory machine readable medium, downloaded over a network, and stored in a local recording medium, so that a noise suppression method for processing in frequency bands described herein may be presented using a general purpose computer or special processor, or in programmable or dedicated hardware such as an ASIC or FPGA as software stored on a recording medium. As can be appreciated in the art, the computer, processor, microprocessor, controller or programmable hardware includes a memory component, e.g., RAM, ROM, flash memory, etc., which can store or receive software or computer code when accessed and executed by the computer, processor or hardware implementing one of the noise suppression methods for processing in bands described herein. In addition, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the processing shown herein.
The computer readable storage medium may be a solid state memory, a memory card, an optical disc, etc. The computer-readable storage medium stores program instructions for a computer to invoke and then execute a noise suppression method for processing in different frequency bands as shown in fig. 1 to 8.
Referring to fig. 9, fig. 9 is a structural diagram of a noise suppression system 100 for performing processing in different frequency bands according to an embodiment of the present invention.
In some embodiments, the noise suppression system 100 for processing with sub-bands comprises: a signal transformation module 10, configured to transform the signal type of the acquired sound data from a time domain signal to a frequency domain signal; a noise power spectrum estimation module 20, configured to divide the frequency domain signal into a low frequency signal and a medium-high frequency signal, calculate a low frequency noise power spectrum estimation value according to an amplitude value of the low frequency signal, and calculate a medium-high frequency noise power spectrum estimation value according to an amplitude value of the medium-high frequency signal; the gain frequency domain calculation module 30 is configured to calculate according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value, and the frequency domain signal to obtain a frequency domain signal after gain; and a signal inverse transformation module 40, configured to transform the gained frequency domain signal into a gained time domain signal.
Referring to fig. 10, fig. 10 is a schematic structural diagram of the signal conversion module 10 in fig. 9.
In some embodiments, the signal transformation module 10 comprises: the signal acquisition module 11 is configured to acquire sound data of which the acquired signal type is a time-domain signal; the signal framing module 12 is configured to frame the time domain signal according to a preset time interval to obtain a multi-frame time domain signal; and the time-frequency transformation module 13 is configured to perform time-frequency transformation on each frame of time-domain signal to obtain a frequency-domain signal.
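The work of the signal framing module 12 and the time-frequency transformation module 13 can be sketched as below. The frame length and hop are expressed in samples rather than as a time interval, overlapping frames are allowed, and a naive discrete Fourier transform stands in for whatever time-frequency transformation an implementation would use; these choices and the function names are assumptions for illustration.

```python
import cmath


def frame_signal(samples, frame_len, hop):
    """Cut the time-domain signal into frames of frame_len samples, hop samples apart."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def dft(frame):
    """Naive discrete Fourier transform of one frame (one complex value per frequency point)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Usage: six samples, frames of four samples, hop of two samples.
frames = frame_signal([0, 1, 2, 3, 4, 5], 4, 2)
spectra = [dft(frame) for frame in frames]
```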
Referring to fig. 11, fig. 11 is a schematic structural diagram of the noise power spectrum estimation module 20 in fig. 9.
In some embodiments, the noise power spectrum estimation module 20 comprises: the signal dividing module 21 is configured to divide the frequency domain signal into a low frequency signal and a medium-high frequency signal according to a preset frequency division standard; the low-frequency power spectrum estimation module 22 is configured to perform noise power spectrum density estimation on the single-path low-frequency signal acquired by the single-path microphone in the frequency domain signal to obtain noise power spectrum estimation values of all low-frequency points in the single-path low-frequency signal; and the medium-high frequency power spectrum estimation module 23 is configured to perform noise power spectrum density estimation on the two-way medium-high frequency signal acquired by the two-way microphone in the frequency domain signal to obtain noise power spectrum estimation values of all medium-high frequency points in the two-way medium-high frequency signal.
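The division performed by the signal dividing module 21 can be sketched as below for one frame of one microphone. The preset frequency division standard is represented by a single cutoff frequency, which is an assumption, as are the function name and the parameters `sample_rate`, `fft_size` and `cutoff_hz`.

```python
def split_bands(spectrum, sample_rate, fft_size, cutoff_hz):
    """Split one frame's frequency points into low and medium-high frequency parts.

    Frequency point k is centered at k * sample_rate / fft_size Hz; points at or
    below cutoff_hz go to the low-frequency part.
    """
    bin_hz = sample_rate / fft_size
    split = min(int(cutoff_hz / bin_hz) + 1, len(spectrum))
    return spectrum[:split], spectrum[split:]


# Usage: 9 frequency points spaced 1000 Hz apart, cutoff at 2000 Hz.
low, mid_high = split_bands(list(range(9)), 16000, 16, 2000)
```

The low-frequency part of a single microphone would then go to module 22, while the medium-high frequency parts of both microphones would go to module 23.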
Referring to fig. 12, fig. 12 is a schematic structural diagram of the low-frequency power spectrum estimation module 22 in fig. 11.
In some embodiments, the low frequency power spectrum estimation module 22 comprises: the low-frequency amplitude squaring module 221 is configured to square the amplitude of each low-frequency point in the single-channel low-frequency signal to obtain a square value of the amplitude of each low-frequency point; a fundamental tone detection module 222, configured to divide the low-frequency points into fundamental frequency points and non-fundamental frequency points through a fundamental tone detection method; the non-fundamental frequency point power spectrum estimation module 223 is used for calculating a noise power spectrum estimation value of the non-fundamental frequency point according to a square value of the amplitude of the non-fundamental frequency point; a fundamental frequency point power spectrum estimation module 224, configured to calculate a noise power spectrum estimation value of the fundamental frequency point according to a square value of the amplitude of the fundamental frequency point and a noise power spectrum estimation value of the non-fundamental frequency point; and the low-frequency power spectrum combination module 225 is configured to combine the noise power spectrum estimation value of the non-fundamental frequency point with the noise power spectrum estimation value of the fundamental frequency point to obtain the noise power spectrum estimation values of all low-frequency points.
Referring to fig. 13, fig. 13 is a schematic structural diagram of the non-fundamental frequency point power spectrum estimation module 223 in fig. 12.
In some embodiments, the non-fundamental point power spectrum estimation module 223 includes: the low-frequency voice probability calculation module 2231 is used for calculating the voice existence probability value of each low-frequency point according to the square value of the amplitude of each low-frequency point; a non-fundamental frequency point power spectrum preliminary estimation module 2232, configured to obtain a preliminary noise power spectrum estimation value of each non-fundamental frequency point through calculation according to a square value of the amplitude of each non-fundamental frequency point; a non-fundamental frequency point voice probability searching module 2233, configured to search the voice existence probability values of the non-fundamental frequency points corresponding to the non-fundamental frequency points in the voice existence probability values of all the low frequency points; and the non-fundamental frequency point power spectrum calculation module 2234 is configured to calculate a noise power spectrum estimation value of each non-fundamental frequency point according to the voice existence probability value of each non-fundamental frequency point and the corresponding noise power spectrum preliminary estimation value of the non-fundamental frequency point.
Referring to fig. 14, fig. 14 is a schematic structural diagram of the medium-high frequency power spectrum estimation module 23 in fig. 11.
In some embodiments, the medium-high frequency power spectrum estimation module 23 includes: the medium-high frequency amplitude squaring module 231 is used for squaring the amplitude of each frequency point in the two paths of medium-high frequency signals to obtain a square value of the amplitude of each medium-high frequency point; the power spectrum value calculation module 232 is configured to calculate a self-power spectrum value of each medium-high frequency point and a cross-power spectrum value of each medium-high frequency point according to a square value of the amplitude of each medium-high frequency point; a correlation value calculating module 233, configured to calculate a correlation value of each medium-high frequency point according to a square value of the amplitude of each medium-high frequency point, an own power spectrum value, and a cross power spectrum value; the medium-high frequency preliminary estimation module 234 is configured to calculate a preliminary estimation value of the noise power spectrum of each medium-high frequency point according to the correlation value, the self-power spectrum value, and the cross-power spectrum value of each medium-high frequency point; the medium-high frequency voice probability calculation module 235 is used for calculating the voice existence probability of each medium-high frequency point according to the correlation value of each medium-high frequency point; and the medium-high frequency smoothing processing module 236 is configured to perform smoothing processing on the preliminary estimated value of the noise power spectrum of each medium-high frequency point and the voice existence probability of the corresponding medium-high frequency point to obtain an estimated value of the noise power spectrum of each medium-high frequency point.
Referring to fig. 15, fig. 15 is a schematic structural diagram of the gain frequency domain calculating module 30 in fig. 9.
In some embodiments, the gain frequency domain calculation module 30 includes: a full-band power spectrum combination module 31, configured to combine the low-frequency noise power spectrum estimation value with the medium-high frequency noise power spectrum estimation value to obtain a full-band noise power spectrum estimation value; the amplitude spectrum gain calculation module 32 is configured to perform gain calculation according to the noise power spectrum estimation value of each frequency point in the full-band noise power spectrum estimation value to obtain an amplitude spectrum gain value of each frequency point; and the gain signal calculation module 33 is configured to calculate a frequency domain signal after gain according to the magnitude spectrum gain value of each frequency point and the frequency domain signal.
Referring to fig. 16, fig. 16 is a schematic structural diagram of the gain signal calculation module 33 in fig. 15.
In some embodiments, the gain signal calculation module 33 includes: the amplitude spectrum gain value acquiring module 331 is configured to acquire an amplitude spectrum gain value of each frequency point and an amplitude value of a corresponding frequency point in the frequency domain signal; and the gain amplitude value calculation module 332 is configured to multiply the amplitude spectrum gain value of each frequency point with the amplitude value of the corresponding frequency point to obtain a gained amplitude value of each frequency point, and combine the gained amplitude values of all frequency points in the frequency domain signal to obtain a gained frequency domain signal.
In some embodiments, the signal inverse transformation module 40 is further configured to perform inverse time-frequency transformation on the gained amplitude of each frequency point in the gained frequency domain signal to obtain a gained time domain signal.
The noise suppression system 100 for processing in different frequency bands further includes a processing unit, which may be a Central Processing Unit (CPU) or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processing unit is the data processing center of the noise suppression system 100, is connected to each module of the system by wired or wireless lines, and processes the data transmitted by each module.
As shown in fig. 1, in some embodiments, the noise suppression system 100 for processing in different frequency bands further includes a storage module 50, where the storage module 50 is configured to store the time domain signal and the frequency domain signal.
The memory module 50 may include a high speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), a plurality of magnetic disk storage devices, a Flash memory device, or other volatile solid state storage devices.
Specifically, the storage module 50 is located in the telephony device, and is mainly connected to the processing unit, and is configured to store the sound data collected by the microphone, and the frequency domain signal, the time domain signal, the noise power spectrum estimation value, and the like processed by the processing unit.
In other embodiments, the storage module 50 may also be connected to each module of the noise suppression system 100 for processing with frequency division, and is used to store data generated by each module during parameter calculation.
As shown in fig. 1, in some embodiments, the noise suppression system for processing in different frequency bands includes a telephony device, which may be a mobile terminal, and the signal transformation module 10, the noise power spectrum estimation module 20, the gain frequency domain calculation module 30, the signal inverse transformation module 40, the storage module 50, and the like are all disposed in the telephony device; the signal transformation module 10 is connected with the noise power spectrum estimation module 20 in a wired or wireless manner, and can transmit the frequency domain signal after time-frequency transformation to the noise power spectrum estimation module 20; the noise power spectrum estimation module 20 is connected with the gain frequency domain calculation module 30 in a wired or wireless manner, and can transmit the calculated low-frequency noise power spectrum estimation value and the calculated medium-high frequency noise power spectrum estimation value to the gain frequency domain calculation module 30; the gain frequency domain calculation module 30 is connected to the signal inverse transformation module 40 in a wired or wireless manner, and can transmit the gained frequency domain signal to the signal inverse transformation module 40, so that the signal inverse transformation module 40 performs inverse time-frequency transformation on the gained frequency domain signal to obtain a time domain signal; the storage module 50 is connected with the signal transformation module 10, the noise power spectrum estimation module 20, the gain frequency domain calculation module 30 and the signal inverse transformation module 40 in a wired or wireless manner, and can store data such as time domain signals, frequency domain signals, low-frequency noise power spectrum estimation values, medium-high frequency noise power spectrum estimation values and the like.
The invention provides a noise suppression method and system for processing in different frequency bands. A frequency domain signal is divided into a low-frequency signal and a medium-high frequency signal; the low-frequency signal is processed to obtain a low-frequency noise power spectrum estimation value, the medium-high frequency signal is processed to obtain a medium-high frequency noise power spectrum estimation value, and a gained frequency domain signal is calculated from the frequency domain signal according to the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value. Because different noise processing modes are adopted according to the signal characteristics of different frequencies, noise and voice are separated more accurately, the propagation of noise is effectively suppressed, the voice call efficiency is improved, and the voice call is enhanced.
The foregoing is illustrative of embodiments of the present invention, and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the embodiments of the present invention and are intended to be within the scope of the present invention.

Claims (20)

1. A noise suppression method for processing in different frequency bands, used for suppressing noise in call sound collected by two microphones, characterized by comprising the following steps:
step S1, converting the signal type of the collected sound data from time domain signals into frequency domain signals;
step S2, dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal; for the low-frequency signal, dividing the low-frequency points into fundamental frequency points and non-fundamental frequency points by a pitch detection method, calculating a noise power spectrum estimation value of the non-fundamental frequency points and a noise power spectrum estimation value of the fundamental frequency points, and combining the noise power spectrum estimation value of the non-fundamental frequency points and the noise power spectrum estimation value of the fundamental frequency points to obtain noise power spectrum estimation values of all the low-frequency points; for the medium-high frequency signal, calculating a medium-high frequency noise power spectrum estimation value according to the amplitude of the medium-high frequency signal;
step S3, calculating a frequency domain signal after gain according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal;
step S4, converting the frequency domain signal after gain into a time domain signal after gain.
2. The noise suppression method for processing in different frequency bands according to claim 1, wherein the step S1 includes:
step S11, acquiring sound data of which the signal type is a time domain signal;
step S12, framing the time domain signal according to a preset time interval to obtain a multi-frame time domain signal;
and step S13, obtaining a frequency domain signal by performing time-frequency transformation on each frame of time domain signal.
3. The noise suppression method for processing in different frequency bands according to claim 1 or 2, wherein the step S2 includes:
step S21, dividing the frequency domain signal into a low frequency signal and a medium-high frequency signal according to a preset frequency division standard;
step S22, performing noise power spectral density estimation on the single-channel low-frequency signal, collected by one microphone, in the frequency domain signal to obtain noise power spectrum estimation values of all low-frequency points in the single-channel low-frequency signal;
and step S23, performing noise power spectral density estimation on the two-channel medium-high frequency signals, collected by the two microphones, in the frequency domain signal to obtain noise power spectrum estimation values of all medium-high frequency points in the two-channel medium-high frequency signals.
4. The noise suppression method for processing in different frequency bands according to claim 3, wherein the step S22 includes:
step S221, squaring the amplitude of each low-frequency point in the single-channel low-frequency signal to obtain a square value of the amplitude of each low-frequency point;
step S222, dividing the low-frequency points into fundamental frequency points and non-fundamental frequency points by a pitch detection method;
step S223, calculating a noise power spectrum estimation value of the non-fundamental frequency point according to the square value of the amplitude of the non-fundamental frequency point;
step S224, calculating a noise power spectrum estimation value of the fundamental frequency point according to the square value of the amplitude of the fundamental frequency point and the noise power spectrum estimation value of the non-fundamental frequency point;
and step S225, combining the noise power spectrum estimation value of the non-fundamental frequency point with the noise power spectrum estimation value of the fundamental frequency point to obtain the noise power spectrum estimation values of all low-frequency points.
5. The noise suppression method for processing in different frequency bands according to claim 4, wherein the step S223 includes:
step S2231, calculating the voice existence probability value of each low-frequency point according to the square value of the amplitude of each low-frequency point;
step S2232, calculating to obtain a noise power spectrum preliminary estimation value of each non-fundamental frequency point according to the square value of the amplitude of each non-fundamental frequency point;
step S2233, looking up, among the voice existence probability values of all the low-frequency points, the voice existence probability value corresponding to each non-fundamental frequency point;
and step S2234, calculating to obtain a noise power spectrum estimation value of each non-fundamental frequency point according to the voice existence probability value of each non-fundamental frequency point and the noise power spectrum preliminary estimation value of the corresponding non-fundamental frequency point.
6. The noise suppression method for processing in different frequency bands according to claim 3, wherein the step S23 includes:
step S231, squaring the amplitude of each frequency point in the two-channel medium-high frequency signals to obtain a square value of the amplitude of each medium-high frequency point;
step S232, calculating the auto-power spectrum value and the cross-power spectrum value of each medium-high frequency point according to the square value of the amplitude of each medium-high frequency point;
step S233, calculating the correlation value of each medium-high frequency point according to the square value of the amplitude, the auto-power spectrum value, and the cross-power spectrum value of each medium-high frequency point;
step S234, calculating a preliminary estimation value of the noise power spectrum of each medium-high frequency point according to the correlation value, the auto-power spectrum value, and the cross-power spectrum value of each medium-high frequency point;
step S235, calculating the voice existence probability of each medium-high frequency point according to the correlation value of each medium-high frequency point;
step S236, the noise power spectrum estimation value of each middle-high frequency point is obtained by smoothing the preliminary estimation value of the noise power spectrum of each middle-high frequency point and the voice existence probability of the corresponding middle-high frequency point.
7. The noise suppression method for processing in different frequency bands according to claim 1, wherein the step S3 includes:
step S31, combining the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value to obtain a full-frequency-band noise power spectrum estimation value;
step S32, gain calculation is carried out according to the noise power spectrum estimation value of each frequency point in the full-band noise power spectrum estimation value, and the amplitude spectrum gain value of each frequency point is obtained;
and step S33, calculating the frequency domain signal after the gain according to the amplitude spectrum gain value of each frequency point and the frequency domain signal.
8. The noise suppression method for processing in different frequency bands according to claim 7, wherein the step S33 includes:
step S331, obtaining the amplitude spectrum gain value of each frequency point and the amplitude value of the corresponding frequency point in the frequency domain signal;
step S332, multiplying the amplitude spectrum gain value of each frequency point with the amplitude value of the corresponding frequency point to obtain the amplitude value of each frequency point after the gain, and combining the amplitude values of all frequency points in the frequency domain signal after the gain to obtain the frequency domain signal after the gain.
9. The noise suppression method for processing in different frequency bands according to claim 8, wherein the step S4 includes: performing inverse time-frequency transformation on the amplitude after gain of each frequency point in the frequency domain signal after gain to obtain a time domain signal after gain.
10. The noise suppression method for processing in different frequency bands according to claim 9, wherein the time-frequency transform is a Fourier transform and the inverse time-frequency transform is an inverse Fourier transform.
11. A noise suppression system for processing in different frequency bands, used for suppressing noise in call sound collected by two microphones, characterized in that the noise suppression system for processing in different frequency bands includes:
the signal transformation module is used for transforming the signal type of the collected sound data from a time domain signal into a frequency domain signal;
the noise power spectrum estimation module is used for dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal; for the low-frequency signal, dividing the low-frequency points into fundamental frequency points and non-fundamental frequency points by a pitch detection method, calculating a noise power spectrum estimation value of the non-fundamental frequency points and a noise power spectrum estimation value of the fundamental frequency points, and combining the noise power spectrum estimation value of the non-fundamental frequency points and the noise power spectrum estimation value of the fundamental frequency points to obtain noise power spectrum estimation values of all the low-frequency points; and, for the medium-high frequency signal, calculating a medium-high frequency noise power spectrum estimation value according to the amplitude of the medium-high frequency signal;
the gain frequency domain calculation module is used for calculating according to the low-frequency noise power spectrum estimation value, the medium-high frequency noise power spectrum estimation value and the frequency domain signal to obtain a frequency domain signal after gain;
and the signal inverse transformation module is used for transforming the frequency domain signal after the gain into a time domain signal after the gain.
12. The system of claim 11, wherein the signal transformation module comprises:
the signal acquisition module is used for acquiring sound data whose signal type is a time domain signal;
the signal framing module is used for framing the time domain signals according to a preset time interval to obtain multi-frame time domain signals;
and the time-frequency transformation module is used for performing time-frequency transformation on each frame of time-domain signal to obtain a frequency-domain signal.
13. The system of claim 11 or 12, wherein the noise power spectrum estimation module comprises:
the signal dividing module is used for dividing the frequency domain signal into a low-frequency signal and a medium-high frequency signal according to a preset frequency dividing standard;
the low-frequency power spectrum estimation module is used for performing noise power spectral density estimation on the single-channel low-frequency signal, collected by one microphone, in the frequency domain signal to obtain noise power spectrum estimation values of all low-frequency points in the single-channel low-frequency signal;
and the medium-high frequency power spectrum estimation module is used for performing noise power spectral density estimation on the two-channel medium-high frequency signals, collected by the two microphones, in the frequency domain signal to obtain noise power spectrum estimation values of all medium-high frequency points in the two-channel medium-high frequency signals.
14. The system of claim 13, wherein the low frequency power spectrum estimation module comprises:
the low-frequency amplitude squaring module is used for squaring the amplitude of each low-frequency point in the single-channel low-frequency signal to obtain the square value of the amplitude of each low-frequency point;
the pitch detection module is used for dividing the low-frequency points into fundamental frequency points and non-fundamental frequency points through a pitch detection method;
the non-fundamental frequency point power spectrum estimation module is used for calculating a noise power spectrum estimation value of a non-fundamental frequency point according to a square value of the amplitude of the non-fundamental frequency point;
the fundamental frequency point power spectrum estimation module is used for calculating a noise power spectrum estimation value of the fundamental frequency point according to the square value of the amplitude of the fundamental frequency point and the noise power spectrum estimation value of the non-fundamental frequency point;
and the low-frequency power spectrum combination module is used for combining the noise power spectrum estimation value of the non-fundamental frequency point and the noise power spectrum estimation value of the fundamental frequency point to obtain the noise power spectrum estimation values of all the low-frequency points.
15. The system of claim 14, wherein the non-fundamental frequency point power spectrum estimation module comprises:
the low-frequency voice probability calculation module is used for calculating the voice existence probability value of each low-frequency point according to the square value of the amplitude of each low-frequency point;
the non-fundamental frequency point power spectrum preliminary estimation module is used for calculating a noise power spectrum preliminary estimation value of each non-fundamental frequency point according to the square value of the amplitude of each non-fundamental frequency point;
the non-fundamental frequency point voice probability searching module is used for looking up, among the voice existence probability values of all the low-frequency points, the voice existence probability value corresponding to each non-fundamental frequency point;
and the non-fundamental frequency point power spectrum calculation module is used for calculating a noise power spectrum estimation value of each non-fundamental frequency point according to the voice existence probability value of each non-fundamental frequency point and the noise power spectrum preliminary estimation value of the corresponding non-fundamental frequency point.
16. The system of claim 13, wherein the medium-high frequency power spectrum estimation module comprises:
the medium-high frequency amplitude squaring module is used for squaring the amplitude of each frequency point in the two-channel medium-high frequency signals to obtain the square value of the amplitude of each medium-high frequency point;
the power spectrum value calculation module is used for calculating the auto-power spectrum value and the cross-power spectrum value of each medium-high frequency point according to the square value of the amplitude of each medium-high frequency point;
the correlation value calculation module is used for calculating the correlation value of each medium-high frequency point according to the square value of the amplitude, the auto-power spectrum value, and the cross-power spectrum value of each medium-high frequency point;
the medium-high frequency preliminary estimation module is used for calculating a preliminary estimation value of the noise power spectrum of each medium-high frequency point according to the correlation value, the auto-power spectrum value, and the cross-power spectrum value of each medium-high frequency point;
the middle-high frequency voice probability calculation module is used for calculating the voice existence probability of each middle-high frequency point according to the correlation value of each middle-high frequency point;
and the medium-high frequency smoothing processing module is used for smoothing the preliminary estimation value of the noise power spectrum of each medium-high frequency point and the voice existence probability of the corresponding medium-high frequency point to obtain the noise power spectrum estimation value of each medium-high frequency point.
17. The system of claim 11, wherein the gain frequency domain calculation module comprises:
the full-band power spectrum combination module is used for combining the low-frequency noise power spectrum estimation value and the medium-high frequency noise power spectrum estimation value to obtain a full-band noise power spectrum estimation value;
the amplitude spectrum gain calculation module is used for performing gain calculation according to the noise power spectrum estimation value of each frequency point in the full-band noise power spectrum estimation value to obtain an amplitude spectrum gain value of each frequency point;
and the gain signal calculation module is used for calculating the frequency domain signal after the gain according to the magnitude spectrum gain value of each frequency point and the frequency domain signal.
18. The system of claim 17, wherein the gain signal calculation module comprises:
the amplitude spectrum gain value acquisition module is used for acquiring the amplitude spectrum gain value of each frequency point and the amplitude value of the corresponding frequency point in the frequency domain signal;
and the gain amplitude value calculation module is used for multiplying the amplitude spectrum gain value of each frequency point with the amplitude value of the corresponding frequency point to obtain the amplitude value of each frequency point after the gain, and combining the amplitude values of all frequency points in the frequency domain signal after the gain to obtain the frequency domain signal after the gain.
19. The system of claim 18, wherein the signal inverse transformation module is further configured to perform inverse time-frequency transformation on the amplitude after gain of each frequency point in the frequency domain signal after gain to obtain the time domain signal after gain.
20. The system of claim 19, wherein the time-frequency transform is a Fourier transform and the inverse time-frequency transform is an inverse Fourier transform.
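For illustration only, the medium-high frequency estimation recited in claims 6 and 16 (auto-power and cross-power spectra, a correlation value, a preliminary noise estimate, a voice existence probability, and smoothing) can be sketched in Python as below. The claims do not give the formulas, so every formula here is an assumption: recursive averaging with factor `alpha`, magnitude-squared coherence as the correlation value, the incoherent share of the mean auto-power as the preliminary noise estimate, the coherence itself as the voice existence probability, and a probability-controlled smoothing factor based on `beta`. The names `init_state` and `coherence_noise_update` are hypothetical.

```python
import numpy as np

def init_state(n_bins):
    """Per-frequency-point running averages and the noise estimate."""
    return {
        "p11": np.zeros(n_bins),                 # auto-power, microphone 1
        "p22": np.zeros(n_bins),                 # auto-power, microphone 2
        "p12": np.zeros(n_bins, dtype=complex),  # cross-power
        "noise": np.zeros(n_bins),
    }

def coherence_noise_update(X1, X2, state, alpha=0.8, beta=0.9):
    """Update the noise estimate from one frame of two-channel spectra X1, X2."""
    # Recursively smoothed auto-power and cross-power spectra.
    state["p11"] = alpha * state["p11"] + (1 - alpha) * np.abs(X1) ** 2
    state["p22"] = alpha * state["p22"] + (1 - alpha) * np.abs(X2) ** 2
    state["p12"] = alpha * state["p12"] + (1 - alpha) * X1 * np.conj(X2)
    # Correlation value (assumed): magnitude-squared coherence in [0, 1].
    denom = np.maximum(state["p11"] * state["p22"], 1e-20)
    coh = np.clip(np.abs(state["p12"]) ** 2 / denom, 0.0, 1.0)
    # Preliminary noise estimate (assumed): incoherent share of mean auto-power.
    prelim = (1.0 - np.sqrt(coh)) * 0.5 * (state["p11"] + state["p22"])
    # Voice existence probability (assumed): coherent points are likely voice.
    spp = coh
    # Smoothing: hold the previous estimate where voice is likely present.
    a = beta + (1.0 - beta) * spp
    state["noise"] = a * state["noise"] + (1.0 - a) * prelim
    return state["noise"], spp
```

The design intuition is that voice from a nearby talker reaches both microphones as a strongly correlated signal, whereas diffuse background noise is only weakly correlated between them, so low coherence is taken as evidence of noise. Note that on the very first frame the coherence is trivially 1, so the estimate only starts adapting from the second frame onward.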
CN201911278646.5A 2019-12-10 2019-12-10 Noise suppression method and system for processing in different frequency bands Active CN111128213B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911278646.5A CN111128213B (en) 2019-12-10 2019-12-10 Noise suppression method and system for processing in different frequency bands
PCT/CN2020/111672 WO2021114733A1 (en) 2019-12-10 2020-08-27 Noise suppression method for processing at different frequency bands, and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911278646.5A CN111128213B (en) 2019-12-10 2019-12-10 Noise suppression method and system for processing in different frequency bands

Publications (2)

Publication Number Publication Date
CN111128213A CN111128213A (en) 2020-05-08
CN111128213B true CN111128213B (en) 2022-09-27

Family

ID=70498955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911278646.5A Active CN111128213B (en) 2019-12-10 2019-12-10 Noise suppression method and system for processing in different frequency bands

Country Status (2)

Country Link
CN (1) CN111128213B (en)
WO (1) WO2021114733A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111128213B (en) * 2019-12-10 2022-09-27 展讯通信(上海)有限公司 Noise suppression method and system for processing in different frequency bands
CN112309419B (en) * 2020-10-30 2023-05-02 浙江蓝鸽科技有限公司 Noise reduction and output method and system for multipath audio
CN113516988B (en) * 2020-12-30 2024-02-23 腾讯科技(深圳)有限公司 Audio processing method and device, intelligent equipment and storage medium
CN113571078B (en) * 2021-01-29 2024-04-26 腾讯科技(深圳)有限公司 Noise suppression method, device, medium and electronic equipment
CN112951262B (en) * 2021-02-24 2023-03-10 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium
CN112700787B (en) * 2021-03-24 2021-06-25 深圳市中科蓝讯科技股份有限公司 Noise reduction method, nonvolatile readable storage medium and electronic device
CN113393857A (en) * 2021-06-10 2021-09-14 腾讯音乐娱乐科技(深圳)有限公司 Method, device and medium for eliminating human voice of music signal
CN113778226A (en) * 2021-08-26 2021-12-10 江西恒必达实业有限公司 Infrared AI intelligent glasses based on speech recognition technology control intelligence house
CN113613112B (en) * 2021-09-23 2024-03-29 三星半导体(中国)研究开发有限公司 Method for suppressing wind noise of microphone and electronic device
CN113851151A (en) * 2021-10-26 2021-12-28 北京融讯科创技术有限公司 Masking threshold estimation method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582264A (en) * 2009-06-12 2009-11-18 瑞声声学科技(深圳)有限公司 Method and voice collecting system for speech enhancement
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
CN104867499A (en) * 2014-12-26 2015-08-26 深圳市微纳集成电路与系统应用研究院 Frequency-band-divided wiener filtering and de-noising method used for hearing aid and system thereof
CN105280193A (en) * 2015-07-20 2016-01-27 广东顺德中山大学卡内基梅隆大学国际联合研究院 Prior signal-to-noise ratio estimating method based on MMSE error criterion
CN108986832A (en) * 2018-07-12 2018-12-11 北京大学深圳研究生院 Ears speech dereverberation method and device based on voice probability of occurrence and consistency

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19957221A1 (en) * 1999-11-27 2001-05-31 Alcatel Sa Exponential echo and noise reduction during pauses in speech
US8433564B2 (en) * 2009-07-02 2013-04-30 Alon Konchitsky Method for wind noise reduction
JP5634959B2 (en) * 2011-08-08 2014-12-03 日本電信電話株式会社 Noise / dereverberation apparatus, method and program thereof
CN103871421B (en) * 2014-03-21 2018-02-02 厦门莱亚特医疗器械有限公司 A kind of self-adaptation noise reduction method and system based on subband noise analysis
CN106297817B (en) * 2015-06-09 2019-07-09 中国科学院声学研究所 A kind of sound enhancement method based on binaural information
CN108831500B (en) * 2018-05-29 2023-04-28 平安科技(深圳)有限公司 Speech enhancement method, device, computer equipment and storage medium
CN108877826A (en) * 2018-08-29 2018-11-23 昆明理工大学 A kind of voice noise reducing method based on more windows spectrum
CN111128213B (en) * 2019-12-10 2022-09-27 展讯通信(上海)有限公司 Noise suppression method and system for processing in different frequency bands

Also Published As

Publication number Publication date
WO2021114733A1 (en) 2021-06-17
CN111128213A (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant