CN114724574A - Double-microphone noise reduction method with adjustable expected sound source direction - Google Patents
- Publication number
- CN114724574A (application number CN202210157383.8A)
- Authority
- CN
- China
- Prior art keywords
- noise
- omega
- signal
- calculating
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention discloses a dual-microphone noise reduction method with an adjustable desired sound source direction, comprising the following steps. Preprocessing: the noisy signals x_1(t) and x_2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and a short-time Fourier transform yields the frequency-domain signals X_1(ω) and X_2(ω). Beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones, and X_1(ω) and X_2(ω) are differenced according to a central difference scheme to construct the differential signals Y_1(ω) and Y_2(ω). The statistical averages of the power spectra of Y_1(ω) and Y_2(ω) are calculated and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function. Multiplying X_1(ω) by λ(ω) gives the signal R_1(ω), from which the competing-direction noise has been removed. Post Wiener filtering: the signal energy and noise energy in R_1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated to further remove the residual noise in R_1(ω).
Description
Technical Field
The invention relates to the technical field of speech signal noise reduction, and in particular to a dual-microphone noise reduction method with an adjustable desired sound source direction.
Background
Portable devices such as Bluetooth headsets have become useful tools for improving efficiency in daily life. However, when a user makes or receives a call with such a device, call quality drops sharply if the signal is disturbed by background noise or by speech from non-target directions. In this case it is desirable to preserve the speech arriving from the user's speaking direction and to suppress background noise and non-target-direction speech as far as possible, without distorting the speech.
Existing generalized sidelobe cancellers (GSCs) and delay beamformers perform spatial filtering on signals recorded by multiple microphones. For portable devices such as Bluetooth headsets, GSCs are too complex for the limited processing capability of miniature devices. Delay beamforming techniques such as the first-order differential microphone (FDM) and adaptive null-forming (ANF) require only two microphones, which suits the size constraints and real-time requirements of such devices. However, such a fixed beamformer has its maximum gain at 0° and a null at 180°, and cannot remove noise arriving from directions other than the null. Coherence-based algorithms exploit the properties of the real and imaginary parts of the coherence function between the input signals to derive different noise masks. These methods do not rely on noise statistics, but their target direction is not adjustable. A competing-direction noise cancellation method for hearing aids combines spectral estimation with array beamforming to suppress noise; its directivity coefficients are estimated in noise-only intervals and updated to track moving noise sources. This method, too, can only set the desired direction within a limited range around 0°. Because the position of the sound source is not always fixed, a noise reduction algorithm with an adjustable sound source direction is important in practice.
To address the problems that beamforming applied directly to a closely spaced dual-microphone system cannot accurately remove sound from non-target directions, and that the desired sound source direction cannot be set as required, a two-step denoising method based on beamforming and Wiener filtering is proposed. Test results show that, at low signal-to-noise ratios and with several types of noise source present, the method effectively restores the energy distribution of the original signal, reduces background noise and non-target-direction speech, and significantly improves the signal-to-noise ratio.
Disclosure of Invention
In view of the problems in the prior art, the invention discloses a dual-microphone noise reduction method with an adjustable desired sound source direction, which comprises the following steps:
Preprocessing: the noisy signals x_1(t) and x_2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and a short-time Fourier transform yields the frequency-domain signals X_1(ω) and X_2(ω);
Beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones; X_1(ω) and X_2(ω) are differenced according to a central difference scheme to construct the differential signals Y_1(ω) and Y_2(ω); the statistical averages of the power spectra of Y_1(ω) and Y_2(ω) are calculated and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function; and X_1(ω) is multiplied by λ(ω) to obtain the signal R_1(ω), from which the competing-direction noise has been removed;
Post Wiener filtering: the signal energy and noise energy in R_1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated, thereby removing the residual noise in R_1(ω).
Further, the preprocessing comprises the following steps:
the noisy signals x_1(t) and x_2(t) are discretely sampled, and the high-frequency part of the speech is then pre-emphasized;
the sampled signals x_1(n) and x_2(n) are divided into frames of 10 ms, a Hamming window w(n) of the same length is applied, the windowed signals are placed in a buffer for processing, the frequency-domain signal of the current frame is obtained by short-time Fourier transform, and, because the Fourier transform of a real sequence is conjugate symmetric, only the first half of the frequency bins is output for beamforming.
Further, the beamforming comprises amplitude alignment, power spectrum calculation, directivity function calculation, threshold calculation and normalized mapping;
amplitude alignment: the frequency-domain signals X_1(ω) and X_2(ω) are each multiplied by a scaling factor to align their amplitudes;
power spectrum calculation: the desired beam is assumed to be S(ω) with its direction preset to α; a virtual microphone is introduced at the midpoint between the two microphones to receive S(ω); the differential signals Y_1(ω) and Y_2(ω) are constructed from the central difference scheme and the spatial relationship between X_1(ω), X_2(ω) and S(ω), and their power spectra are calculated;
directivity function calculation: the ratio of the statistical averages of the power spectra of Y_1(ω) and Y_2(ω) is the value of the directivity function Γ(ω, θ); Γ(ω, θ) tends to infinity when the actual incidence direction θ of the sound source equals the given desired incidence direction α, and it is monotonic and approximately symmetric on either side of the axis θ = α;
threshold calculation and normalized mapping: because Γ(ω, θ) tends to infinity, a threshold Ω is calculated from a preset main lobe width Θ; Γ(ω, θ) is normalized by a sigmoid function and mapped directly to the noise masking value λ(ω) of the corresponding frequency bin; and X_1(ω) is multiplied by λ(ω) to obtain the signal R_1(ω), from which the competing-direction noise has been removed.
Further, the post Wiener filtering comprises calculating the signal-to-noise ratio index, calculating the log-spectral deviation, modifying or resetting the noise flag, and calculating the gain function value;
signal-to-noise ratio index: the signal R_1(ω) is divided into several channels according to the critical bandwidth criterion, the energy of each channel is estimated, the channel noise energy estimate is initialized to the channel energy of the first four frames, and the channel signal-to-noise ratio index is calculated from the channel noise energy estimate;
log-spectral deviation: a nonlinear data table is designed as a voice index table that maps the signal-to-noise ratio index to a set of numbers measuring speech quality; the sum of the voice indices over a given frequency range is taken as the speech quality score of the current channel; the logarithm of the current channel signal energy is taken, and the deviation between the long-term and short-term log-spectral energy is calculated;
noise flag: based on the calculated voice index sum, signal-to-noise ratio index and log-spectral deviation, the current frame is classified as a speech frame or a noise frame and the noise update flag is reset; the update flags of the preceding frames are then checked, and if the noise has not been updated for a long time the result is considered unreliable and the signal-to-noise ratio index is forcibly updated;
gain function value: the channel gain is calculated from the channel signal-to-noise ratio index to remove the residual background noise, and the noise energy estimate for the next frame is updated according to the noise update flag.
With the above technical scheme, the dual-microphone noise reduction method with an adjustable desired sound source direction first preprocesses the two microphone signals, then calculates the ratio of the statistical averages of the power spectra of the constructed differential signals as the value of the directivity function, and obtains the noise masking value by mapping this ratio through a normalization function. A Wiener filter is then applied as a second stage, reducing the residual noise by estimating the signal-to-noise ratio index and calculating a gain function. The proposed algorithm is simple and efficient, and it significantly improves the signal-to-noise ratio and quality of speech disturbed by non-target sound in different noise scenes.
Drawings
To illustrate the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic overall view of the process of the present invention;
FIG. 2 is a schematic diagram of the pretreatment process of the present invention;
FIG. 3 is a schematic diagram of a beamforming process in the present invention;
FIG. 4 is a schematic view of the sound source propagation in the present invention;
FIG. 5 is a diagram illustrating directional functions in accordance with the present invention;
FIG. 6 is a diagram illustrating a post-wiener filtering process in accordance with the present invention;
FIG. 7 is a diagram showing the PESQ comparison results of the present invention with other noise reduction methods for a single noise source with different SNR;
FIG. 8 is a diagram showing the PESQ comparison result between the present invention and other noise reduction methods when multiple noise sources have different signal-to-noise ratios;
FIG. 9 is a graph of SegSNR comparison results for a single noise source with different SNR for the present invention and other noise reduction methods;
FIG. 10 is a graph of SegSNR comparison results for multiple noise sources with different SNR;
FIG. 11 is a diagram illustrating the results of the present invention for different desired sound source directions.
Detailed Description
To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings:
FIG. 1 shows the dual-microphone noise reduction method with an adjustable desired sound source direction. In implementation, the method comprises a preprocessing process, a beamforming process and a post Wiener filtering process. The specific steps are as follows:
S1: Preprocessing, as shown in FIG. 2. The noisy signals x_1(t) and x_2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and the frequency-domain signals X_1(ω) and X_2(ω) are obtained by short-time Fourier transform, specifically as follows:
S11: The continuous noisy signals x_1(t) and x_2(t) are first discretely sampled at a sampling frequency of 16 kHz. Pre-emphasis of the high-frequency part of the speech is implemented by a first-order FIR high-pass digital filter, where EMP_FAC is the pre-emphasis coefficient. Letting x(n) be the sample of the speech signal at time n, the result after pre-emphasis is:
z(n)=x(n)-EMP_FAC*x(n-1) (1)
EMP_FAC is taken as 0.8. After noise reduction, the time-domain signal is synthesized by the inverse short-time Fourier transform, and a de-emphasis operation is applied to it to restore the high-frequency part.
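Equation (1) and the matching de-emphasis step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not give the de-emphasis formula, so the first-order recursive inverse of equation (1) is assumed here, with the sample before the start of the signal taken as zero.

```python
EMP_FAC = 0.8  # pre-emphasis coefficient given in the text


def pre_emphasis(x, emp_fac=EMP_FAC):
    """Equation (1): z(n) = x(n) - EMP_FAC * x(n-1), assuming x(-1) = 0."""
    z = []
    prev = 0.0
    for sample in x:
        z.append(sample - emp_fac * prev)
        prev = sample
    return z


def de_emphasis(z, emp_fac=EMP_FAC):
    """Assumed inverse of equation (1): y(n) = z(n) + EMP_FAC * y(n-1)."""
    y = []
    prev = 0.0
    for sample in z:
        prev = sample + emp_fac * prev
        y.append(prev)
    return y


signal = [0.0, 1.0, 0.5, -0.25, 0.75]
restored = de_emphasis(pre_emphasis(signal))  # matches signal up to rounding
```

Because the de-emphasis filter is the exact inverse of the pre-emphasis filter, a signal that passes through both without any processing in between is recovered up to floating-point rounding.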
S12: The sampled signals x_1(n) and x_2(n) are divided into frames of 10 ms, and a Hamming window of the same length is applied to each frame, with the window function given by formula (2). The windowed signals are placed in a buffer whose length is 5 times the number of FFT points.
S13: The frequency-domain signal of the current frame is obtained by fast Fourier transform. Because the Fourier transform of a real sequence is conjugate symmetric, only the first half of the frequency bins is output for subsequent processing.
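Steps S12 and S13 can be sketched as follows. Several details are assumptions rather than the patent's values: the window uses the standard Hamming definition (formula (2) is not reproduced in the text), the FFT size `N_FFT` is hypothetical, the buffering scheme is omitted, and a direct DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

FS = 16000             # sampling frequency from S11
FRAME_LEN = FS // 100  # 10 ms frame -> 160 samples
N_FFT = 256            # hypothetical FFT size


def hamming(n_points):
    """Standard Hamming window (assumed form of formula (2))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def half_spectrum(frame, n_fft):
    """Window the frame, zero-pad it to n_fft and return bins 0 .. n_fft/2 - 1."""
    w = hamming(len(frame))
    padded = [s * wn for s, wn in zip(frame, w)] + [0.0] * (n_fft - len(frame))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft))
            for k in range(n_fft // 2)]


# One frame of a 1 kHz tone; bin spacing is FS / N_FFT = 62.5 Hz.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / FS) for n in range(FRAME_LEN)]
X = half_spectrum(tone, N_FFT)
peak_bin = max(range(len(X)), key=lambda k: abs(X[k]))
```

Only half of the bins are kept because the remaining ones are complex conjugates of these for a real input, so they carry no additional information.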
S2: Beamforming, as shown in FIG. 3. A virtual microphone is introduced at the midpoint of the line connecting the two microphones, and the frequency-domain signals X_1(ω) and X_2(ω) are differenced according to the central difference scheme to construct the differential signals Y_1(ω) and Y_2(ω). The statistical averages of the power spectra of Y_1(ω) and Y_2(ω) are calculated, and their ratio is taken as the directivity function Γ(ω, θ). Based on the properties of Γ(ω, θ), it is mapped directly to a noise masking value by a normalization function, specifically as follows:
S21: Amplitude alignment. Although the sound field is assumed to be far-field, the signals received by the two microphones differ slightly in amplitude. To better satisfy the assumption, the two frequency-domain signals X_1(ω) and X_2(ω) are first multiplied by their respective scaling factors to align their amplitudes.
S22: The desired beam is assumed to be S(ω) with its direction preset to α, and a virtual microphone is introduced at the midpoint between the two microphones to receive it. The sound source propagation is shown in FIG. 4. In the spatial relationship between X_1(ω), X_2(ω) and S(ω), d is the microphone spacing, v is the speed of sound, and θ is the actual incidence direction of the sound source. The differential signals Y_1(ω) and Y_2(ω) are constructed from the central difference scheme and this spatial relationship, and their power spectra are calculated.
S23: The ratio of the statistical averages of the power spectra of Y_1(ω) and Y_2(ω) is the value of the directivity function Γ(ω, θ).
The graph of Γ(ω, θ) is shown in FIG. 5. Γ(ω, θ) tends to infinity when the actual incidence direction θ of the sound source equals the given desired incidence direction α, and the function is monotonic and approximately symmetric on either side of the axis θ = α.
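The behaviour of Γ(ω, θ) described above can be illustrated with a small numerical sketch. The patent's own formulas for X_1(ω), X_2(ω), Y_1(ω) and Y_2(ω) are not reproduced in this text, so everything below is an assumption chosen only because it has the stated properties: a far-field plane-wave model with the microphones at ±d/2 around the virtual microphone, a steered sum for Y_1 and a steered difference for Y_2. The spacing `D` is hypothetical, and the statistical averaging over frames is omitted.

```python
import cmath
import math

D = 0.02     # microphone spacing in metres (hypothetical)
V = 343.0    # speed of sound in m/s
EPS = 1e-12  # guards the division when Y_2 vanishes


def mic_signals(s, omega, theta):
    """Far-field model: the midpoint receives s, each microphone a phase-shifted copy."""
    phase = omega * D * math.cos(theta) / (2.0 * V)
    return s * cmath.exp(1j * phase), s * cmath.exp(-1j * phase)


def directivity(x1, x2, omega, alpha):
    """Assumed Gamma: power of the steered sum over power of the steered difference."""
    steer = cmath.exp(-1j * omega * D * math.cos(alpha) / (2.0 * V))
    y1 = x1 * steer + x2 * steer.conjugate()
    y2 = x1 * steer - x2 * steer.conjugate()
    return abs(y1) ** 2 / (abs(y2) ** 2 + EPS)


omega = 2.0 * math.pi * 1000.0   # evaluate at 1 kHz
alpha = math.radians(60.0)       # desired direction
gammas = {deg: directivity(*mic_signals(1.0, omega, math.radians(deg)), omega, alpha)
          for deg in (20, 40, 60, 80, 100)}
```

Under this model the ratio equals the squared cotangent of the residual phase, so it grows without bound as θ approaches α and falls off on both sides, which is the qualitative shape the text attributes to FIG. 5.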
S24: Because Γ(ω, θ) cannot actually reach infinity in computation when θ = α, a threshold Ω is calculated from a preset main lobe width Θ.
The sigmoid function is an S-shaped function with range (0, 1). With a suitable modification, it normalizes Γ(ω, θ) and maps it directly to the noise masking value λ(ω) of the corresponding frequency bin.
S25: The signal after masking the competing-direction noise is:
R_1(ω) = λ(ω) X_1(ω) (8)
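The mapping of S24 and the masking of equation (8) can be sketched as follows. The exact "modified" sigmoid and the formula for the threshold Ω are not given in this text, so this sketch assumes a logistic function of the log-ratio between Γ and Ω, with a hypothetical steepness `SLOPE`; the threshold is passed in as a plain number.

```python
import math

SLOPE = 4.0  # hypothetical steepness of the sigmoid transition


def noise_mask(gamma, threshold):
    """Map a directivity value to (0, 1); gamma == threshold maps to 0.5."""
    gamma = max(gamma, 1e-12)
    z = SLOPE * (math.log(gamma) - math.log(threshold))
    z = max(-60.0, min(60.0, z))  # keep exp() well inside floating-point range
    return 1.0 / (1.0 + math.exp(-z))


def apply_mask(x1_bins, gamma_bins, threshold):
    """Equation (8) per frequency bin: R_1 = lambda * X_1."""
    return [noise_mask(g, threshold) * x for x, g in zip(x1_bins, gamma_bins)]


x1 = [1 + 1j, 2 - 1j, 0.5j]
gamma = [1e6, 100.0, 1.0]  # on-target, at the threshold, off-target
r1 = apply_mask(x1, gamma, threshold=100.0)
```

Bins whose directivity value lies well above the threshold pass almost unchanged, bins well below it are attenuated towards zero, and the transition between the two is smooth rather than a hard gate.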
S3: Post Wiener filtering, as shown in FIG. 6. The signal energy and noise energy in R_1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated to further remove the residual noise in R_1(ω), specifically as follows:
S31: R_1(ω) is divided into NUM_CHAN channels according to the critical bandwidth criterion. Because speech energy is concentrated mainly in 0.3-3.4 kHz, the channels are narrower at low frequencies and wider at high frequencies. The energy of each channel is then estimated, where β is a smoothing factor, M is the number of frequency bins in the current channel, m is the index of the channel in the current frame, i is the index of the current frame, and k is the index of a frequency bin within the current channel.
S32: The channel noise estimate is initialized to the channel energy of the first four frames, and the signal-to-noise ratio is then calculated according to formula (10).
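Steps S31 and S32 can be sketched as follows. Formulas (9) and (10) are not reproduced in this text, so the sketch assumes a common form: a first-order recursive average of the mean bin power in each channel, and a channel signal-to-noise ratio in dB clamped at zero. The values of `BETA`, `MIN_ENERGY` and the channel edges are hypothetical.

```python
import math

BETA = 0.55        # hypothetical smoothing factor
MIN_ENERGY = 1e-4  # hypothetical floor that keeps the logarithm finite


def update_channel_energy(prev_enrg, r1_bins, channels):
    """Smooth the mean bin power of each channel (lo, hi inclusive) across frames."""
    new_enrg = []
    for m, (lo, hi) in enumerate(channels):
        power = sum(abs(r1_bins[k]) ** 2 for k in range(lo, hi + 1)) / (hi - lo + 1)
        new_enrg.append(max(MIN_ENERGY, BETA * prev_enrg[m] + (1.0 - BETA) * power))
    return new_enrg


def channel_snr_db(ch_enrg, ch_noise):
    """Channel SNR in dB, clamped at 0 dB (assumed form of formula (10))."""
    return [max(0.0, 10.0 * math.log10(e / n)) for e, n in zip(ch_enrg, ch_noise)]


channels = [(0, 1), (2, 5)]  # narrow low band, wider high band
frame = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
enrg = update_channel_energy([1.0, 4.0], frame, channels)
snr = channel_snr_db(enrg, ch_noise=[1.0, 0.4])
```

Working per channel rather than per bin reduces the number of gains to estimate and smooths the estimate across neighbouring bins, which tends to reduce musical noise.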
S33: A nonlinear data table is designed as a voice index table that maps the signal-to-noise ratio index (the quantized signal-to-noise ratio) to a set of numbers measuring speech quality. A high signal-to-noise ratio is taken to indicate high speech quality, and the sum of the voice indices over the frequency range 0.3-3.4 kHz is calculated.
S34: The total noise energy estimate (tne) and the total channel energy estimate (tce) of the first HI_CHAN channels are calculated, that is, the sum of the noise energy and the sum of the channel energy over the frequency range 0.3-3.4 kHz.
S35: The log spectrum of the current channel energy is calculated, and the deviation between the long-term and short-term log-spectral energy is recorded as ch_enrg_dev.
ch_enrg_db(i,m)=10lg(ch_enrg(i,m)) (13)
S36: The long-term integration constant alpha is calculated as a function of the total channel energy (tce): a high tce (-40 dB) gives slow smoothing (alpha = 0.99), and a low tce (-60 dB) gives fast smoothing (alpha = 0.50).
S37: The long-term log-spectral energy is calculated and updated.
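Steps S35 to S37 can be sketched as follows. Only equation (13) and the two anchor points for alpha come from the text. The linear interpolation between the anchors, the clamping outside them, the absolute-difference form of the deviation and the recursive update of the long-term spectrum are assumptions.

```python
import math

HIGH_TCE_DB, LOW_TCE_DB = -40.0, -60.0  # anchor levels from S36
HIGH_ALPHA, LOW_ALPHA = 0.99, 0.50      # smoothing constants from S36


def integration_constant(tce_db):
    """Assumed linear interpolation between the anchors, clamped outside them."""
    slope = (HIGH_ALPHA - LOW_ALPHA) / (HIGH_TCE_DB - LOW_TCE_DB)
    alpha = LOW_ALPHA + slope * (tce_db - LOW_TCE_DB)
    return min(HIGH_ALPHA, max(LOW_ALPHA, alpha))


def spectral_deviation(ch_enrg, long_db):
    """Sum over channels of |short-term - long-term| log energy (assumed form)."""
    ch_enrg_db = [10.0 * math.log10(e) for e in ch_enrg]  # equation (13)
    dev = sum(abs(s - l) for s, l in zip(ch_enrg_db, long_db))
    return dev, ch_enrg_db


def update_long_term(long_db, ch_enrg_db, alpha):
    """First-order recursive average of the log spectrum (assumed form)."""
    return [alpha * l + (1.0 - alpha) * s for l, s in zip(long_db, ch_enrg_db)]


ch_enrg = [1e-5, 1e-4]  # -50 dB and -40 dB
long_db = [-50.0, -46.0]
ch_enrg_dev, ch_enrg_db = spectral_deviation(ch_enrg, long_db)
alpha = integration_constant(-50.0)
long_db = update_long_term(long_db, ch_enrg_db, alpha)
```

A large deviation means the current spectrum differs from its long-term average, which is one indication that the frame contains speech rather than stationary noise.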
S38: The noise update flag Update_flag is reset by comparing the calculated parameters, such as the voice index sum, the signal-to-noise ratio and the log-spectral deviation. Update_flag = TRUE indicates that the current frame is a noise frame, and Update_flag = FALSE indicates that it is a speech frame. The noise update flags of the preceding frames are then checked; if the noise has not been updated for a long time, the current result is considered unreliable and the signal-to-noise ratio index is forcibly updated.
S39: The channel gain ftmp2 is calculated from the channel signal-to-noise ratio index.
If the noise update flag Update_flag is TRUE, the current frame is classified as a noise frame and the noise energy estimate is updated.
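The gain calculation of S39 and the flag-controlled noise update can be sketched as follows. The patent's gain formula is not reproduced in this text, so this sketch assumes a gain that rises linearly in dB with the signal-to-noise ratio index and is capped at 0 dB, together with a recursive noise update. All four constants are hypothetical.

```python
MU = 0.39            # hypothetical gain slope in dB per SNR index step
SNR_INDEX_FLOOR = 6  # hypothetical index below which the gain stays at its minimum
MIN_GAIN_DB = -13.0  # hypothetical maximum attenuation
NOISE_SMOOTH = 0.9   # hypothetical noise smoothing factor


def channel_gain(snr_index):
    """Linear gain that rises with the SNR index and never exceeds 0 dB."""
    gain_db = MU * (max(snr_index, SNR_INDEX_FLOOR) - SNR_INDEX_FLOOR) + MIN_GAIN_DB
    return 10.0 ** (min(gain_db, 0.0) / 20.0)


def update_noise(ch_noise, ch_enrg, update_flag):
    """Track the noise energy only in frames flagged as noise."""
    if not update_flag:
        return list(ch_noise)
    return [NOISE_SMOOTH * n + (1.0 - NOISE_SMOOTH) * e
            for n, e in zip(ch_noise, ch_enrg)]


gains = [channel_gain(i) for i in (0, 6, 20, 60)]
noise = update_noise([1.0, 2.0], [3.0, 2.0], update_flag=True)
```

Limiting the attenuation to a fixed floor keeps a low level of residual noise in the output, which usually sounds more natural than gating low-SNR channels to silence.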
To verify the effectiveness of the present invention, several tests were performed. To confirm that the method applies to various types of sound, the speech data used for evaluation are taken from the TIMIT database, and the noise includes babble noise and competing-direction speech. The experimental results reported here are averages over 10 segments of speech data.
The present invention is compared with the Coherence and SNR-Coherence methods, with α first set to 0°. FIG. 7 and FIG. 8 show the PESQ scores of the different methods after adding various noises (including competing speech and babble noise). The present invention outperforms the Coherence method and is comparable to SNR-Coherence. Overall, the PESQ score of the present invention is at least 0.5 higher than that of the unprocessed signal, and this improvement is maintained with multiple noise sources.
FIG. 9 and FIG. 10 show that, at low signal-to-noise ratios (-5 dB and 0 dB) with non-target speech and babble noise as interference, the SegSNR of the present invention is at least 5 dB higher than the unprocessed value. The SegSNR of the present invention is higher than that of the SNR-Coherence method and almost equal to that of the Coherence method. The invention also maintains the best results when multiple noise sources are present.
The evaluation results with the desired direction set to other angles are shown in FIG. 11. Compared with the unprocessed sound, the invention still maintains good noise suppression.
These comparative tests demonstrate the good noise reduction performance and stable operation of the invention.
The above describes only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
Claims (4)
1. A dual-microphone noise reduction method with an adjustable desired sound source direction, characterized by comprising the following steps:
preprocessing: the noisy signals x_1(t) and x_2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and a short-time Fourier transform yields the frequency-domain signals X_1(ω) and X_2(ω);
beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones; X_1(ω) and X_2(ω) are differenced according to a central difference scheme to construct the differential signals Y_1(ω) and Y_2(ω); the statistical averages of the power spectra of Y_1(ω) and Y_2(ω) are calculated and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function; and X_1(ω) is multiplied by λ(ω) to obtain the signal R_1(ω), from which the competing-direction noise has been removed;
post Wiener filtering: the signal energy and noise energy in R_1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated, thereby removing the residual noise in R_1(ω).
2. The method of claim 1, wherein the preprocessing comprises the following steps:
the noisy signals x_1(t) and x_2(t) are discretely sampled, and the high-frequency part of the speech is then pre-emphasized;
the sampled signals x_1(n) and x_2(n) are divided into frames of 10 ms, a Hamming window w(n) of the same length is applied, the windowed signals are placed in a buffer for processing, the frequency-domain signal of the current frame is obtained by short-time Fourier transform, and, because the Fourier transform of a real sequence is conjugate symmetric, only the first half of the frequency bins is output for beamforming.
3. The method of claim 1, wherein the beamforming process comprises amplitude alignment, power spectrum calculation, directivity function value calculation, threshold calculation, and normalization mapping;
in the amplitude alignment, the frequency-domain signals X1(ω) and X2(ω) are each multiplied by a scaling factor so that their amplitudes are aligned;
when calculating the power spectrum: the desired beam is assumed to be S(ω) with its direction of incidence preset to α, a virtual microphone is introduced at the midpoint between the two microphones to receive the desired beam S(ω), the differential signals Y1(ω) and Y2(ω) are constructed from the central difference scheme and the spatial relationship between the frequency-domain signals X1(ω), X2(ω) and the desired beam S(ω), and the power spectra of the differential signals Y1(ω) and Y2(ω) are calculated;
when calculating the directivity function value: the ratio of the statistical averages of the power spectra of the differential signals Y1(ω) and Y2(ω) is the value of the directivity function Γ(ω, θ), whose properties are as follows: Γ(ω, θ) tends to infinity when the actual sound source incidence direction θ equals the given desired sound source incidence direction α, and Γ(ω, θ) is monotonic and approximately symmetric on either side of the axis θ = α;
when calculating the threshold and performing the normalization mapping: because Γ(ω, θ) tends to infinity, a threshold Ω is calculated according to a preset main lobe width Θ, Γ(ω, θ) is mapped directly to the noise masking value λ(ω) of the corresponding frequency bin through the normalized mapping of a sigmoid function, and X1(ω) is multiplied by λ(ω) to obtain the signal R1(ω) from which noise from competing directions has been eliminated.
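The last two steps of claim 3 (the power-spectrum ratio and the sigmoid mapping to a mask) can be illustrated with a short Python sketch. The claim does not give the formulas for the differential signals, the threshold, or the sigmoid, so everything specific here is an assumption: the differential signals are taken as given inputs, a plain mean over frames stands in for the statistical average, Y1 is taken as the numerator, and the log-domain sigmoid with a hypothetical `slope` parameter is only one plausible normalization.

```python
import numpy as np

def directivity_ratio(Y1, Y2, eps=1e-12):
    """Ratio of averaged power spectra of two differential signals.

    Y1, Y2: complex arrays of shape (frames, bins). The mean over frames is
    an assumed stand-in for the 'statistical average' in the claim.
    """
    p1 = np.mean(np.abs(Y1) ** 2, axis=0)
    p2 = np.mean(np.abs(Y2) ** 2, axis=0)
    return p1 / (p2 + eps)

def noise_mask(gamma, threshold, slope=4.0, eps=1e-12):
    """Map directivity values to a mask in (0, 1) with a sigmoid (assumed form)."""
    # Work in the log domain so that very large gamma saturates smoothly at 1.
    z = slope * (np.log10(gamma + eps) - np.log10(threshold))
    z = np.clip(z, -50.0, 50.0)                 # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def apply_mask(X1, mask):
    """R1 = lambda * X1, applied bin by bin."""
    return X1 * mask
```

With this form, bins whose directivity value is far above the threshold keep nearly all of X1(ω), while bins far below it are strongly attenuated; the mask equals 0.5 exactly at the threshold.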
4. The method of claim 1, wherein the post-Wiener filtering process comprises calculating a signal-to-noise ratio index, calculating a log-spectral deviation, modifying or resetting a noise flag, and calculating a gain function value;
when calculating the signal-to-noise ratio index, the signal R1(ω) is divided into a plurality of channels according to the critical bandwidth criterion, the energy of each channel is estimated, the channel noise energy estimate is initialized to the channel energy of the first four frames, and the channel signal-to-noise ratio index is calculated from the channel noise energy estimate;
when calculating the log-spectral deviation, a nonlinear data table is designed as a voice index table, the signal-to-noise ratio index is mapped to a set of numbers measuring the voice quality, the sum of the voice indexes within a certain frequency range is taken as the voice quality evaluation result of the current channel, the logarithm of the signal energy of the current channel is taken, and the deviation between the long-term and short-term log-spectral energies is calculated;
when modifying or resetting the noise flag, whether the current frame is a speech frame or a noise frame is judged from the calculated voice index sum, the signal-to-noise ratio index, and the log-spectral deviation, the noise update flag is reset, and the update flags of the preceding frames are checked; if the noise has not been updated for a long time, so that the result is unreliable, the signal-to-noise ratio index is forcibly updated;
when calculating the gain function value, the channel signal-to-noise ratio index is used to calculate the channel gain value that removes the residual background noise, and the noise energy estimate for the next frame is updated according to the noise update flag.
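A heavily simplified Python sketch of the channel-based post-filter in claim 4 is given below. It only illustrates the data flow (channel energies, a noise estimate initialized from the first four frames, a per-channel gain, and a flagged noise update). The band edges `edges`, the Wiener-style gain, the gain floor, the smoothing factor `alpha`, and the low-SNR rule that stands in for the noise update flag are all assumptions; the voice index table and the log-spectral deviation test of the claim are not modeled.

```python
import numpy as np

def channel_energies(frame, edges):
    """Sum bin power inside each channel; `edges` are hypothetical band edges."""
    power = np.abs(frame) ** 2
    return np.array([power[lo:hi].sum() for lo, hi in zip(edges[:-1], edges[1:])])

def post_filter(R1, edges, init_frames=4, gain_floor=0.1, alpha=0.9, eps=1e-12):
    """Apply a per-channel Wiener-style gain to R1 of shape (frames, bins)."""
    energies = np.array([channel_energies(f, edges) for f in R1])
    # Initialize the channel noise estimate from the first four frames.
    noise = energies[:init_frames].mean(axis=0)
    out = np.array(R1, dtype=complex)
    for t in range(len(R1)):
        snr = energies[t] / (noise + eps)       # channel SNR (linear scale)
        xi = np.maximum(snr - 1.0, 0.0)         # crude estimate of speech-to-noise ratio
        gain = np.maximum(xi / (1.0 + xi), gain_floor)
        # Expand channel gains back to the individual frequency bins.
        bin_gain = np.repeat(gain, np.diff(edges))
        out[t, edges[0]:edges[-1]] *= bin_gain
        # Assumed stand-in for the noise update flag: low-SNR frames are
        # treated as noise and used to update the next frame's estimate.
        if np.mean(snr) < 2.0:
            noise = alpha * noise + (1.0 - alpha) * energies[t]
    return out
```

For example, with 16 bins grouped as `edges = [0, 4, 8, 12, 16]`, frames at the noise level are attenuated to the gain floor, while a frame far above the noise estimate passes almost unchanged.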
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210157383.8A CN114724574B (en) | 2022-02-21 | 2022-02-21 | Dual-microphone noise reduction method with adjustable expected sound source direction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114724574A (en) | 2022-07-08 |
CN114724574B (en) | 2024-07-05 |
Family
ID=82235970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210157383.8A Active CN114724574B (en) | 2022-02-21 | 2022-02-21 | Dual-microphone noise reduction method with adjustable expected sound source direction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114724574B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115497500A (en) * | 2022-11-14 | 2022-12-20 | 北京探境科技有限公司 | Audio processing method and device, storage medium and intelligent glasses |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1809105A (en) * | 2006-01-13 | 2006-07-26 | 北京中星微电子有限公司 | Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices |
US20080019548A1 (en) * | 2006-01-30 | 2008-01-24 | Audience, Inc. | System and method for utilizing omni-directional microphones for speech enhancement |
US20090055170A1 (en) * | 2005-08-11 | 2009-02-26 | Katsumasa Nagahama | Sound Source Separation Device, Speech Recognition Device, Mobile Telephone, Sound Source Separation Method, and Program |
US20100246851A1 (en) * | 2009-03-30 | 2010-09-30 | Nuance Communications, Inc. | Method for Determining a Noise Reference Signal for Noise Compensation and/or Noise Reduction |
CN101916567A (en) * | 2009-11-23 | 2010-12-15 | 瑞声声学科技(深圳)有限公司 | Speech enhancement method applied to dual-microphone system |
CN102347027A (en) * | 2011-07-07 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | Double-microphone speech enhancer and speech enhancement method thereof |
US20120140947A1 (en) * | 2010-12-01 | 2012-06-07 | Samsung Electronics Co., Ltd | Apparatus and method to localize multiple sound sources |
US20120278070A1 (en) * | 2011-04-26 | 2012-11-01 | Parrot | Combined microphone and earphone audio headset having means for denoising a near speech signal, in particular for a " hands-free" telephony system |
CN111063366A (en) * | 2019-12-26 | 2020-04-24 | 紫光展锐(重庆)科技有限公司 | Method and device for reducing noise, electronic equipment and readable storage medium |
Non-Patent Citations (4)
Title |
---|
HUANG, GONGPING ET AL.: "《ROBUST AND STEERABLE KRONECKER PRODUCT DIFFERENTIAL BEAMFORMING WITH RECTANGULAR MICROPHONE ARRAYS》", 《IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)》, 2 March 2021 (2021-03-02), pages 211 - 215 * |
ZHAO QINGYING ET AL.: "《Directional Noise Suppression Based on Dual-Microphone With Desired Direction Presetting》", 《IEEE SENSORS JOURNAL》, vol. 24, no. 6, 15 March 2024 (2024-03-15), pages 8427 - 8437 * |
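XU NA (徐娜); WU CHANGQI (吴长奇): "Dual-microphone speech enhancement algorithm combining a differential array with magnitude spectral subtraction" (结合差分阵列与幅度谱减的双麦语音增强算法), Journal of Signal Processing (信号处理), no. 07, 25 July 2018 (2018-07-25), pages 124 - 129 * |
CHEN ZHENHAO (陈震昊): "《Research and implementation of speech enhancement algorithms based on microphone arrays》" (基于麦克风阵列的语音增强算法的研究与实现), 《China Master's Theses Full-text Database, Information Science and Technology》 (中国硕士优秀学位论文全文数据库 信息科技辑), no. 03, 16 February 2022 (2022-02-16), pages 10 - 67 * |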
Also Published As
Publication number | Publication date |
---|---|
CN114724574B (en) | 2024-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7011075B2 (en) | Target voice acquisition method and device based on microphone array | |
CN106782590B (en) | Microphone array beam forming method based on reverberation environment | |
US7366662B2 (en) | Separation of target acoustic signals in a multi-transducer arrangement | |
US9224393B2 (en) | Noise estimation for use with noise reduction and echo cancellation in personal communication | |
JP5007442B2 (en) | System and method using level differences between microphones for speech improvement | |
JP5762956B2 (en) | System and method for providing noise suppression utilizing nulling denoising | |
US8965003B2 (en) | Signal processing using spatial filter | |
US8538749B2 (en) | Systems, methods, apparatus, and computer program products for enhanced intelligibility | |
US8204252B1 (en) | System and method for providing close microphone adaptive array processing | |
US8958572B1 (en) | Adaptive noise cancellation for multi-microphone systems | |
US9232309B2 (en) | Microphone array processing system | |
CN108447496B (en) | Speech enhancement method and device based on microphone array | |
US20140037100A1 (en) | Multi-microphone noise reduction using enhanced reference noise signal | |
Priyanka | A review on adaptive beamforming techniques for speech enhancement | |
CN114724574B (en) | Dual-microphone noise reduction method with adjustable expected sound source direction | |
Xu et al. | Adaptive speech enhancement algorithm based on first-order differential microphone array | |
US11153695B2 (en) | Hearing devices and related methods | |
CN113257270A (en) | Multi-channel voice enhancement method based on reference microphone optimization | |
Priyanka et al. | Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement | |
CN113763984A (en) | Parameterized noise elimination system for distributed multiple speakers | |
JP2003044087A (en) | Device and method for suppressing noise, voice identifying device, communication equipment and hearing aid | |
CN116320947B (en) | Frequency domain double-channel voice enhancement method applied to hearing aid | |
Lotter et al. | A stereo input-output superdirective beamformer for dual channel noise reduction. | |
US20230186934A1 (en) | Hearing device comprising a low complexity beamformer | |
Hussain et al. | A novel psychoacoustically motivated multichannel speech enhancement system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |