CN111048106A - Pickup method and apparatus based on double microphones and computer device - Google Patents
- Publication number: CN111048106A
- Application number: CN202010171449.XA
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The application provides a dual-microphone pickup method, a pickup apparatus, a computer device, and a computer-readable storage medium. A sound signal is received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, which completes the dual-microphone pickup process. The whole pickup process needs only two microphones, which reduces hardware manufacturing cost. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and to update the noise spectrum, which improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.
Description
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a sound pickup method and apparatus based on two microphones, and a computer device.
Background
With the rise of intelligent voice, the realization of far-field voice pickup by using a microphone array technology becomes one of the current popular technologies. In order to achieve a good far-field interaction effect, an existing microphone array sound pickup device generally adopts four microphones or six microphones. However, since the number of microphones is large, the system of the microphone array sound pickup device is complicated, and a plurality of parameters such as sound source position information are required for sound pickup, so that the calculation amount is large, the cost is high, and the microphone array sound pickup device cannot be applied to small-sized equipment such as a translator.
Disclosure of Invention
The application mainly aims to provide a pickup method and device based on double microphones and computer equipment, and aims to solve the defects of complex structure, large calculation amount and high cost of the existing microphone array pickup device.
In order to achieve the above object, the present application provides a pickup method based on two microphones, including:
acquiring a sound signal, wherein the sound signal is a two-channel time domain signal;
converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;
updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
calculating to obtain a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:
respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
according to the self-spectral density and the cross-spectral density, calculating to obtain a complex coherence function of the dual-channel frequency domain signal;
and respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
substituting the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal;
and carrying out normalization processing on the CDR ratio to obtain the voice existence probability.
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;
performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
Further, the step of calculating the prior snr of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum includes:
substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
and substituting the posterior signal-to-noise ratio into a third algorithm to calculate to obtain the prior signal-to-noise ratio.
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
and performing a fast Fourier transform on each frame of sound sub-signal to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
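The framing, windowing, and FFT steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `stft_two_channel`, the frame length of 512, the hop of 256, and the Hann window are all assumptions, since the text does not specify them.

```python
import numpy as np


def stft_two_channel(x, frame_len=512, hop=256):
    """Frame, window, and FFT a dual-channel time domain signal.

    x: real array of shape (num_samples, 2), one column per microphone.
    Returns a complex array of shape (num_frames, frame_len // 2 + 1, 2):
    one dual-channel frequency domain sub-signal per frame.
    Frame length, hop, and window type are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    num_frames = 1 + (x.shape[0] - frame_len) // hop
    # Slice overlapping frames: shape (num_frames, frame_len, 2).
    frames = np.stack(
        [x[i * hop: i * hop + frame_len] for i in range(num_frames)]
    )
    # Apply the same window to both channels of every frame.
    windowed = frames * window[None, :, None]
    # Real-input FFT along the sample axis of each frame.
    return np.fft.rfft(windowed, axis=1)
```

With `frame_len=512`, each frame yields 257 frequency bins per channel.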
The application also provides a pickup apparatus based on two microphones, including:
the acquisition module is used for acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
the first conversion module is used for converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
the generating module is used for performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
a calculation module for calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;
the updating module is used for updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
the noise reduction module is used for carrying out noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and the second conversion module is used for converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
Further, the noise reduction module includes:
a first calculating unit, configured to calculate a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
the second calculation unit is used for calculating the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and the filtering unit is used for filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the calculation module includes:
the third calculating unit is used for respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
the fourth calculating unit is used for calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and the sixth calculating unit is used for respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the sixth calculating unit includes:
the first calculating subunit is configured to substitute the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency-domain signal;
and the normalizing subunit is used for carrying out normalization processing on the CDR ratio to obtain the voice existence probability.
Further, the fourth calculating unit includes:
the second calculating subunit is used for substituting the self-spectral density and the cross-spectral density into a preset formula to calculate and obtain an initial complex coherence function;
the recursion subunit is used for performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and the filtering subunit is used for performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
Further, the first computing unit includes:
the second calculating subunit is configured to substitute the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculate to obtain an a posteriori signal-to-noise ratio;
and the third calculation subunit is used for substituting the posterior signal-to-noise ratio into a third algorithm to calculate and obtain the prior signal-to-noise ratio.
Further, the first conversion module includes:
the framing unit is used for framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
and the first conversion unit is used for respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the above.
According to the pickup method, the pickup apparatus, and the computer device based on two microphones provided in the present application, a sound signal is received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, which completes the dual-microphone pickup process. The whole pickup process needs only two microphones, which reduces hardware manufacturing cost. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and to update the noise spectrum, which improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.
Drawings
Fig. 1 is a schematic diagram illustrating steps of a method for picking up sound based on two microphones according to an embodiment of the present application;
Fig. 2 is a block diagram illustrating an overall structure of a sound pickup apparatus based on two microphones according to an embodiment of the present application;
Fig. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for picking up sound based on two microphones, including:
S1, acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
S2, converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
S3, performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
S4, calculating the voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
S5, updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
S6, carrying out noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and S7, converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
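Step S3 can be illustrated with a short sketch. The fixed-beam formula itself is not legible in this text, so the code below assumes the simplest fixed beamformer, a delay-and-sum beam steered to broadside, which reduces to averaging the two channels. The function name `fixed_beamformer` and the default weights are hypothetical.

```python
import numpy as np


def fixed_beamformer(X, weights=(0.5, 0.5)):
    """Combine a dual-channel spectrum into a single channel.

    X: complex array of shape (num_frames, num_bins, 2).
    weights: fixed per-channel beamformer weights. The default
        (0.5, 0.5) is an assumed broadside delay-and-sum beam, not
        the patent's first formula.
    Returns a complex array of shape (num_frames, num_bins).
    """
    w = np.asarray(weights, dtype=complex)
    # Beamformer output Y = w^H X for every frame and frequency bin.
    return np.asarray(X) @ np.conj(w)
```

Because the weights do not depend on the signal, the beam needs no sound source position estimate, which fits the low-computation goal stated in the text.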
In this embodiment, the sound pickup system receives the sound signal through two microphones; the sound signal is a dual-channel time domain signal. The system first frames and windows the dual-channel time domain signal to obtain multiple frames of dual-channel time domain sub-signals. It then applies a fast Fourier transform to each frame, obtaining the dual-channel frequency domain sub-signal corresponding to that frame; the set of these per-frame sub-signals constitutes the dual-channel frequency domain signal of the sound signal. Next, the system performs fixed beamforming on the dual-channel frequency domain signal: the dual-channel frequency domain sub-signal of each frame is input into a first formula, which yields the first single-channel frequency domain signal. The first formula is:
[formula not reproduced in the source text]
where the inputs are the short-time spectra of microphone 1 and microphone 2 at a given frame and frequency bin k, and the output is the first single-channel frequency domain signal.
After obtaining the first single-channel frequency domain signal, the system denoises it according to a preset algorithm to obtain the second single-channel frequency domain signal. The process is as follows. The system first applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain the corresponding self-spectral densities and cross-spectral density. From the self-spectral densities and the cross-spectral density it calculates the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. From the complex coherence function, the system calculates the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal. It then updates the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain the second noise power spectrum of the single-channel frequency domain signal. Specifically, the system uses the voice existence probability as the smoothing factor: it inputs the first single-channel frequency domain signal and the voice existence probability into a fourth formula and calculates the second noise power spectrum. The fourth formula is:
[formula not reproduced in the source text]
where the result is the updated second noise power spectrum.
The first noise power spectrum is calculated from the dual-channel frequency domain signal, and the second noise power spectrum is calculated from the single-channel frequency domain signal. For example, if the single-channel frequency domain signal has size 256 × 1 and the dual-channel frequency domain signal has size 256 × 2, the first noise power spectrum is a 256 × 1 result calculated from the 256 × 2 input, and the second noise power spectrum is a 256 × 1 result calculated from the 256 × 1 input.
The system then calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum. Based on the prior signal-to-noise ratio, it calculates the frequency domain filter coefficient of the first single-channel frequency domain signal. Finally, the system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. Because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the denoised second single-channel frequency domain signal is likewise the set of second single-channel frequency domain sub-signals, one for each frame. The system applies an inverse Fourier transform to each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal, and then overlap-adds these time domain sub-signals to obtain the final audio signal, which completes the pickup process.
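The probability-weighted noise update described above can be sketched as follows. The fourth formula is not legible in this text, so the code assumes the common soft-decision form in which the voice existence probability acts as the per-bin smoothing factor: the noise estimate is held where speech is likely and follows the observed power where speech is unlikely. The function name `update_noise_psd` is hypothetical.

```python
import numpy as np


def update_noise_psd(first_noise_psd, Y, spp):
    """Update the noise power spectrum for one frame.

    first_noise_psd: real array (num_bins,), the first noise power
        spectrum (or the previous estimate).
    Y: complex array (num_bins,), first single-channel spectrum.
    spp: real array (num_bins,), voice existence probability in [0, 1].
    Assumed soft-decision update, not the patent's exact formula:
        noise = p * noise_prev + (1 - p) * |Y|^2
    """
    p = np.clip(np.asarray(spp, dtype=float), 0.0, 1.0)
    observed_power = np.abs(Y) ** 2
    return p * np.asarray(first_noise_psd, dtype=float) + (1.0 - p) * observed_power
```

With `spp = 1` the estimate is frozen, and with `spp = 0` it is replaced by the current frame power, so the tracking speed varies per frequency bin.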
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
S601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
S602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
S603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
In this embodiment, the system inputs the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio, and then inputs the posterior signal-to-noise ratio into the third algorithm to calculate the prior signal-to-noise ratio of the first single-channel frequency domain signal. The system then inputs the prior signal-to-noise ratio into a fifth formula and calculates the frequency domain filter coefficient of the first single-channel frequency domain signal. The fifth formula is:
[formula not reproduced in the source text]
The system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. The second single-channel frequency domain signal after noise reduction is:
[formula not reproduced in the source text]
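As a hedged illustration of steps S602 and S603, the sketch below assumes a Wiener-type gain computed from the prior signal-to-noise ratio and applies it per frequency bin. The fifth formula is not legible in this text, so the gain rule, the gain floor, and the function name `wiener_filter` are assumptions rather than the patent's definitions.

```python
import numpy as np


def wiener_filter(Y, prior_snr, gain_floor=0.1):
    """Filter one frame with a gain derived from the prior SNR.

    Y: complex array (num_bins,), first single-channel spectrum.
    prior_snr: real array (num_bins,), prior signal-to-noise ratio.
    gain_floor: assumed lower bound on the gain that limits
        musical-noise artifacts; not taken from the source.
    Returns (second single-channel spectrum, gain).
    """
    xi = np.maximum(np.asarray(prior_snr, dtype=float), 0.0)
    # Assumed Wiener gain: G = xi / (1 + xi).
    gain = np.maximum(xi / (1.0 + xi), gain_floor)
    # Filtering is a per-bin multiplication in the frequency domain.
    return gain * np.asarray(Y), gain
```

The gain approaches 1 in bins dominated by speech and falls to the floor in bins dominated by noise.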
Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:
S401, respectively calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
S402, calculating to obtain a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and S403, respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
In this embodiment, the system applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain its self-spectral densities and cross-spectral density. The formula for the self-spectral density is:
[formula not reproduced in the source text]
The formula for the cross-spectral density is:
[formula not reproduced in the source text]
The symbols in these formulas denote the power spectral density function, the smoothing coefficient, the self-spectral density, and the cross-spectral density.
Then, the system substitutes the self-spectral density and the cross-spectral density into a second formula and calculates the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. The second formula is:
[formula not reproduced in the source text]
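The recursive spectral density estimates and the coherence calculation can be sketched as below. The source formulas are not legible here, so the code assumes the usual first-order recursion with a smoothing coefficient `alpha` weighting the previous estimate, and the standard definition of complex coherence as the cross-spectral density normalized by the geometric mean of the two self-spectral densities. The function names and the value `alpha=0.8` are hypothetical.

```python
import numpy as np


def update_spectral_densities(prev, X1, X2, alpha=0.8):
    """One frame of first-order recursive smoothing.

    prev: tuple (phi11, phi22, phi12) from the previous frame.
    X1, X2: complex arrays (num_bins,), spectra of microphones 1 and 2.
    alpha: assumed smoothing coefficient in [0, 1).
    Returns the updated (phi11, phi22, phi12).
    """
    phi11_prev, phi22_prev, phi12_prev = prev
    phi11 = alpha * phi11_prev + (1.0 - alpha) * np.abs(X1) ** 2
    phi22 = alpha * phi22_prev + (1.0 - alpha) * np.abs(X2) ** 2
    phi12 = alpha * phi12_prev + (1.0 - alpha) * X1 * np.conj(X2)
    return phi11, phi22, phi12


def complex_coherence(phi11, phi22, phi12, eps=1e-12):
    """Complex coherence: phi12 / sqrt(phi11 * phi22)."""
    return phi12 / np.sqrt(phi11 * phi22 + eps)
```

For a single coherent source the coherence magnitude stays near 1 and its phase encodes the inter-microphone delay, while diffuse noise pulls the magnitude toward lower values.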
The system substitutes the complex coherence function into a first algorithm to calculate the CDR ratio (coherent-to-diffuse ratio) of the first single-channel frequency domain signal, and then normalizes the CDR ratio to obtain the voice existence probability of the first single-channel frequency domain signal. The system also substitutes the complex coherence function into a third formula to calculate the first noise power spectrum of the dual-channel frequency domain signal. The third formula is:
[formula not reproduced in the source text]
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm, and calculating to obtain a CDR ratio of the first single-channel frequency domain signal;
S4032, carrying out normalization processing on the CDR ratio to obtain the voice existence probability.
In this embodiment, the system substitutes the complex coherence function into the sixth formula to calculate the CDR ratio. The sixth formula is:
[formula not reproduced in the source text]
where one term is the coherence function of the diffuse noise field, f is the signal frequency, d is the microphone separation, and c is the speed of sound in air. After the CDR ratio is calculated, the system normalizes it by substituting it into a seventh formula, which yields the voice existence probability. The seventh formula is:
[formula not reproduced in the source text]
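The CDR and normalization steps can be illustrated under explicit assumptions, since the sixth and seventh formulas are not legible in this text. The sketch assumes (1) the ideal spherically isotropic diffuse-field coherence sin(2πfd/c)/(2πfd/c), (2) a source at broadside so that the direct-path coherence is 1, which gives a CDR estimate of (Γn − Γx)/(Γx − 1) from the coherence mixing model, and (3) the mapping CDR/(1 + CDR) as the normalization. All function names are hypothetical.

```python
import numpy as np


def diffuse_coherence(f, d, c=343.0):
    """Diffuse-field coherence for frequency f (Hz), spacing d (m).

    np.sinc(x) is sin(pi x) / (pi x), so the argument is 2 f d / c.
    """
    return np.sinc(2.0 * np.asarray(f, dtype=float) * d / c)


def estimate_cdr(gamma_x, gamma_n, gamma_s=1.0, eps=1e-10):
    """Assumed CDR estimator from the mixing model
    gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1).
    gamma_s = 1 assumes a broadside source (an assumption).
    """
    gamma_x = np.asarray(gamma_x, dtype=complex)
    denom = gamma_x - gamma_s
    # Guard against division by zero for fully coherent bins.
    denom = np.where(np.abs(denom) < eps, -eps, denom)
    cdr = np.real((gamma_n - gamma_x) / denom)
    return np.maximum(cdr, 0.0)


def speech_presence_probability(cdr):
    """Assumed normalization of CDR in [0, inf) to [0, 1)."""
    cdr = np.asarray(cdr, dtype=float)
    return cdr / (1.0 + cdr)
```

Under these assumptions, a bin whose measured coherence equals the diffuse-field coherence gets probability 0, and a bin whose coherence approaches 1 gets a probability close to 1.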
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
S4021, substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;
S4022, performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
S4023, performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
In this embodiment, after the system calculates the initial complex coherence function according to the self-spectral density and the cross-spectral density, the initial complex coherence function may contain much noise. In order to obtain better noise reduction effect, the system can further filter the initial complex coherence function. Specifically, the system performs time dimension first-order recursive smoothing on the initial complex coherence function, substitutes the initial complex coherence function into an eighth formula, and calculates to obtain a secondary complex coherence function. Wherein the eighth formula is:
Then, the system performs 5-point median filtering on the secondary complex coherence function in the frequency dimension, calculating according to a ninth formula, to obtain the filtered complex coherence function, namely the complex coherence function. Wherein the ninth formula is:
Here, the number of median filtering points is determined by the staff through relevant experiments and then input into the system. The filtered complex coherence function is used in the subsequent calculations. Combined with a smaller smoothing coefficient, the filtered complex coherence function can track changes in environmental noise more quickly, thereby effectively improving the noise reduction effect.
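The two filtering stages described above can be sketched as follows. This is a minimal illustration, not the eighth and ninth formulas themselves, which are not reproduced in this text. The smoothing coefficient value, the separate treatment of real and imaginary parts in the median, and the edge replication at the band limits are all assumptions.

```python
import numpy as np

def smooth_coherence_in_time(prev_smoothed, current, alpha=0.7):
    # First-order recursion: keep a fraction alpha of the previous frame.
    return alpha * prev_smoothed + (1.0 - alpha) * current

def median_filter_over_frequency(coherence, num_points=5):
    # Median of real and imaginary parts taken separately (an assumption),
    # with edge bins replicated so the output length is unchanged.
    half = num_points // 2
    padded = np.pad(coherence, half, mode="edge")
    windows = np.stack(
        [padded[i:i + coherence.size] for i in range(num_points)]
    )
    return np.median(windows.real, axis=0) + 1j * np.median(windows.imag, axis=0)
```

A single outlier bin surrounded by consistent neighbours is removed by the median, which is why a smaller time-smoothing coefficient can be tolerated.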
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
S6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
S6012, substituting the posterior signal-to-noise ratio into a third algorithm, and calculating to obtain the prior signal-to-noise ratio.
In this embodiment, the system uses a decision-directed method, and first substitutes the first single-channel frequency-domain signal and the second noise power spectrum into the second algorithm, thereby calculating to obtain the posterior signal-to-noise ratio. Wherein the second algorithm is:
And then, the system substitutes the posterior signal-to-noise ratio calculated in the last step into a third algorithm so as to calculate and obtain the prior signal-to-noise ratio. Wherein the third algorithm is:
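The second and third algorithms are not reproduced in this text. As a hedged illustration of the decision-directed method named above, the sketch below uses the commonly published form of that estimator; whether the patent uses exactly this form, and the weighting value 0.98, are assumptions. The function and parameter names are illustrative.

```python
import numpy as np

def posterior_snr(spectrum, noise_psd, eps=1e-10):
    # Ratio of the observed power to the estimated noise power, per bin.
    return np.abs(spectrum) ** 2 / np.maximum(noise_psd, eps)

def prior_snr_decision_directed(post_snr, prev_gain, prev_post_snr, alpha=0.98):
    # Weighted sum of the previous frame's estimated speech-to-noise ratio
    # and the current instantaneous estimate max(posterior - 1, 0).
    previous_estimate = prev_gain ** 2 * prev_post_snr
    instantaneous = np.maximum(post_snr - 1.0, 0.0)
    return alpha * previous_estimate + (1.0 - alpha) * instantaneous
```

Here `prev_gain` and `prev_post_snr` are the filter coefficient and posterior signal-to-noise ratio of the previous frame, which the caller is assumed to carry over between frames.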
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
S201, performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing fast Fourier transform on each frame of sound sub-signal respectively to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
In this embodiment, the system performs framing and windowing on the dual-channel time domain signal, that is, the sound signal, so as to obtain a plurality of frames of dual-channel time domain sub-signals, which is convenient for subsequent processing such as noise reduction on each frame of dual-channel time domain sub-signal, thereby achieving a better sound pickup effect. The system respectively carries out fast Fourier transform on each frame of double-channel time domain sub-signals, and each frame of double-channel time domain sub-signal is transformed to a frequency domain, so that double-channel frequency domain sub-signals respectively corresponding to each frame of double-channel time domain sub-signal are obtained, and the set of each frame of double-channel frequency domain sub-signal forms a double-channel frequency domain signal corresponding to a sound signal transformed to the frequency domain.
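Steps S201 and S202 can be sketched as below. The frame length, the hop size, and the square-root Hann window are assumptions chosen for illustration; the text does not specify them.

```python
import numpy as np

def stft_dual_channel(signal, frame_len=512, hop=256):
    # signal: array of shape (num_samples, 2), one column per microphone.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])  # periodic sqrt-Hann
    num_frames = 1 + (signal.shape[0] - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] for i in range(num_frames)]
    )
    # frames has shape (num_frames, frame_len, 2); window and transform
    # along the sample axis to get (num_frames, frame_len // 2 + 1, 2).
    return np.fft.rfft(frames * window[None, :, None], axis=1)
```

Each slice `spectra[i]` is then one dual-channel frequency domain sub-signal, and the full array is the dual-channel frequency domain signal.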
Further, the step of generating a final audio signal by converting the second single-channel frequency-domain signal to the time domain includes:
S701, performing inverse Fourier transform on each second single-channel frequency domain sub-signal respectively to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;
S702, performing overlap-add on the second single-channel time domain sub-signals to obtain the final audio signal.
In this embodiment, since the set of dual-channel frequency domain sub-signals of all frames forms the dual-channel frequency domain signal obtained by transforming the sound signal to the frequency domain, the second single-channel frequency domain signal after noise reduction is actually the set of second single-channel frequency domain sub-signals corresponding to the respective dual-channel frequency domain sub-signals after noise reduction. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to convert it to the time domain, obtaining the second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal. The system then overlap-adds the second single-channel time domain sub-signals to obtain and output the final audio signal, completing the whole pickup process.
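Steps S701 and S702 can be sketched as an inverse transform followed by overlap-add. The frame length, hop size, and square-root Hann synthesis window are assumptions; they are chosen so that, when the same window is used for analysis with 50% overlap, the interior of the signal is reconstructed exactly.

```python
import numpy as np

def overlap_add(spectra, frame_len=512, hop=256):
    # spectra: (num_frames, frame_len // 2 + 1) one-sided spectra of the
    # denoised single-channel signal, one row per frame.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])  # synthesis window
    frames = np.fft.irfft(spectra, n=frame_len, axis=1) * window
    output = np.zeros(hop * (spectra.shape[0] - 1) + frame_len)
    for i, frame in enumerate(frames):
        # Add each time domain frame at its original position.
        output[i * hop:i * hop + frame_len] += frame
    return output
```

The first and last half-frames are covered by only one window and are therefore attenuated; a real implementation would typically pad the input to avoid this.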
In the dual-microphone-based sound pickup method provided in this application, a sound signal is received through two microphones and converted into a dual-channel frequency domain signal, and a fixed beam is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the whole dual-microphone pickup process. The whole pickup process requires only two microphones, which effectively reduces hardware manufacturing cost. In the noise reduction process, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and update the noise spectrum, so that robustness to far-field reverberation and noise is greatly improved with a small amount of computation, and the pickup effect is effectively improved.
Referring to fig. 2, an embodiment of the present application provides a sound pickup apparatus based on two microphones, including:
an acquisition module 1, configured to acquire a sound signal, wherein the sound signal is a dual-channel time domain signal;
the first conversion module 2 is configured to convert the sound signal to a frequency domain to obtain a dual-channel frequency domain signal;
the generating module 3 is configured to perform fixed beam processing on the two-channel frequency domain data to generate a first single-channel frequency domain signal;
a calculating module 4, configured to calculate a speech existence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;
the updating module 5 is configured to update and calculate the first noise power spectrum according to the first single-channel frequency-domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;
the noise reduction module 6 is configured to perform noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and a second conversion module 7, configured to convert the second single-channel frequency-domain signal to a time domain, so as to generate a final audio signal.
In this embodiment, the sound pickup system receives a sound signal through two microphones, where the sound signal is a dual-channel time domain signal. The system first performs framing and windowing on the dual-channel time domain signal to obtain a plurality of frames of dual-channel time domain sub-signals. It then performs a fast Fourier transform on each frame of dual-channel time domain sub-signal to transform it to the frequency domain, obtaining the dual-channel frequency domain sub-signal corresponding to each frame; the set of these dual-channel frequency domain sub-signals forms the dual-channel frequency domain signal corresponding to the sound signal. Next, the system applies a fixed beam to the dual-channel frequency domain signal, that is, each frame of dual-channel frequency domain sub-signal is input into a first formula for calculation, thereby generating a first single-channel frequency domain signal. In the first formula, the inputs are the short-time spectra of microphone 1 and microphone 2 at a given frame and at frequency bin k, and the output is the first single-channel frequency domain signal. After obtaining the first single-channel frequency domain signal, the system denoises it according to a preset algorithm to obtain a second single-channel frequency domain signal. The specific process is as follows: the system first applies first-order recursive smoothing to the dual-channel frequency domain signal to calculate the corresponding self-spectral density and cross-spectral density.
The system then calculates the complex coherence function of the dual-channel frequency domain signal from the self-spectral density and the cross-spectral density; the complex coherence function represents the correlation of the dual-channel signal at each frequency. According to the complex coherence function, the system calculates the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal, respectively. The first noise power spectrum is then updated according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal. Specifically, the system updates the first noise power spectrum using the voice existence probability as a smoothing factor: it inputs the first single-channel frequency domain signal and the voice existence probability into a fourth formula and calculates the second noise power spectrum of the single-channel frequency domain signal. Wherein the fourth formula is:
Here, the result of the fourth formula is the updated second noise power spectrum. The first noise power spectrum is a parameter calculated from the dual-channel frequency domain signal, and the second noise power spectrum is a parameter calculated from the single-channel frequency domain signal; both signals are vectors. For example, if the single-channel frequency domain signal is 256 × 1 and the dual-channel frequency domain signal is 256 × 2, the first noise power spectrum is a 256 × 1 result calculated from the 256 × 2 signal, and the second noise power spectrum is a 256 × 1 result calculated from the 256 × 1 signal. The system calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum, and then calculates the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio. Finally, the system filters and denoises the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal. Since the set of dual-channel frequency domain sub-signals of all frames forms the dual-channel frequency domain signal obtained by transforming the sound signal to the frequency domain, the second single-channel frequency domain signal after noise reduction is actually the set of second single-channel frequency domain sub-signals corresponding to the respective dual-channel frequency domain sub-signals after noise reduction. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal.
And then, overlapping and adding the second single-channel time domain sub-signals to obtain a final audio signal, and finishing the whole pickup process.
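Two of the steps in this embodiment, the fixed beam and the noise power spectrum update, can be sketched as follows. The first and fourth formulas are not reproduced in this text, so both functions are assumptions: the fixed beam is taken to be a delay-and-sum beam steered to broadside (a plain average of the two channels), and the update uses the voice existence probability as a per-bin smoothing factor in the usual recursive form.

```python
import numpy as np

def fixed_beam(x1, x2):
    # Assumed delay-and-sum beam steered to broadside: average the channels.
    return 0.5 * (x1 + x2)

def update_noise_psd(noise_psd, beam_spectrum, speech_prob):
    # The probability acts as a per-bin smoothing factor: a value near 1
    # keeps the old noise estimate, a value near 0 tracks the new frame.
    return speech_prob * noise_psd + (1.0 - speech_prob) * np.abs(beam_spectrum) ** 2
```

With this form, bins that are likely to contain speech leave the noise estimate almost unchanged, so speech energy does not leak into the noise spectrum.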
Further, the noise reduction module 6 includes:
a first calculating unit, configured to calculate a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
the second calculation unit is used for calculating the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and the filtering unit is used for filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
In this embodiment, the system inputs the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm, calculates to obtain the posterior signal-to-noise ratio, and then inputs the posterior signal-to-noise ratio into the third algorithm, thereby calculating to obtain the prior signal-to-noise ratio of the first single-channel frequency domain signal. And the system inputs the prior signal-to-noise ratio into a fifth formula, and calculates to obtain the frequency domain filter coefficient of the first single-channel frequency domain signal. Wherein the fifth formula is:
The system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient, thereby obtaining a second single-channel frequency domain signal. Wherein, the second single-channel frequency domain signal after noise reduction is:
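The fifth formula and the filtering expression are not reproduced in this text. As a hedged illustration, the sketch below uses a Wiener-type gain, which is one common way to derive a frequency domain filter coefficient from the prior signal-to-noise ratio; the gain floor is an added assumption intended to limit musical noise and is not taken from the text.

```python
import numpy as np

def wiener_gain(prior_snr, gain_floor=0.1):
    # Wiener-type gain xi / (1 + xi), limited from below by gain_floor.
    return np.maximum(prior_snr / (1.0 + prior_snr), gain_floor)

def apply_gain(beam_spectrum, gain):
    # Per-bin multiplication yields the denoised single-channel spectrum.
    return gain * beam_spectrum
```

For example, a prior signal-to-noise ratio of 9 gives a gain of 0.9, while a ratio of 0 is clamped to the floor of 0.1.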
Further, the calculating module 4 includes:
the third calculating unit is used for respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
the fourth calculating unit is used for calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and the sixth calculating unit is used for respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
In this embodiment, the system performs corresponding calculation on the dual-channel frequency domain signal through first-order recursive smoothing, to obtain the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal, where the calculation formula corresponding to the self-spectral density is:
the formula of the cross-spectral density is:
where the symbols in the formulas above denote, in order, the power spectral density function, the smoothing coefficient, the self-spectral density, and the cross-spectral density.
Then, the system substitutes the self-spectral density and the cross-spectral density into a second formula, and calculates to obtain a complex coherence function of the dual-channel frequency domain signal, which is used for representing the correlation between each frequency of the dual-channel signal. Wherein the second formula is:
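The formulas for the spectral densities and the second formula are not reproduced in this text. The sketch below shows the standard textbook form of first-order recursive spectral density estimation and of the complex coherence function, which is assumed (not confirmed) to match them. The smoothing coefficient value and the function names are illustrative.

```python
import numpy as np

def update_spectral_densities(state, x1, x2, lam=0.68):
    # state: (phi11, phi22, phi12) from the previous frame, per bin.
    phi11, phi22, phi12 = state
    phi11 = lam * phi11 + (1.0 - lam) * np.abs(x1) ** 2    # self-spectrum, mic 1
    phi22 = lam * phi22 + (1.0 - lam) * np.abs(x2) ** 2    # self-spectrum, mic 2
    phi12 = lam * phi12 + (1.0 - lam) * x1 * np.conj(x2)   # cross-spectrum
    return phi11, phi22, phi12

def complex_coherence(phi11, phi22, phi12, eps=1e-10):
    # Cross-spectrum normalized by the geometric mean of the self-spectra.
    return phi12 / np.sqrt(np.maximum(phi11 * phi22, eps))
```

Identical signals on both channels give a coherence of 1, and a pure 90 degree phase shift between the channels gives a coherence of magnitude 1 with a purely imaginary value.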
The system substitutes the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal, and then normalizes the CDR ratio to obtain the voice existence probability of the first single-channel frequency domain signal. The system also substitutes the complex coherence function into a third formula to calculate the first noise power spectrum of the dual-channel frequency domain signal. Wherein the third formula is:
Further, the sixth calculating unit includes:
the first calculating subunit is configured to substitute the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency-domain signal;
and the normalizing subunit is used for carrying out normalization processing on the CDR ratio to obtain the voice existence probability.
In this embodiment, the system substitutes the complex coherence function into the sixth formula to calculate the CDR ratio. Wherein the sixth formula is:
where the diffuse-field term is the coherence function of the diffuse noise field, f is the signal frequency, d is the microphone spacing, and c is the speed of sound in air. After the CDR ratio is calculated, the system normalizes it by substituting it into a seventh formula, thereby obtaining the voice existence probability. Wherein the seventh formula is:
Further, the fourth calculating unit includes:
the second calculating subunit is used for substituting the self-spectral density and the cross-spectral density into a preset formula to calculate and obtain an initial complex coherence function;
the recursion subunit is used for performing time dimension first-order recursion smoothing on the initial complex coherent function to obtain a secondary complex coherent function;
and the filtering subunit is used for performing 5-point median filtering processing on the secondary complex coherence function in a frequency dimension to obtain the complex coherence function.
In this embodiment, after the system calculates the initial complex coherence function according to the self-spectral density and the cross-spectral density, the initial complex coherence function may contain much noise. In order to obtain better noise reduction effect, the system can further filter the initial complex coherence function. Specifically, the system performs time dimension first-order recursive smoothing on the initial complex coherence function, substitutes the initial complex coherence function into an eighth formula, and calculates to obtain a secondary complex coherence function. Wherein the eighth formula is:
Then, the system performs 5-point median filtering on the secondary complex coherence function in the frequency dimension, calculating according to a ninth formula, to obtain the filtered complex coherence function. Wherein the ninth formula is:
Here, the number of median filtering points is determined by the staff through relevant experiments and then input into the system. The filtered complex coherence function is used in the subsequent calculations. Combined with a smaller smoothing coefficient, the filtered complex coherence function can track changes in environmental noise more quickly, thereby effectively improving the noise reduction effect.
Further, the first computing unit includes:
the second calculating subunit is configured to substitute the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculate to obtain an a posteriori signal-to-noise ratio;
and the third calculation subunit is used for substituting the posterior signal-to-noise ratio into a third algorithm to calculate and obtain the prior signal-to-noise ratio.
In this embodiment, the system uses a decision-directed method, and first substitutes the first single-channel frequency-domain signal and the second noise power spectrum into the second algorithm, thereby calculating to obtain the posterior signal-to-noise ratio. Wherein the second algorithm is:
And then, the system substitutes the posterior signal-to-noise ratio calculated in the last step into a third algorithm so as to calculate and obtain the prior signal-to-noise ratio. Wherein the third algorithm is:
Further, the first conversion module 2 includes:
the framing unit is used for framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
and the first conversion unit is used for respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
In this embodiment, the system performs framing and windowing on the dual-channel time domain signal, that is, the sound signal, so as to obtain a plurality of frames of dual-channel time domain sub-signals, which is convenient for subsequent processing such as noise reduction on each frame of dual-channel time domain sub-signal, thereby achieving a better sound pickup effect. The system respectively carries out fast Fourier transform on each frame of double-channel time domain sub-signals, and each frame of double-channel time domain sub-signal is transformed to a frequency domain, so that double-channel frequency domain sub-signals respectively corresponding to each frame of double-channel time domain sub-signal are obtained, and the set of each frame of double-channel frequency domain sub-signal forms a double-channel frequency domain signal corresponding to a sound signal transformed to the frequency domain.
Further, the second single-channel frequency domain signal is a set of second single-channel frequency domain sub-signals corresponding to each of the two-channel frequency domain sub-signals after noise reduction, and the second converting module 7 includes:
the second conversion unit is used for respectively performing inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;
and the superposition unit is used for performing superposition addition on each second single-channel time domain sub-signal to obtain the final audio signal.
In this embodiment, since the set of dual-channel frequency domain sub-signals of all frames forms the dual-channel frequency domain signal obtained by transforming the sound signal to the frequency domain, the second single-channel frequency domain signal after noise reduction is actually the set of second single-channel frequency domain sub-signals corresponding to the respective dual-channel frequency domain sub-signals after noise reduction. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to convert it to the time domain, obtaining the second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal. The system then overlap-adds the second single-channel time domain sub-signals to obtain and output the final audio signal, completing the whole pickup process.
The dual-microphone-based sound pickup apparatus provided in this application receives a sound signal through two microphones, converts the sound signal into a dual-channel frequency domain signal, and applies a fixed beam to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the whole dual-microphone pickup process. The whole pickup process requires only two microphones, which effectively reduces hardware manufacturing cost. In the noise reduction process, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and update the noise spectrum, so that robustness to far-field reverberation and noise is greatly improved with a small amount of computation, and the pickup effect is effectively improved.
Referring to fig. 3, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as smoothing coefficients. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a dual-microphone-based sound pickup method.
The processor executes the steps of the sound pickup method based on the two microphones:
S1, acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
S2, converting the sound signal to the frequency domain to obtain a dual-channel frequency domain signal;
S3, applying a fixed beam to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
S4, calculating the voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
S5, updating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
S6, performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
S7, converting the second single-channel frequency domain signal to the time domain to generate a final audio signal.
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
S601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
S602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
S603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:
S401, respectively calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
S402, calculating to obtain a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
S403, respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm, and calculating to obtain a CDR ratio of the first single-channel frequency domain signal;
S4032, performing normalization processing on the CDR ratio to obtain the voice existence probability.
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
S4021, substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;
S4022, performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
S4023, performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
S6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
S6012, substituting the posterior signal-to-noise ratio into a third algorithm, and calculating to obtain the prior signal-to-noise ratio.
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
S201, performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing fast Fourier transform on each frame of sound sub-signal respectively to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
Further, the step of generating a final audio signal by converting the second single-channel frequency-domain signal to the time domain includes:
S701, performing inverse Fourier transform on each second single-channel frequency domain sub-signal respectively to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;
S702, performing overlap-add on the second single-channel time domain sub-signals to obtain the final audio signal.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a dual-microphone-based sound pickup method, specifically comprising:
S1, acquiring a sound signal, wherein the sound signal is a dual-channel time-domain signal;
S2, converting the sound signal into the frequency domain to obtain a dual-channel frequency-domain signal;
S3, performing fixed beamforming on the dual-channel frequency-domain signal to generate a first single-channel frequency-domain signal;
S4, calculating the speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the dual-channel frequency-domain signal;
S5, updating the first noise power spectrum according to the first single-channel frequency-domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;
S6, performing noise reduction processing on the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal;
S7, converting the second single-channel frequency-domain signal into the time domain to generate a final audio signal.
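Steps S3 and S5 can be illustrated with a minimal sketch. The text does not give the beamformer weights or the update rule here, so everything below is an assumption: the fixed beam is a plain two-channel average (delay-and-sum steered broadside), and the noise update is a recursive average whose smoothing factor is driven by the speech presence probability. The names `fixed_beam`, `update_noise_psd`, and `alpha` are hypothetical.

```python
def fixed_beam(left, right):
    """Assumed fixed beam: average the two channels bin by bin."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def update_noise_psd(noise_psd, beam, speech_prob, alpha=0.9):
    """Assumed probability-driven recursive update of the noise power spectrum.

    Where speech is likely (probability near 1) the previous estimate is kept;
    where speech is unlikely the estimate moves toward the current frame power.
    """
    updated = []
    for n_prev, y, p in zip(noise_psd, beam, speech_prob):
        # Effective smoothing factor approaches 1 as speech becomes likely.
        a = alpha + (1.0 - alpha) * p
        updated.append(a * n_prev + (1.0 - a) * abs(y) ** 2)
    return updated


# One frame with three frequency bins per channel (illustrative values).
left = [1 + 1j, 2 + 0j, 0.5j]
right = [1 - 1j, 2 + 0j, -0.5j]
beam = fixed_beam(left, right)
noise = update_noise_psd([2.0, 1.0, 1.0], beam, [0.0, 1.0, 0.5])
```

In the second bin the speech presence probability is 1, so the noise estimate is left unchanged even though the frame power is high.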
Further, the step of performing noise reduction processing on the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal includes:
S601, calculating the a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
S602, calculating a frequency-domain filter coefficient of the first single-channel frequency-domain signal according to the a priori signal-to-noise ratio;
S603, filtering the first single-channel frequency-domain signal according to the frequency-domain filter coefficient to obtain the second single-channel frequency-domain signal.
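As a hedged illustration of steps S602 and S603, the sketch below assumes a Wiener-type gain, G = ξ / (1 + ξ), where ξ is the a priori signal-to-noise ratio. This section does not state which gain rule is used, so the rule, the gain floor, and the names `wiener_gain` and `apply_gain` are assumptions.

```python
def wiener_gain(prior_snr, floor=0.1):
    """Assumed Wiener-type gain per frequency bin, limited from below by a floor."""
    return [max(xi / (1.0 + xi), floor) for xi in prior_snr]


def apply_gain(spectrum, gain):
    """Filter the spectrum by multiplying each bin by its real-valued gain."""
    return [g * y for g, y in zip(gain, spectrum)]


# Illustrative values: high, medium, and zero a priori SNR.
prior_snr = [9.0, 1.0, 0.0]
first_signal = [2 + 2j, 1 - 1j, 4 + 0j]
gain = wiener_gain(prior_snr)
second_signal = apply_gain(first_signal, gain)
```

The floor keeps a small residual in bins judged to be noise only, a common way to limit musical-noise artifacts.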
Further, the step of calculating the speech presence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the dual-channel frequency-domain signal includes:
S401, calculating the auto-spectral densities and the cross-spectral density of the dual-channel frequency-domain signal;
S402, calculating a complex coherence function of the dual-channel frequency-domain signal according to the auto-spectral densities and the cross-spectral density;
S403, calculating, according to the complex coherence function, the speech presence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the dual-channel frequency-domain signal, respectively.
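Steps S401 and S402 can be sketched as follows. The sketch uses the textbook definition of complex coherence, the cross-spectral density divided by the geometric mean of the two auto-spectral densities, with densities estimated by averaging over frames. The averaging scheme and the function names are assumptions, not the preset formula of the application.

```python
import math


def spectral_densities(frames_1, frames_2):
    """Average auto- and cross-spectral densities over frames, per frequency bin."""
    n_frames = len(frames_1)
    n_bins = len(frames_1[0])
    phi_11 = [0.0] * n_bins
    phi_22 = [0.0] * n_bins
    phi_12 = [0j] * n_bins
    for x1, x2 in zip(frames_1, frames_2):
        for k in range(n_bins):
            phi_11[k] += abs(x1[k]) ** 2 / n_frames
            phi_22[k] += abs(x2[k]) ** 2 / n_frames
            phi_12[k] += x1[k] * x2[k].conjugate() / n_frames
    return phi_11, phi_22, phi_12


def complex_coherence(phi_11, phi_22, phi_12, eps=1e-12):
    """Normalize the cross-spectral density by the auto-spectral densities."""
    return [c / (math.sqrt(a * b) + eps) for a, b, c in zip(phi_11, phi_22, phi_12)]


# Two frames, two bins. Bin 0 is identical in both channels (coherent);
# bin 1 flips sign between frames in channel 2 (incoherent on average).
frames_1 = [[1 + 0j, 1 + 0j], [1j, 1 + 0j]]
frames_2 = [[1 + 0j, 1 + 0j], [1j, -1 + 0j]]
coherence = complex_coherence(*spectral_densities(frames_1, frames_2))
```

Without averaging over several frames the magnitude of the coherence would be 1 in every bin, which is why some form of temporal smoothing is needed before the coherence carries information.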
Further, the step of calculating the speech presence probability of the first single-channel frequency-domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm to calculate the coherent-to-diffuse ratio (CDR) of the first single-channel frequency-domain signal;
S4032, normalizing the CDR to obtain the speech presence probability.
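The first algorithm is not reproduced in this section, so the following is only a sketch under stated assumptions. It assumes the observed coherence is a CDR-weighted mix of a direct-path coherence `gamma_s` and a diffuse-noise coherence `gamma_n`, solves that relation for the CDR, and normalizes with CDR / (CDR + 1). The sinc model for diffuse noise, the broadside target (`gamma_s = 1`), and all function names are hypothetical choices.

```python
import math


def diffuse_coherence(freq_hz, mic_distance_m, speed_of_sound=343.0):
    """Assumed spherically diffuse noise model: sin(x) / x."""
    x = 2.0 * math.pi * freq_hz * mic_distance_m / speed_of_sound
    return 1.0 if x == 0.0 else math.sin(x) / x


def cdr_from_coherence(gamma_x, gamma_n, gamma_s=1 + 0j, eps=1e-12):
    """Solve gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1) for the CDR."""
    denom = gamma_x - gamma_s
    if abs(denom) < eps:
        return float("inf")  # observation is fully coherent
    return max(((gamma_n - gamma_x) / denom).real, 0.0)


def speech_presence_probability(cdr):
    """Assumed normalization of the CDR to the range [0, 1]."""
    return 1.0 if math.isinf(cdr) else cdr / (cdr + 1.0)


# A mix built with CDR = 3: gamma_x = (3 * 1 + 0.2) / 4 = 0.8.
cdr = cdr_from_coherence(0.8, 0.2)
probability = speech_presence_probability(cdr)
```

With this normalization a fully diffuse bin maps to probability 0 and a fully coherent bin maps to 1.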
Further, the step of calculating a complex coherence function of the dual-channel frequency-domain signal according to the auto-spectral densities and the cross-spectral density includes:
S4021, substituting the auto-spectral densities and the cross-spectral density into a preset formula to calculate an initial complex coherence function;
S4022, performing first-order recursive smoothing along the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
S4023, performing 5-point median filtering along the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
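Steps S4022 and S4023 can be sketched as below. The smoothing constant `lam`, the edge handling (replicating the first and last bins), and the choice to take the median of the real and imaginary parts separately are assumptions made for illustration, since a median of complex values is not uniquely defined.

```python
from statistics import median


def recursive_smooth(previous, current, lam=0.7):
    """First-order recursive smoothing over time, per frequency bin."""
    return [lam * p + (1.0 - lam) * c for p, c in zip(previous, current)]


def median_filter_5(values):
    """5-point median along frequency, applied to real and imaginary parts."""
    n = len(values)
    out = []
    for k in range(n):
        # Clamp indices so the window is padded with the edge bins.
        window = [values[min(max(i, 0), n - 1)] for i in range(k - 2, k + 3)]
        out.append(complex(median(v.real for v in window),
                           median(v.imag for v in window)))
    return out


smoothed = recursive_smooth([1 + 0j], [0j])
# A single outlier bin (0.9) is removed by the median filter.
filtered = median_filter_5([0.1 + 0j, 0.1 + 0j, 0.9 + 0j, 0.1 + 0j, 0.1 + 0j, 0.1 + 0j])
```

The recursive step suppresses frame-to-frame fluctuation, while the median step removes isolated outlier bins without blurring broader spectral structure.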
Further, the step of calculating the a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum includes:
S6011, substituting the first single-channel frequency-domain signal and the second noise power spectrum into a second algorithm to calculate an a posteriori signal-to-noise ratio;
S6012, substituting the a posteriori signal-to-noise ratio into a third algorithm to calculate the a priori signal-to-noise ratio.
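The second and third algorithms are not spelled out in this section. A common choice, assumed here purely for illustration, is to take the a posteriori SNR as the ratio of frame power to noise power and to derive the a priori SNR with a decision-directed rule that blends the previous frame's estimate with the current excess SNR. The names and the value of `alpha` are hypothetical.

```python
def posterior_snr(spectrum, noise_psd, eps=1e-12):
    """A posteriori SNR: frame power divided by the noise power spectrum."""
    return [abs(y) ** 2 / (n + eps) for y, n in zip(spectrum, noise_psd)]


def prior_snr_decision_directed(post, prev_post, prev_gain, alpha=0.98):
    """Assumed decision-directed estimate of the a priori SNR.

    Blends the previous frame's clean-speech estimate (gain^2 * posterior SNR)
    with the current excess SNR, max(posterior - 1, 0).
    """
    return [alpha * g * g * gp + (1.0 - alpha) * max(gn - 1.0, 0.0)
            for gn, gp, g in zip(post, prev_post, prev_gain)]


# Illustrative values for two frequency bins.
post = posterior_snr([3 + 4j, 1 + 0j], [5.0, 2.0])
prior = prior_snr_decision_directed(post, [4.0, 1.0], [0.5, 0.5], alpha=0.5)
```

A value of `alpha` close to 1 gives a smoother a priori SNR at the cost of slower reaction to speech onsets.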
Further, the step of converting the sound signal into the frequency domain to obtain a dual-channel frequency-domain signal includes:
S201, performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing a fast Fourier transform on each frame of sound sub-signal to obtain the dual-channel frequency-domain signal, wherein the dual-channel frequency-domain signal is the set of dual-channel frequency-domain sub-signals corresponding to the respective sound sub-signals.
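Steps S201 and S202 can be sketched with the standard library alone. The frame length, hop size, and Hann window are assumptions, and a direct DFT stands in for the fast Fourier transform to keep the example dependency-free (it yields the same values, only more slowly).

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each."""
    window = hann(frame_len)
    return [[signal[start + i] * window[i] for i in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def dft(frame):
    """Direct DFT, a slow stand-in for an FFT with identical output."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def to_frequency_domain(channels, frame_len=8, hop=4):
    """Return spectra indexed as [channel][frame][bin]."""
    return [[dft(f) for f in split_frames(ch, frame_len, hop)] for ch in channels]


# Two channels of 16 samples: a constant signal and silence.
spectra = to_frequency_domain([[1.0] * 16, [0.0] * 16])
```

With 16 samples, a frame length of 8, and a hop of 4, each channel yields three frames of eight bins.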
Further, the step of converting the second single-channel frequency-domain signal into the time domain to generate a final audio signal includes:
S701, performing an inverse Fourier transform on each second single-channel frequency-domain sub-signal to obtain a corresponding second single-channel time-domain sub-signal;
S702, overlap-adding the second single-channel time-domain sub-signals to obtain the final audio signal.
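Steps S701 and S702 can be sketched as follows, again with the standard library only. The hop size and the absence of a synthesis window are assumptions, and a direct inverse DFT stands in for an inverse FFT.

```python
import cmath
import math


def idft(spectrum):
    """Direct inverse DFT returning the real part of each time sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def overlap_add(frames, hop):
    """Sum time-domain frames into one signal, each shifted by `hop` samples."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for t, sample in enumerate(frame):
            out[i * hop + t] += sample
    return out


def to_time_domain(spectra, hop):
    """Inverse-transform every frame spectrum, then overlap-add the results."""
    return overlap_add([idft(s) for s in spectra], hop)


# Two frames whose spectra contain only a DC component of 4 (all-ones frames).
audio = to_time_domain([[4, 0, 0, 0], [4, 0, 0, 0]], hop=2)
```

For the output to have uniform amplitude, the analysis window and hop must be chosen so that the shifted windows sum to a constant; the unwindowed frames here double in the overlapped region.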
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. All equivalent structures or equivalent process transformations made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of protection of the present application.
Claims (10)
1. A dual-microphone-based sound pickup method, characterized by comprising the following steps:
acquiring a sound signal, wherein the sound signal is a dual-channel time-domain signal;
converting the sound signal into the frequency domain to obtain a dual-channel frequency-domain signal;
performing fixed beamforming on the dual-channel frequency-domain signal to generate a first single-channel frequency-domain signal;
calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the dual-channel frequency-domain signal;
updating the first noise power spectrum according to the first single-channel frequency-domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;
performing noise reduction processing on the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal;
and converting the second single-channel frequency-domain signal into the time domain to generate a final audio signal.
2. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of performing noise reduction processing on the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal comprises:
calculating the a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
calculating a frequency-domain filter coefficient of the first single-channel frequency-domain signal according to the a priori signal-to-noise ratio;
and filtering the first single-channel frequency-domain signal according to the frequency-domain filter coefficient to obtain the second single-channel frequency-domain signal.
3. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of calculating the speech presence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the dual-channel frequency-domain signal comprises:
calculating the auto-spectral densities and the cross-spectral density of the dual-channel frequency-domain signal;
calculating a complex coherence function of the dual-channel frequency-domain signal according to the auto-spectral densities and the cross-spectral density;
and calculating, according to the complex coherence function, the speech presence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the dual-channel frequency-domain signal, respectively.
4. The dual-microphone-based sound pickup method as claimed in claim 3, wherein the step of calculating the speech presence probability of the first single-channel frequency-domain signal according to the complex coherence function comprises:
substituting the complex coherence function into a first algorithm to calculate the CDR of the first single-channel frequency-domain signal;
and normalizing the CDR to obtain the speech presence probability.
5. The dual-microphone-based sound pickup method as claimed in claim 3, wherein the step of calculating a complex coherence function of the dual-channel frequency-domain signal according to the auto-spectral densities and the cross-spectral density comprises:
substituting the auto-spectral densities and the cross-spectral density into a preset formula to calculate an initial complex coherence function;
performing first-order recursive smoothing along the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and performing 5-point median filtering along the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
6. The dual-microphone-based sound pickup method as claimed in claim 2, wherein the step of calculating the a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum comprises:
substituting the first single-channel frequency-domain signal and the second noise power spectrum into a second algorithm to calculate an a posteriori signal-to-noise ratio;
and substituting the a posteriori signal-to-noise ratio into a third algorithm to calculate the a priori signal-to-noise ratio.
7. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of converting the sound signal into the frequency domain to obtain a dual-channel frequency-domain signal comprises:
performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;
and performing a fast Fourier transform on each frame of sound sub-signal to obtain the dual-channel frequency-domain signal, wherein the dual-channel frequency-domain signal is the set of dual-channel frequency-domain sub-signals corresponding to the respective sound sub-signals.
8. A dual-microphone-based sound pickup apparatus, comprising:
an acquisition module for acquiring a sound signal, wherein the sound signal is a dual-channel time-domain signal;
a first conversion module for converting the sound signal into the frequency domain to obtain a dual-channel frequency-domain signal;
a generating module for performing fixed beamforming on the dual-channel frequency-domain signal to generate a first single-channel frequency-domain signal;
a calculation module for calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the dual-channel frequency-domain signal;
an updating module for updating the first noise power spectrum according to the first single-channel frequency-domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;
a noise reduction module for performing noise reduction processing on the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal;
and a second conversion module for converting the second single-channel frequency-domain signal into the time domain to generate a final audio signal.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010171449.XA CN111048106B (en) | 2020-03-12 | 2020-03-12 | Pickup method and apparatus based on double microphones and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010171449.XA CN111048106B (en) | 2020-03-12 | 2020-03-12 | Pickup method and apparatus based on double microphones and computer device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111048106A (en) | 2020-04-21 |
CN111048106B (en) | 2020-06-16 |
Family
ID=70231145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010171449.XA Active CN111048106B (en) | 2020-03-12 | 2020-03-12 | Pickup method and apparatus based on double microphones and computer device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111048106B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111489753A (en) * | 2020-06-24 | 2020-08-04 | 深圳市友杰智新科技有限公司 | Anti-noise sound source positioning method and device and computer equipment |
CN111986693A (en) * | 2020-08-10 | 2020-11-24 | 北京小米松果电子有限公司 | Audio signal processing method and device, terminal equipment and storage medium |
CN112946576A (en) * | 2020-12-10 | 2021-06-11 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
CN113160846A (en) * | 2021-04-22 | 2021-07-23 | 维沃移动通信有限公司 | Noise suppression method and electronic device |
CN113362808A (en) * | 2021-06-02 | 2021-09-07 | 云知声智能科技股份有限公司 | Target direction voice extraction method and device, electronic equipment and storage medium |
CN113380266A (en) * | 2021-05-28 | 2021-09-10 | 中国电子科技集团公司第三研究所 | Miniature double-microphone voice enhancement method and miniature double-microphone |
CN115132220A (en) * | 2022-08-25 | 2022-09-30 | 深圳市友杰智新科技有限公司 | Method, device, equipment and storage medium for restraining double-microphone awakening of television noise |
CN115361617A (en) * | 2022-08-15 | 2022-11-18 | 音曼(北京)科技有限公司 | Non-blind area multi-microphone environmental noise suppression method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8239196B1 (en) * | 2011-07-28 | 2012-08-07 | Google Inc. | System and method for multi-channel multi-feature speech/noise classification for noise suppression |
CN105206281A (en) * | 2015-09-14 | 2015-12-30 | 胡旻波 | Voice enhancement device based on distributed microphone array network |
CN106448692A (en) * | 2016-07-04 | 2017-02-22 | Tcl集团股份有限公司 | RETF reverberation elimination method and system optimized by use of voice existence probability |
CN107301869A (en) * | 2017-08-17 | 2017-10-27 | 珠海全志科技股份有限公司 | Microphone array sound pick-up method, processor and its storage medium |
CN108922554A (en) * | 2018-06-04 | 2018-11-30 | 南京信息工程大学 | The constant Wave beam forming voice enhancement algorithm of LCMV frequency based on logarithm Power estimation |
CN109817209A (en) * | 2019-01-16 | 2019-05-28 | 深圳市友杰智新科技有限公司 | A kind of intelligent speech interactive system based on two-microphone array |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111489753A (en) * | 2020-06-24 | 2020-08-04 | 深圳市友杰智新科技有限公司 | Anti-noise sound source positioning method and device and computer equipment |
CN111986693A (en) * | 2020-08-10 | 2020-11-24 | 北京小米松果电子有限公司 | Audio signal processing method and device, terminal equipment and storage medium |
CN112946576A (en) * | 2020-12-10 | 2021-06-11 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
CN112946576B (en) * | 2020-12-10 | 2023-04-14 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
CN113160846A (en) * | 2021-04-22 | 2021-07-23 | 维沃移动通信有限公司 | Noise suppression method and electronic device |
CN113160846B (en) * | 2021-04-22 | 2024-05-17 | 维沃移动通信有限公司 | Noise suppression method and electronic equipment |
CN113380266B (en) * | 2021-05-28 | 2022-06-28 | 中国电子科技集团公司第三研究所 | Miniature dual-microphone speech enhancement method and miniature dual-microphone |
CN113380266A (en) * | 2021-05-28 | 2021-09-10 | 中国电子科技集团公司第三研究所 | Miniature double-microphone voice enhancement method and miniature double-microphone |
CN113362808B (en) * | 2021-06-02 | 2023-03-21 | 云知声智能科技股份有限公司 | Target direction voice extraction method and device, electronic equipment and storage medium |
CN113362808A (en) * | 2021-06-02 | 2021-09-07 | 云知声智能科技股份有限公司 | Target direction voice extraction method and device, electronic equipment and storage medium |
CN115361617A (en) * | 2022-08-15 | 2022-11-18 | 音曼(北京)科技有限公司 | Non-blind area multi-microphone environmental noise suppression method |
CN115132220A (en) * | 2022-08-25 | 2022-09-30 | 深圳市友杰智新科技有限公司 | Method, device, equipment and storage medium for restraining double-microphone awakening of television noise |
CN115132220B (en) * | 2022-08-25 | 2023-02-28 | 深圳市友杰智新科技有限公司 | Method, device, equipment and storage medium for restraining double-microphone awakening of television noise |
Also Published As
Publication number | Publication date |
---|---|
CN111048106B (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111048106B (en) | Pickup method and apparatus based on double microphones and computer device | |
Weninger et al. | Discriminatively trained recurrent neural networks for single-channel speech separation | |
CN113270106B (en) | Dual-microphone wind noise suppression method, device, equipment and storage medium | |
CN110931031A (en) | Deep learning voice extraction and noise reduction method fusing bone vibration sensor and microphone signals | |
CN108200522B (en) | Regularization proportion normalization subband self-adaptive filtering method | |
KR20060086303A (en) | Apparatus and method for separating audio signals | |
JP5195979B2 (en) | Signal separation device, signal separation method, and computer program | |
CN111128220A (en) | Dereverberation method, apparatus, device and storage medium | |
CN112331226B (en) | Voice enhancement system and method for active noise reduction system | |
Mohammadiha et al. | Joint acoustic and spectral modeling for speech dereverberation using non-negative representations | |
CN112435685A (en) | Blind source separation method and device for strong reverberation environment, voice equipment and storage medium | |
US11647344B2 (en) | Hearing device with end-to-end neural network | |
CN112530451A (en) | Speech enhancement method based on denoising autoencoder | |
Li et al. | Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. | |
Qi et al. | Exploring deep hybrid tensor-to-vector network architectures for regression based speech enhancement | |
US11622208B2 (en) | Apparatus and method for own voice suppression | |
CN111696573B (en) | Sound source signal processing method and device, electronic equipment and storage medium | |
Albataineh et al. | A RobustICA-based algorithmic system for blind separation of convolutive mixtures | |
Thien et al. | Inter-frequency phase difference for phase reconstruction using deep neural networks and maximum likelihood | |
JP4946330B2 (en) | Signal separation apparatus and method | |
Hossain et al. | Dual-transform source separation using sparse nonnegative matrix factorization | |
CN113724727A (en) | Long-short time memory network voice separation algorithm based on beam forming | |
Yang et al. | Speech dereverberation using weighted prediction error with prior learnt from data | |
Itzhak et al. | Quadratic beamforming for magnitude estimation | |
CN113132848A (en) | Filter design method and device and in-ear active noise reduction earphone |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||