CN111048106A - Pickup method and apparatus based on double microphones and computer device

Info

Publication number: CN111048106A
Application number: CN202010171449.XA
Authority: CN
Inventors: 王维; 王广新; 杨汉丹
Original assignee: Shenzhen Youjie Zhixin Technology Co ltd
Current assignee: Shenzhen Youjie Zhixin Technology Co ltd
Priority date: 2020-03-12
Filing date: 2020-03-12
Publication date: 2020-04-21
Anticipated expiration: 2040-03-12
Also published as: CN111048106B

Abstract

The application provides a pickup method and apparatus based on two microphones, a computer device, and a computer-readable storage medium. Sound signals are received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. Because only two microphones are needed for the whole pickup process, hardware manufacturing cost is effectively reduced. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the speech presence probability and update the noise spectrum, which greatly improves robustness to far-field reverberation and noise at a small calculation cost and effectively improves the pickup effect.

Description

Pickup method and apparatus based on double microphones and computer device

Technical Field

The present application relates to the field of audio processing technologies, and in particular, to a sound pickup method and apparatus based on two microphones, and a computer device.

Background

With the rise of intelligent voice technology, far-field voice pickup using microphone arrays has become a popular technique. To achieve a good far-field interaction effect, existing microphone array pickup devices generally use four or six microphones. However, the large number of microphones makes the system complicated, and pickup requires multiple parameters such as sound source position information. As a result, the calculation amount is large, the cost is high, and such devices cannot be applied to small equipment such as a translator.

Disclosure of Invention

The main purpose of the application is to provide a pickup method and apparatus based on two microphones and a computer device, so as to overcome the complex structure, large calculation amount and high cost of existing microphone array pickup devices.

In order to achieve the above object, the present application provides a pickup method based on two microphones, including:

acquiring a sound signal, wherein the sound signal is a two-channel time domain signal;

converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;

performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;

calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;

updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;

performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;

and converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.

Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:

calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;

calculating to obtain a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;

and filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.

Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:

respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;

according to the self-spectral density and the cross-spectral density, calculating to obtain a complex coherence function of the dual-channel frequency domain signal;

and respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.

Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:

substituting the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal;

and carrying out normalization processing on the CDR ratio to obtain the voice existence probability.

Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:

substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;

performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;

and performing 5-point median filtering processing on the secondary complex coherence function in frequency dimension to obtain the complex coherence function.

Further, the step of calculating the prior snr of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum includes:

substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;

and substituting the posterior signal-to-noise ratio into a third algorithm to calculate to obtain the prior signal-to-noise ratio.

Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:

performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;

and respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of the dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.

The application also provides a pickup apparatus based on two microphones, including:

an acquisition module, configured to acquire a sound signal, wherein the sound signal is a dual-channel time domain signal;

the first conversion module is used for converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;

the generating module is used for performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;

a calculation module for calculating a speech presence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;

the updating module is used for updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;

the noise reduction module is used for carrying out noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;

and the second conversion module is used for converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.

Further, the noise reduction module includes:

a first calculating unit, configured to calculate a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;

the second calculation unit is used for calculating the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;

and the filtering unit is used for filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.

Further, the calculation module includes:

the third calculating unit is used for respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;

the fourth calculating unit is used for calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;

and the sixth calculating unit is used for respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.

Further, the sixth calculating unit includes:

the first calculating subunit is configured to substitute the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency-domain signal;

and the normalizing subunit is used for carrying out normalization processing on the CDR ratio to obtain the voice existence probability.

Further, the fourth calculating unit includes:

the second calculating subunit is used for substituting the self-spectral density and the cross-spectral density into a preset formula to calculate and obtain an initial complex coherence function;

the recursion subunit is used for performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;

and the filtering subunit is used for performing 5-point median filtering processing on the secondary complex coherence function in a frequency dimension to obtain the complex coherence function.

Further, the first computing unit includes:

the second calculating subunit is configured to substitute the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculate to obtain an a posteriori signal-to-noise ratio;

and the third calculation subunit is used for substituting the posterior signal-to-noise ratio into a third algorithm to calculate and obtain the prior signal-to-noise ratio.

Further, the first conversion module includes:

the framing unit is used for framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;

and the first conversion unit is used for respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.

The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the above.

According to the pickup method, the pickup apparatus and the computer device based on two microphones provided by the application, sound signals are received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. Because only two microphones are needed for the whole pickup process, hardware manufacturing cost is effectively reduced. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the speech presence probability and update the noise spectrum, which greatly improves robustness to far-field reverberation and noise at a small calculation cost and effectively improves the pickup effect.

Drawings

Fig. 1 is a schematic diagram illustrating steps of a method for picking up sound based on two microphones according to an embodiment of the present application;

Fig. 2 is a block diagram illustrating an overall structure of a sound pickup apparatus based on two microphones according to an embodiment of the present application;

Fig. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present application provides a method for picking up sound based on two microphones, including:

S1, acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;

S2, converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;

S3, performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;

S4, calculating the speech presence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;

S5, updating the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;

S6, performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;

and S7, converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.

In this embodiment, the sound pickup system receives a sound signal through two microphones; the sound signal is a dual-channel time domain signal. The system first performs framing and windowing on the dual-channel time domain signal to obtain a plurality of frames of dual-channel time domain sub-signals. It then performs a fast Fourier transform on each frame of dual-channel time domain sub-signal to obtain the corresponding dual-channel frequency domain sub-signal; the set of these per-frame sub-signals forms the dual-channel frequency domain signal of the sound signal. Next, the system performs fixed beamforming on the dual-channel frequency domain signal, that is, the dual-channel frequency domain sub-signal of each frame is input into a first formula to generate the first single-channel frequency domain signal. Specifically, the first formula is:

[Formula not reproduced in the source text.]

where the inputs are the short-time spectra of microphone 1 and microphone 2 at a given frame and frequency bin k, and the output is the first single-channel frequency domain signal.

After obtaining the first single-channel frequency domain signal, the system denoises it according to a preset algorithm to obtain a second single-channel frequency domain signal. The specific process is as follows. The system first applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain the corresponding self-spectral density and cross-spectral density. From these it calculates the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. According to the complex coherence function, the system calculates the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal. It then updates the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain the second noise power spectrum of the single-channel frequency domain signal. Specifically, the system uses the speech presence probability as a smoothing factor: it inputs the first single-channel frequency domain signal and the speech presence probability into a fourth formula and calculates the second noise power spectrum. The fourth formula is:

[Formula not reproduced in the source text.]

whose result is the updated second noise power spectrum.

The first noise power spectrum is calculated from the dual-channel frequency domain signal, whereas the second noise power spectrum is calculated from the single-channel frequency domain signal. Both signals are vectors. For example, if the single-channel frequency domain signal is 256 × 1 and the dual-channel frequency domain signal is 256 × 2, the first noise power spectrum is calculated from 256 × 2 to obtain 256 × 1, and the second noise power spectrum is calculated from 256 × 1 to obtain 256 × 1.

The system calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum. Based on the prior signal-to-noise ratio, it calculates the frequency domain filter coefficient of the first single-channel frequency domain signal. Finally, the system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. Because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the second single-channel frequency domain signal after noise reduction is likewise a set of second single-channel frequency domain sub-signals, one for each dual-channel frequency domain sub-signal. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal, and then overlap-adds the second single-channel time domain sub-signals to obtain the final audio signal, completing the whole pickup process.
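As a rough illustration of the fixed beam and the noise update described above, the following Python sketch makes two assumptions that the text does not confirm, because the first and fourth formulas are not reproduced: the fixed beam is taken to be a plain average of the two channels (a delay-and-sum beam steered broadside), and the noise update is taken to interpolate between the previous noise estimate and the current beam power, with the speech presence probability as the smoothing factor. The function names are hypothetical.

```python
import numpy as np

def fixed_beam(x1, x2):
    """Assumed fixed beam: average the two microphone spectra (broadside delay-and-sum)."""
    return 0.5 * (x1 + x2)

def update_noise_psd(noise_psd, y, speech_prob):
    """Assumed update: keep the old estimate where speech is likely, track |Y|^2 where it is not."""
    return speech_prob * noise_psd + (1.0 - speech_prob) * np.abs(y) ** 2

# One frame with four frequency bins (toy values).
x1 = np.array([1 + 1j, 2 + 0j, 0 + 1j, 1 - 1j])
x2 = np.array([1 - 1j, 2 + 0j, 0 - 1j, 1 - 1j])
y = fixed_beam(x1, x2)                      # first single-channel frequency domain signal
noise0 = np.full(4, 0.5)                    # stand-in for the first noise power spectrum
p = np.array([1.0, 0.0, 0.5, 0.0])          # speech presence probability per bin
noise1 = update_noise_psd(noise0, y, p)     # second noise power spectrum
```

With a probability of 1 the bin keeps its previous noise estimate, and with a probability of 0 it is replaced by the current power, which is the behaviour one would expect from using the probability as a smoothing factor.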

Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:

S601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;

S602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;

S603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.

In this embodiment, the system inputs the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio, and then inputs the posterior signal-to-noise ratio into the third algorithm to calculate the prior signal-to-noise ratio of the first single-channel frequency domain signal. The system then inputs the prior signal-to-noise ratio into a fifth formula to calculate the frequency domain filter coefficient of the first single-channel frequency domain signal. The fifth formula is:

[Formula not reproduced in the source text.]

whose result is the frequency domain filter coefficient.

The system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient, thereby obtaining the second single-channel frequency domain signal. The second single-channel frequency domain signal after noise reduction is:

[Formula not reproduced in the source text.]
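Since the fifth formula is not reproduced, the following Python sketch only illustrates one common way to turn a prior signal-to-noise ratio into a frequency domain filter coefficient: the Wiener gain. Whether the application uses this exact gain is an assumption, and the gain floor parameter is a hypothetical addition.

```python
import numpy as np

def wiener_gain(prior_snr, gain_floor=0.05):
    """Wiener gain G = xi / (1 + xi), with a hypothetical floor to limit over-suppression."""
    gain = prior_snr / (1.0 + prior_snr)
    return np.maximum(gain, gain_floor)

def apply_gain(y, gain):
    """Scale each frequency bin of the beamformed spectrum by its real-valued gain."""
    return gain * y

prior_snr = np.array([0.0, 1.0, 9.0])
y = np.array([2 + 0j, 2 + 2j, 0 - 4j])   # first single-channel frequency domain signal
g = wiener_gain(prior_snr)               # frequency domain filter coefficients
z = apply_gain(y, g)                     # second single-channel frequency domain signal
```

Because the gain is real-valued, the phase of each bin is left unchanged and only its magnitude is attenuated.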

Further, the step of calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal includes:

S401, respectively calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;

S402, calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;

and S403, respectively calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.

In this embodiment, the system applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain its self-spectral density and cross-spectral density. The calculation formula for the self-spectral density is:

[Formula not reproduced in the source text.]

The formula for the cross-spectral density is:

[Formula not reproduced in the source text.]

The symbols in these formulas denote, respectively, the power spectral density function, the smoothing coefficient, the self-spectral density, and the cross-spectral density.
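The two density formulas are not reproduced, so the Python sketch below assumes the usual first-order recursive form, in which the previous estimate is blended with the current product of one channel and the conjugate of the other. The function name and the smoothing coefficient value are hypothetical.

```python
import numpy as np

def smooth_psd(prev_psd, a, b, lam=0.8):
    """First-order recursive estimate of E[a * conj(b)] per frequency bin.

    Passing the same channel twice gives a self-spectral density; passing the
    two different channels gives the cross-spectral density. `lam` is the
    smoothing coefficient (its value here is assumed).
    """
    return lam * prev_psd + (1.0 - lam) * a * np.conj(b)

x1 = np.array([1 + 1j, 2 + 0j])          # microphone 1 spectrum, two bins
x2 = np.array([1 - 1j, 0 + 2j])          # microphone 2 spectrum, two bins
zeros = np.zeros(2, dtype=complex)       # previous estimates for the first frame
phi_11 = smooth_psd(zeros, x1, x1, lam=0.5)
phi_22 = smooth_psd(zeros, x2, x2, lam=0.5)
phi_12 = smooth_psd(zeros, x1, x2, lam=0.5)
```

The self-spectral densities come out real and non-negative, while the cross-spectral density is complex and carries the phase difference between the two channels.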

Then, the system substitutes the self-spectral density and the cross-spectral density into a second formula to calculate the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. The second formula is:

[Formula not reproduced in the source text.]

whose result is the complex coherence function.
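The second formula is not reproduced. The Python sketch below uses the standard definition of complex coherence, the cross-spectral density divided by the geometric mean of the two self-spectral densities, and should be read as an assumption about what the second formula contains. The small `eps` term is a hypothetical guard against division by zero.

```python
import numpy as np

def complex_coherence(phi_11, phi_22, phi_12, eps=1e-12):
    """Normalized cross-spectrum; its magnitude approaches 1 for a single coherent source."""
    return phi_12 / np.sqrt(np.real(phi_11) * np.real(phi_22) + eps)

phi_11 = np.array([1.0, 2.0, 4.0])       # self-spectral density, microphone 1
phi_22 = np.array([1.0, 2.0, 1.0])       # self-spectral density, microphone 2
phi_12 = np.array([1j, -2j, 1.0 + 0j])   # cross-spectral density
gamma = complex_coherence(phi_11, phi_22, phi_12)
```

In this toy input the first two bins are fully coherent (magnitude 1) and the third is only partly coherent (magnitude 0.5).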

The system substitutes the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal, and then normalizes the CDR ratio to obtain the speech presence probability of the first single-channel frequency domain signal. It also substitutes the complex coherence function into a third formula to calculate the first noise power spectrum of the dual-channel frequency domain signal. The third formula is:

[Formula not reproduced in the source text.]

whose result is the first noise power spectrum.

Further, the step of calculating the speech presence probability of the first single-channel frequency domain signal according to the complex coherence function includes:

S4031, substituting the complex coherence function into a first algorithm, and calculating the CDR ratio of the first single-channel frequency domain signal;

S4032, normalizing the CDR ratio to obtain the speech presence probability.

In this embodiment, the system substitutes the complex coherence function into a sixth formula to calculate the CDR (coherent-to-diffuse ratio). The sixth formula is:

[Formulas not reproduced in the source text.]

where one term is the coherence function of the diffuse noise field, f is the signal frequency, d is the microphone spacing, and c is the speed of sound in air. After the CDR ratio is calculated, the system normalizes it by substituting it into a seventh formula, thereby obtaining the speech presence probability. The seventh formula is:

[Formula not reproduced in the source text.]

where P is the speech presence probability.
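The sixth and seventh formulas are not reproduced, so the Python sketch below is only one plausible reading. It assumes the diffuse field coherence is the well-known sinc model in f, d and c, that the CDR is obtained by inverting the mixture model gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1) with a target coherence gamma_s of 1 (a source at broadside), and that normalization maps the CDR to CDR / (1 + CDR). All three choices, and every function name, are assumptions and not taken from the application.

```python
import numpy as np

def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically diffuse field: sin(2*pi*f*d/c) / (2*pi*f*d/c)."""
    # np.sinc(x) is sin(pi*x)/(pi*x), so the argument passed in is 2*f*d/c.
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)

def cdr_estimate(gamma_x, gamma_n, gamma_s=1.0, eps=1e-6):
    """Assumed estimator: solve gamma_x = (cdr*gamma_s + gamma_n) / (cdr + 1) for cdr."""
    denom = gamma_x - gamma_s
    # Replace near-zero denominators (fully coherent input) so the CDR becomes very large.
    denom = np.where(np.abs(denom) < eps, -eps, denom)
    cdr = np.real((gamma_n - gamma_x) / denom)
    return np.maximum(cdr, 0.0)

def speech_presence_probability(cdr):
    """Assumed normalization mapping a CDR in [0, inf) to a value in [0, 1)."""
    return cdr / (1.0 + cdr)

gamma_n = np.array([0.2, 0.2])                 # diffuse coherence at two bins (toy values)
gamma_x = np.array([0.8 + 0j, 0.2 + 0j])       # measured complex coherence
cdr = cdr_estimate(gamma_x, gamma_n)
p = speech_presence_probability(cdr)
```

In the toy input, the first bin corresponds to a CDR of 3 and a probability of 0.75, while the second bin has the same coherence as the diffuse field and therefore a probability of 0.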

Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:

S4021, substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating an initial complex coherence function;

S4022, performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;

S4023, performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.

In this embodiment, the initial complex coherence function that the system calculates from the self-spectral density and the cross-spectral density may contain considerable noise. To obtain a better noise reduction effect, the system further filters the initial complex coherence function. Specifically, the system performs first-order recursive smoothing in the time dimension by substituting the initial complex coherence function into an eighth formula, and calculates a secondary complex coherence function. The eighth formula is:

[Formula not reproduced in the source text.]

whose result is the secondary complex coherence function.

Then, the system performs 5-point median filtering on the secondary complex coherence function in the frequency dimension according to a ninth formula, obtaining the filtered complex coherence function, namely the complex coherence function. The ninth formula is:

[Formula not reproduced in the source text.]

The number of median filtering points is determined by staff through related experiments and then input into the system. The filtered complex coherence function is used in the subsequent calculations. Combined with a smaller smoothing coefficient, the filtered complex coherence function can track changes in environmental noise more quickly, which effectively improves the noise reduction effect.
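The two filtering stages can be sketched in Python as follows. The eighth and ninth formulas are not reproduced, so the recursion form, the smoothing value, the edge padding, and the choice to median-filter the real and imaginary parts separately are all assumptions; the function names are hypothetical.

```python
import numpy as np

def smooth_in_time(prev_gamma, gamma, alpha=0.7):
    """First-order recursive smoothing across frames (alpha is an assumed value)."""
    return alpha * prev_gamma + (1.0 - alpha) * gamma

def median_filter_freq(gamma, width=5):
    """Median filter along frequency, applied to real and imaginary parts separately."""
    half = width // 2
    padded = np.pad(gamma, half, mode="edge")  # repeat edge bins so the length is preserved
    windows = np.stack([padded[i:i + gamma.size] for i in range(width)])
    return np.median(windows.real, axis=0) + 1j * np.median(windows.imag, axis=0)

gamma = np.array([0.9, 0.9, -0.5, 0.9, 0.9, 0.9], dtype=complex)  # one outlier bin
filtered = median_filter_freq(gamma)
smoothed = smooth_in_time(np.array([1.0 + 0j]), np.array([0.0 + 0j]))
```

The isolated outlier at the third bin is removed by the 5-point median, whereas a moving average of the same width would have smeared it over its neighbours.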

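Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:

S6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating a posterior signal-to-noise ratio;

and S6012, substituting the posterior signal-to-noise ratio into a third algorithm, and calculating the prior signal-to-noise ratio.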

In this embodiment, the system uses a decision-directed method. It first substitutes the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio. The second algorithm is:

[Formula not reproduced in the source text.]

whose result is the posterior signal-to-noise ratio.

The system then substitutes the posterior signal-to-noise ratio calculated in the previous step into a third algorithm to calculate the prior signal-to-noise ratio. The third algorithm is:

[Formula not reproduced in the source text.]

in which one term is the weight at the previous moment.
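The second and third algorithms are not reproduced. The Python sketch below follows the classic decision-directed estimator and takes "the weight at the previous moment" to mean the previous frame's filter gain; both points are assumptions, as are the function names and the value of `alpha`.

```python
import numpy as np

def posterior_snr(y, noise_psd, eps=1e-12):
    """A posteriori SNR: power of the noisy bin over the estimated noise power."""
    return np.abs(y) ** 2 / (noise_psd + eps)

def prior_snr_dd(post_snr, prev_gain, prev_post_snr, alpha=0.98):
    """Decision-directed a priori SNR (alpha is an assumed weighting value)."""
    previous = (prev_gain ** 2) * prev_post_snr   # SNR implied by the previous frame's output
    current = np.maximum(post_snr - 1.0, 0.0)     # instantaneous estimate from this frame
    return alpha * previous + (1.0 - alpha) * current

y = np.array([2 + 0j, 1 + 0j])            # first single-channel frequency domain signal
noise_psd = np.array([1.0, 4.0])          # second noise power spectrum
post = posterior_snr(y, noise_psd)
prior = prior_snr_dd(post, prev_gain=np.array([0.5, 0.5]),
                     prev_post_snr=np.array([4.0, 4.0]), alpha=0.5)
```

A value of `alpha` close to 1 favours the previous frame and yields a smoother estimate, at the price of slower reaction to speech onsets.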

Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:

S201, performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;

S202, performing a fast Fourier transform on each frame of sound sub-signal to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.

In this embodiment, the system performs framing and windowing on the dual-channel time domain signal, that is, the sound signal, to obtain a plurality of frames of dual-channel time domain sub-signals, so that subsequent processing such as noise reduction can be carried out frame by frame and a better pickup effect achieved. The system then performs a fast Fourier transform on each frame of dual-channel time domain sub-signal to obtain the corresponding dual-channel frequency domain sub-signal; the set of these per-frame sub-signals forms the dual-channel frequency domain signal of the sound signal.
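A minimal Python sketch of framing, windowing and the fast Fourier transform is given below. The frame length, hop size and square-root Hann window are assumed values chosen for illustration; the application does not state them, and the function name is hypothetical.

```python
import numpy as np

def stft_two_channel(signal, frame_len=512, hop=256):
    """Split a (samples, 2) signal into windowed frames and FFT each frame.

    Returns an array of shape (frames, frame_len // 2 + 1, 2).
    """
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])  # periodic sqrt-Hann (assumed window)
    n_frames = 1 + (signal.shape[0] - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window[None, :, None], axis=1)

rng = np.random.default_rng(0)
sound = rng.standard_normal((2048, 2))   # stand-in for the dual-channel time domain signal
spec = stft_two_channel(sound)           # dual-channel frequency domain signal
```

Each slice `spec[i]` is one dual-channel frequency domain sub-signal, so the later per-frame steps can operate on it independently.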

Further, the step of generating a final audio signal by converting the second single-channel frequency-domain signal to the time domain includes:

S701, performing an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;

and S702, overlap-adding the second single-channel time domain sub-signals to obtain the final audio signal.

In this embodiment, because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the second single-channel frequency domain signal after noise reduction is likewise a set of second single-channel frequency domain sub-signals, one for each dual-channel frequency domain sub-signal. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal, converting it to the time domain to obtain the corresponding second single-channel time domain sub-signal. The system then overlap-adds the second single-channel time domain sub-signals to obtain the final audio signal output, completing the whole pickup process.
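The inverse transform and overlap-add step can be sketched in Python as follows. The frame length, hop size and square-root Hann synthesis window are assumptions chosen so that an unmodified spectrum is reconstructed; the application does not specify them, and the function name is hypothetical.

```python
import numpy as np

def istft_overlap_add(spec, frame_len=512, hop=256):
    """Inverse-FFT each frame, apply the synthesis window, and overlap-add."""
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])  # must match the analysis window
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros(hop * (spec.shape[0] - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

# Round trip on a single channel without any noise reduction in between.
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
w = np.sqrt(np.hanning(513)[:-1])
frames = np.stack([x[i * 256:i * 256 + 512] * w for i in range(7)])
y = istft_overlap_add(np.fft.rfft(frames, axis=1))
```

With a square-root Hann window and 50% overlap, the analysis and synthesis windows multiply to a Hann window whose shifted copies sum to one, so samples away from the two ends of the signal are reconstructed.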

According to the pickup method based on dual microphones provided in the present application, sound signals are received through the dual microphones and converted into a dual-channel frequency domain signal, and fixed beam processing is performed on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, which completes the whole dual-microphone pickup process. In realizing sound pickup, the present application needs only two microphones to complete the whole process, which effectively reduces hardware manufacturing cost. In the noise reduction process, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and to update the noise spectrum, so that robustness to far-field reverberation and noise is greatly improved with a small amount of calculation, and the pickup effect is effectively improved.

Referring to fig. 2, an embodiment of the present application provides a sound pickup apparatus based on two microphones, including:

an acquisition module 1, configured to acquire a sound signal, where the sound signal is a dual-channel time domain signal;

the first conversion module 2 is configured to convert the sound signal to a frequency domain to obtain a dual-channel frequency domain signal;

the generating module 3 is configured to perform fixed beam processing on the two-channel frequency domain data to generate a first single-channel frequency domain signal;

a calculating module 4, configured to calculate a speech existence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;

the updating module 5 is configured to update and calculate the first noise power spectrum according to the first single-channel frequency-domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;

the noise reduction module 6 is configured to perform noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;

and a second conversion module 7, configured to convert the second single-channel frequency-domain signal to a time domain, so as to generate a final audio signal.
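The role of the updating module 5 can be illustrated with a common soft-decision recursive update, in which the voice existence probability controls how strongly the current frame changes the noise estimate. This is a sketch under stated assumptions, not the update formula of the application: the function name `update_noise_psd`, the smoothing coefficient `alpha` and the specific update rule are hypothetical.

```python
import numpy as np

def update_noise_psd(noise_psd, Y, p, alpha=0.9):
    """Update a noise power spectrum using the voice existence probability.

    noise_psd: current noise power spectrum, shape (num_bins,).
    Y:         first single-channel frequency domain sub-signal, shape (num_bins,).
    p:         voice existence probability per bin, values in [0, 1].
    alpha:     base smoothing coefficient (assumed value).
    """
    # Where speech is likely (p near 1) the effective coefficient approaches 1,
    # so the noise estimate is held; where speech is unlikely it tracks |Y|^2.
    alpha_eff = alpha + (1.0 - alpha) * p
    return alpha_eff * noise_psd + (1.0 - alpha_eff) * np.abs(Y) ** 2
```

With `p` equal to 1 the estimate is left unchanged, and with `p` equal to 0 it reduces to ordinary recursive averaging of the frame power.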

In this embodiment, the sound pickup system receives a sound signal through two microphones, where the sound signal is a dual-channel time domain signal. The system first frames and windows the dual-channel time domain signal to obtain multiple frames of dual-channel time domain sub-signals. It then performs a fast Fourier transform on each frame of dual-channel time domain sub-signals to transform it to the frequency domain, obtaining the dual-channel frequency domain sub-signal corresponding to each frame; the set of all frames of dual-channel frequency domain sub-signals forms the dual-channel frequency domain signal, that is, the sound signal transformed to the frequency domain. Next, the system performs fixed beam processing on the dual-channel frequency domain signal, that is, it inputs each frame of dual-channel frequency domain sub-signals into a first formula and performs the corresponding calculation, thereby generating the first single-channel frequency domain signal. Specifically, the first formula is:

[formula not reproduced]

where the two input terms are the short-time spectra of microphone 1 and microphone 2, respectively, at a given frame and at the k-th frequency point.

The result of the update calculation is the updated second noise power spectrum. Specifically, the first noise power spectrum is a parameter calculated from the dual-channel frequency domain signal, and the second noise power spectrum is a parameter calculated from the single-channel frequency domain signal. Both signals are vectors; for example, if the single-channel frequency domain signal has size 256 × 1 and the dual-channel frequency domain signal has size 256 × 2, then the first noise power spectrum is calculated from 256 × 2 data to give a 256 × 1 result, and the second noise power spectrum is calculated from 256 × 1 data to give a 256 × 1 result. The system calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal from the first single-channel frequency domain signal and the second noise power spectrum, and then calculates the frequency domain filter coefficients of the first single-channel frequency domain signal from the prior signal-to-noise ratio. Finally, the system filters and denoises the first single-channel frequency domain signal according to the frequency domain filter coefficients to obtain the second single-channel frequency domain signal. Since the set of dual-channel frequency domain sub-signals of all frames forms the dual-channel frequency domain signal obtained by transforming the sound signal to the frequency domain, the second single-channel frequency domain signal after noise reduction is in fact the set of second single-channel frequency domain sub-signals, one for each dual-channel frequency domain sub-signal after noise reduction. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal, and then overlaps and adds the second single-channel time domain sub-signals to obtain the final audio signal, which completes the whole pickup process.
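The chain from noise power spectrum to prior signal-to-noise ratio, frequency domain filter coefficients and filtered output can be sketched per frame as follows. The formulas of the application are not reproduced in this text, so the sketch substitutes two widely used techniques as assumptions: a decision-directed prior signal-to-noise ratio estimate and a Wiener-type gain. The function name `denoise_frame` and the parameters `beta` and `gain_floor` are hypothetical.

```python
import numpy as np

def denoise_frame(Y, noise_psd, prev_gain, prev_post_snr,
                  beta=0.98, gain_floor=0.1):
    """Denoise one single-channel frequency domain sub-signal.

    Y:             first single-channel frequency domain sub-signal, (num_bins,).
    noise_psd:     second (updated) noise power spectrum, (num_bins,).
    prev_gain:     filter coefficients of the previous frame, (num_bins,).
    prev_post_snr: posterior signal-to-noise ratio of the previous frame.
    beta, gain_floor: assumed tuning values, not taken from the application.
    """
    eps = 1e-12  # guards against division by zero
    # Posterior SNR: frame power relative to the noise power spectrum.
    post_snr = np.abs(Y) ** 2 / (noise_psd + eps)
    # Decision-directed prior SNR (assumed estimator).
    prior_snr = (beta * prev_gain ** 2 * prev_post_snr
                 + (1.0 - beta) * np.maximum(post_snr - 1.0, 0.0))
    # Wiener-type filter coefficients with a lower bound (assumed gain rule).
    gain = np.maximum(prior_snr / (1.0 + prior_snr), gain_floor)
    # Filtering: scale each frequency point of the input spectrum.
    return gain * Y, gain, post_snr
```

The returned `gain` and `post_snr` are fed back as `prev_gain` and `prev_post_snr` when the next frame is processed.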

Further, the noise reduction module 6 includes:

[formula not reproduced]

where the computed quantities are the frequency domain filter coefficients.

Further, the calculating module 4 includes:

[formula not reproduced]

The formula of the cross-spectral density is:

[formula not reproduced]

where the symbols in these formulas denote, respectively, the power spectral density function, the smoothing coefficient, the self-spectral density, and the cross-spectral density.

[formula not reproduced]

where the result is the complex coherence function.

[formula not reproduced]

where the result is the first noise power spectrum.
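The quantities named above (smoothing coefficient, self-spectral density, cross-spectral density and complex coherence function) are commonly computed by recursive averaging over frames. The following sketch shows that standard construction as an assumption; it is not claimed to match the formulas of the application. The function name `update_coherence` and the value of `alpha` are hypothetical.

```python
import numpy as np

def update_coherence(X1, X2, phi11, phi22, phi12, alpha=0.8):
    """Recursively smooth spectral densities and compute complex coherence.

    X1, X2:       short-time spectra of microphone 1 and 2 for one frame.
    phi11, phi22: smoothed self-spectral densities from the previous frame.
    phi12:        smoothed cross-spectral density from the previous frame.
    alpha:        smoothing coefficient (assumed value).
    """
    # Self-spectral densities: smoothed power of each microphone.
    phi11 = alpha * phi11 + (1.0 - alpha) * np.abs(X1) ** 2
    phi22 = alpha * phi22 + (1.0 - alpha) * np.abs(X2) ** 2
    # Cross-spectral density: smoothed product of one spectrum with the
    # complex conjugate of the other.
    phi12 = alpha * phi12 + (1.0 - alpha) * X1 * np.conj(X2)
    # Complex coherence: cross-spectral density normalized by the
    # self-spectral densities (small constant avoids division by zero).
    coherence = phi12 / np.sqrt(phi11 * phi22 + 1e-12)
    return coherence, phi11, phi22, phi12
```

A coherence magnitude near 1 indicates a strongly correlated (directional) component at that frequency point, while diffuse noise and reverberation tend to give lower values, which is what makes the coherence useful for estimating the voice existence probability.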

Further, the sixth calculating unit includes:

[formulas not reproduced]

where P is the speech presence probability.

Further, the fourth calculating unit includes:

[formula not reproduced]

where the result is the quadratic complex coherence function.

Then, the system applies 5-point median filtering along the frequency dimension to the quadratic complex coherence function and performs the corresponding calculation according to a ninth formula to obtain the filtered complex coherence function. The ninth formula is:

[formula not reproduced]

Further, the first calculating unit includes:

[formula not reproduced]

where the result is the posterior signal-to-noise ratio.

[formula not reproduced]

where the additional term is the weight at the previous moment.

Further, the first conversion module 2 includes:

Further, the second single-channel frequency domain signal is a set of second single-channel frequency domain sub-signals corresponding to each of the two-channel frequency domain sub-signals after noise reduction, and the second converting module 7 includes:

the second conversion unit is used for respectively performing inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;

and the superposition unit is used for performing superposition addition on each second single-channel time domain sub-signal to obtain the final audio signal.

The pickup apparatus based on dual microphones provided in the present application receives sound signals through the dual microphones, converts the sound signals into a dual-channel frequency domain signal, and performs fixed beam processing on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, which completes the whole dual-microphone pickup process. In realizing sound pickup, the present application needs only two microphones to complete the whole process, which effectively reduces hardware manufacturing cost. In the noise reduction process, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and to update the noise spectrum, so that robustness to far-field reverberation and noise is greatly improved with a small amount of calculation, and the pickup effect is effectively improved.

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as smoothing coefficients. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a sound pickup method based on dual microphones.

The processor executes the steps of the sound pickup method based on the two microphones:

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a sound pickup method based on a dual microphone, and specifically:

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims

1. A pickup method based on two microphones is characterized by comprising the following steps:

2. The method as claimed in claim 1, wherein the step of denoising the first single-channel frequency-domain signal according to the second noise power spectrum to obtain a second single-channel frequency-domain signal comprises:

3. The dual-microphone based sound pickup method as claimed in claim 1, wherein the step of calculating the voice existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal comprises:

4. The dual-microphone based sound pickup method as claimed in claim 3, wherein the step of calculating the speech existence probability of the first single-channel frequency-domain signal according to the complex coherence function comprises:

5. The dual-microphone based sound pickup method as claimed in claim 3, wherein the step of calculating a complex coherence function of the dual-channel frequency-domain signal according to the self-spectral density and the cross-spectral density comprises:

6. The dual-microphone based sound pickup method as claimed in claim 2, wherein the step of calculating the prior signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum comprises:

7. The dual-microphone based sound pickup method as claimed in claim 1, wherein the step of converting the sound signal to a frequency domain to obtain a dual-channel frequency domain signal comprises:

8. A dual microphone based sound pickup apparatus comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.