CN111048106A - Pickup method and apparatus based on double microphones and computer device - Google Patents


Info

Publication number
CN111048106A
Authority
CN
China
Prior art keywords
channel frequency
frequency domain
domain signal
signal
dual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010171449.XA
Other languages
Chinese (zh)
Other versions
CN111048106B (en)
Inventor
王维
王广新
杨汉丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjie Zhixin Technology Co ltd
Original Assignee
Shenzhen Youjie Zhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Youjie Zhixin Technology Co ltd
Priority to CN202010171449.XA
Publication of CN111048106A
Application granted
Publication of CN111048106B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a dual-microphone sound pickup method, a sound pickup apparatus, a computer device and a computer-readable storage medium. Sound signals are received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. Because only two microphones are needed for the whole pickup process, hardware manufacturing cost is effectively reduced. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the speech presence probability and update the noise spectrum, which greatly improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.

Description

Pickup method and apparatus based on double microphones and computer device
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a sound pickup method and apparatus based on two microphones, and a computer device.
Background
With the rise of intelligent voice interaction, far-field voice pickup using microphone array technology has become a popular technique. To achieve a good far-field interaction effect, existing microphone array pickup devices generally use four or six microphones. However, the large number of microphones makes the system complex, and pickup requires multiple parameters such as sound source position information, so the computational load is large, the cost is high, and such devices cannot be used in small equipment such as translators.
Disclosure of Invention
The main purpose of the application is to provide a dual-microphone pickup method and apparatus and a computer device, in order to overcome the complex structure, large computational load and high cost of existing microphone array pickup devices.
In order to achieve the above object, the present application provides a pickup method based on two microphones, including:
acquiring a sound signal, wherein the sound signal is a two-channel time domain signal;
converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
applying fixed beamforming to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
calculating a speech presence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
updating the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
calculating to obtain a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:
respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
according to the self-spectral density and the cross-spectral density, calculating to obtain a complex coherence function of the dual-channel frequency domain signal;
and respectively calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the step of calculating the speech presence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
substituting the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal;
and normalizing the CDR ratio to obtain the speech presence probability.
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;
performing first-order recursive smoothing along the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and performing 5-point median filtering processing on the secondary complex coherence function in frequency dimension to obtain the complex coherence function.
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
and substituting the posterior signal-to-noise ratio into a third algorithm to calculate to obtain the prior signal-to-noise ratio.
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
performing frame windowing on the sound signal to obtain a plurality of frame sound sub-signals;
and respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of the dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
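The framing, windowing and transform steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the Hann window, the frame length of 8, the hop of 4, the naive DFT standing in for the fast Fourier transform, and all function names are hypothetical choices made for brevity.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_signal(samples, frame_len, hop):
    """Split one channel into overlapping frames and apply the window."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def dft(frame):
    """Naive O(N^2) DFT of a real frame, returning bins 0..N/2.

    Kept short for clarity; a real system would use an FFT routine.
    """
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def stft_two_channel(ch1, ch2, frame_len=8, hop=4):
    """Return one (X1, X2) pair of half-spectra per frame."""
    f1 = frame_signal(ch1, frame_len, hop)
    f2 = frame_signal(ch2, frame_len, hop)
    return [(dft(a), dft(b)) for a, b in zip(f1, f2)]
```

Each element of the returned list corresponds to one dual-channel frequency domain sub-signal in the wording of the text.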
The application also provides a pickup apparatus based on two microphones, including:
the acquisition module is used for acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
the first conversion module is used for converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
the generating module is used for applying fixed beamforming to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
the calculation module is used for calculating a speech presence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
the updating module is used for updating the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
the noise reduction module is used for carrying out noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and the second conversion module is used for converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
Further, the noise reduction module includes:
a first calculating unit, configured to calculate a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
the second calculation unit is used for calculating the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and the filtering unit is used for filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the calculation module includes:
the third calculating unit is used for respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
the fourth calculating unit is used for calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and the sixth calculating unit is used for respectively calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the sixth calculating unit includes:
the first calculating subunit is used for substituting the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal;
and the normalizing subunit is used for normalizing the CDR ratio to obtain the speech presence probability.
Further, the fourth calculating unit includes:
the second calculating subunit is used for substituting the self-spectral density and the cross-spectral density into a preset formula to calculate and obtain an initial complex coherence function;
the recursion subunit is used for performing first-order recursive smoothing along the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and the filtering subunit is used for performing 5-point median filtering processing on the secondary complex coherence function in a frequency dimension to obtain the complex coherence function.
Further, the first computing unit includes:
the second calculating subunit is configured to substitute the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculate to obtain an a posteriori signal-to-noise ratio;
and the third calculation subunit is used for substituting the posterior signal-to-noise ratio into a third algorithm to calculate and obtain the prior signal-to-noise ratio.
Further, the first conversion module includes:
the framing unit is used for framing and windowing the sound signals to obtain a plurality of frame sound sub-signals;
and the first conversion unit is used for respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the above.
According to the dual-microphone pickup method, apparatus and computer device provided in the application, sound signals are received through two microphones and converted into a dual-channel frequency domain signal, and fixed beamforming is applied to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal. The first single-channel frequency domain signal is then denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. Because only two microphones are needed for the whole pickup process, hardware manufacturing cost is effectively reduced. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the speech presence probability and update the noise spectrum, which greatly improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.
Drawings
Fig. 1 is a schematic diagram illustrating steps of a method for picking up sound based on two microphones according to an embodiment of the present application;
Fig. 2 is a block diagram illustrating an overall structure of a sound pickup apparatus based on two microphones according to an embodiment of the present application;
Fig. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for picking up sound based on two microphones, including:
S1, acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
S2, converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
S3, applying fixed beamforming to the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
S4, calculating the speech presence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
S5, updating the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
S6, performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and S7, converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
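As an illustration of step S3, the sketch below forms a fixed beam for one frame by averaging the two microphone spectra bin by bin. The averaging rule (a delay-and-sum beam with zero steering delay) is an assumption, since the first formula of the application is not reproduced in this text, and the function name `fixed_beam` is hypothetical.

```python
def fixed_beam(x1, x2):
    """Average the two microphone spectra of one frame, bin by bin.

    Assumption: a broadside delay-and-sum beam with no steering delay;
    the application's own first formula may differ.
    """
    if len(x1) != len(x2):
        raise ValueError("both channels must have the same number of bins")
    return [0.5 * (a + b) for a, b in zip(x1, x2)]
```

With this rule, components that are identical in both channels pass unchanged, while components that are opposite in phase cancel.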
In this embodiment, the sound pickup system receives a sound signal through two microphones; the sound signal is a dual-channel time domain signal. The system first frames and windows the dual-channel time domain signal to obtain multiple frames of dual-channel time domain sub-signals. It then applies a fast Fourier transform to each frame, obtaining the dual-channel frequency domain sub-signal corresponding to that frame; the set of these per-frame sub-signals forms the dual-channel frequency domain signal of the sound signal. Next, the system applies fixed beamforming to the dual-channel frequency domain signal, that is, it inputs each frame of dual-channel frequency domain sub-signals into a first formula to generate the first single-channel frequency domain signal. Specifically, the first formula is:
[First formula: equation image not reproduced in this text]
where the two inputs of the formula are the short-time spectra of microphone 1 and microphone 2 at a given frame and at frequency bin k, and the output is the first single-channel frequency domain signal.
After the system obtains the first single-channel frequency domain signal, it denoises that signal according to a preset algorithm to obtain the second single-channel frequency domain signal. The specific process is as follows. The system first applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain the corresponding self-spectral density and cross-spectral density. From these it calculates the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. According to the complex coherence function, the system calculates the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal. It then updates the first noise power spectrum according to the first single-channel frequency domain signal and the speech presence probability to obtain the second noise power spectrum of the single-channel frequency domain signal. Specifically, the system uses the speech presence probability as the smoothing factor of the update: it inputs the first single-channel frequency domain signal and the speech presence probability into a fourth formula to calculate the second noise power spectrum of the single-channel frequency domain signal. The fourth formula is:
[Fourth formula: equation image not reproduced in this text]
whose result is the updated second noise power spectrum. The first noise power spectrum is calculated from the dual-channel frequency domain signal, and the second noise power spectrum is calculated from the single-channel frequency domain signal. Both signals are vectors: for example, if the single-channel frequency domain signal is 256 × 1 and the dual-channel signal is 256 × 2, the first noise power spectrum is a 256 × 1 result calculated from 256 × 2 data, and the second noise power spectrum is a 256 × 1 result calculated from 256 × 1 data. The system then calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum. Based on the prior signal-to-noise ratio, it calculates the frequency domain filter coefficient of the first single-channel frequency domain signal. Finally, the system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. Because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the denoised second single-channel frequency domain signal is in fact the set of second single-channel frequency domain sub-signals corresponding to those frames. The system applies an inverse Fourier transform to each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal.
The second single-channel time domain sub-signals are then overlap-added to obtain the final audio signal, completing the whole pickup process.
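The noise update described above, in which the speech presence probability acts as the smoothing factor, can be sketched as below. The exact recursion is an assumption, because the fourth formula is not reproduced in this text: here the previous noise estimate is weighted by the probability and the current frame power by its complement, so the estimate is held while speech is likely and tracks the input while speech is unlikely. The name `update_noise_psd` is hypothetical.

```python
def update_noise_psd(noise_prev, y, spp):
    """Per-bin noise power update with the speech presence probability
    used as the smoothing factor.

    noise_prev: previous noise power spectrum (list of floats)
    y:          first single-channel spectrum of this frame (list of complex)
    spp:        speech presence probability per bin, each in [0, 1]

    Assumed recursion: N_new = p * N_prev + (1 - p) * |Y|^2.
    """
    return [
        p * n + (1.0 - p) * abs(v) ** 2
        for n, v, p in zip(noise_prev, y, spp)
    ]
```

For example, with a previous estimate of 1.0 and a frame power of 4.0, a probability of 1.0 keeps the estimate at 1.0, while a probability of 0.0 replaces it with 4.0.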
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
S601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
S602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
S603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
In this embodiment, the system inputs the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio, and then inputs the posterior signal-to-noise ratio into the third algorithm to calculate the prior signal-to-noise ratio of the first single-channel frequency domain signal. The system then inputs the prior signal-to-noise ratio into a fifth formula to calculate the frequency domain filter coefficient of the first single-channel frequency domain signal. The fifth formula is:
[Fifth formula: equation image not reproduced in this text]
whose result is the frequency domain filter coefficient.
The system filters the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. The denoised second single-channel frequency domain signal is:
[Formula for the second single-channel frequency domain signal: equation image not reproduced in this text]
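Steps S601 to S603 can be illustrated with the following sketch. It is not the second algorithm, third algorithm or fifth formula of the application, which are not reproduced in this text; it substitutes common textbook choices as assumptions: a posterior signal-to-noise ratio of |Y|² divided by the noise power, a maximum-likelihood prior estimate max(posterior − 1, 0), and a Wiener gain prior / (1 + prior). The name `wiener_denoise` and the `floor` parameter are hypothetical.

```python
def wiener_denoise(y, noise_psd, floor=1e-12):
    """Filter one frame of the first single-channel spectrum.

    y:         list of complex bins
    noise_psd: second noise power spectrum, list of floats
    floor:     lower bound on the noise power to avoid division by zero
    """
    out = []
    for v, n in zip(y, noise_psd):
        post_snr = abs(v) ** 2 / max(n, floor)    # assumed posterior SNR
        prior_snr = max(post_snr - 1.0, 0.0)      # assumed prior SNR estimate
        gain = prior_snr / (1.0 + prior_snr)      # assumed Wiener gain
        out.append(gain * v)                      # filtered bin
    return out
```

A bin whose power is four times the noise power is scaled by 0.75, while a bin whose power does not exceed the noise power is set to zero.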
Further, the step of calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal includes:
S401, respectively calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
S402, calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and S403, respectively calculating the speech presence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
In this embodiment, the system applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain its self-spectral density and cross-spectral density. The self-spectral density is calculated as:
[Self-spectral density formula: equation image not reproduced in this text]
and the cross-spectral density is calculated as:
[Cross-spectral density formula: equation image not reproduced in this text]
The symbols in these formulas, which are also images in the source, denote in order the power spectral density function, the smoothing coefficient, the self-spectral density and the cross-spectral density.
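First-order recursive smoothing of the spectral densities can be sketched as follows. The recursion shown, previous value times the smoothing coefficient plus the complement times the instantaneous product of one spectrum with the conjugate of the other, is the usual form and is assumed here, since the formulas themselves are not reproduced in this text. The names `update_psd`, `update_spectral_densities` and `lam` are hypothetical.

```python
def update_psd(prev, a, b, lam):
    """One recursive step per bin: lam * prev + (1 - lam) * a * conj(b)."""
    return [
        lam * p + (1.0 - lam) * u * v.conjugate()
        for p, u, v in zip(prev, a, b)
    ]


def update_spectral_densities(state, x1, x2, lam=0.7):
    """Update (p11, p22, p12) with the spectra of one new frame.

    state: tuple of three lists holding the previous self-spectral
           densities of channels 1 and 2 and their cross-spectral density
    x1,x2: complex spectra of microphone 1 and microphone 2
    lam:   smoothing coefficient in [0, 1) (value is an assumption)
    """
    p11, p22, p12 = state
    return (
        update_psd(p11, x1, x1, lam),
        update_psd(p22, x2, x2, lam),
        update_psd(p12, x1, x2, lam),
    )
```

The self-spectral densities produced this way have zero imaginary part, whereas the cross-spectral density is complex in general.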
Then, the system substitutes the self-spectral density and the cross-spectral density into a second formula to calculate the complex coherence function of the dual-channel frequency domain signal, which characterizes the correlation between the two channels at each frequency. The second formula is:
[Second formula: equation image not reproduced in this text]
whose result is the complex coherence function.
The system substitutes the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal, and then normalizes the CDR ratio to obtain the speech presence probability of the first single-channel frequency domain signal. It also substitutes the complex coherence function into a third formula to calculate the first noise power spectrum of the dual-channel frequency domain signal. The third formula is:
[Third formula: equation image not reproduced in this text]
whose result is the first noise power spectrum.
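The complex coherence function is conventionally the cross-spectral density divided by the geometric mean of the two self-spectral densities. The sketch below assumes that conventional definition in place of the second formula, which is not reproduced in this text; the name `complex_coherence` and the `eps` guard are hypothetical. The third formula for the first noise power spectrum is not sketched, because nothing in the text determines its form.

```python
import math


def complex_coherence(p11, p22, p12, eps=1e-12):
    """Per-bin complex coherence: p12 / sqrt(p11 * p22).

    p11, p22: real, non-negative self-spectral densities (lists of floats)
    p12:      complex cross-spectral density (list of complex)
    eps:      lower bound on the denominator product
    """
    return [
        c / math.sqrt(max(a * b, eps))
        for a, b, c in zip(p11, p22, p12)
    ]
```

The magnitude of the result is close to 1 when the two channels are strongly correlated at that frequency and close to 0 when they are not.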
Further, the step of calculating the speech presence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal;
S4032, normalizing the CDR ratio to obtain the speech presence probability.
In this embodiment, the system substitutes the complex coherence function into the sixth formula to calculate the CDR ratio. The sixth formula is:
[Sixth formula: equation images not reproduced in this text]
where one term is the coherence function of the diffuse noise field, f is the signal frequency, d is the microphone spacing, and c is the speed of sound in air. After calculating the CDR ratio, the system normalizes it by substituting it into a seventh formula to obtain the speech presence probability. The seventh formula is:
[Seventh formula: equation image not reproduced in this text]
where P is the speech presence probability.
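A possible reading of steps S4031 and S4032 is sketched below. Everything specific in it is an assumption, since the sixth and seventh formulas are not reproduced in this text: CDR is read as a coherent-to-diffuse ratio, the diffuse field coherence is taken as sin(2πfd/c) / (2πfd/c), the direct sound is assumed to have coherence 1 (a source on the broadside of the pair), and the normalization is taken as CDR / (1 + CDR). All function names and the default sound speed of 343 m/s are hypothetical.

```python
import math


def diffuse_coherence(freq_hz, mic_distance_m, sound_speed=343.0):
    """Coherence of an ideal diffuse noise field between two microphones."""
    arg = 2.0 * math.pi * freq_hz * mic_distance_m / sound_speed
    return 1.0 if arg == 0.0 else math.sin(arg) / arg


def cdr_estimate(gamma_x, gamma_diff, eps=1e-9):
    """Assumed estimator: Re[(gamma_diff - gamma_x) / (gamma_x - 1)].

    It follows from modelling the measured coherence as
    (CDR * 1 + gamma_diff) / (CDR + 1) and solving for CDR.
    """
    denom = gamma_x - 1.0
    if abs(denom) < eps:
        return float("inf")  # fully coherent input
    return max(((gamma_diff - gamma_x) / denom).real, 0.0)


def speech_presence_probability(cdr):
    """Assumed normalization of the CDR to the range [0, 1]."""
    if math.isinf(cdr):
        return 1.0
    return cdr / (1.0 + cdr)
```

Under these assumptions a purely diffuse input gives a probability of 0, a fully coherent input gives 1, and a mixture with a ratio of 3 gives 0.75.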
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
S4021, substituting the self-spectral density and the cross-spectral density into a preset formula to calculate an initial complex coherence function;
S4022, performing first-order recursive smoothing along the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
S4023, performing 5-point median filtering along the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
In this embodiment, the initial complex coherence function that the system calculates from the self-spectral density and the cross-spectral density may contain considerable noise. To obtain a better noise reduction effect, the system filters the initial complex coherence function further. Specifically, the system applies first-order recursive smoothing in the time dimension by substituting the initial complex coherence function into an eighth formula, which yields a secondary complex coherence function. The eighth formula is:
[Eighth formula: images not reproduced in this text]
whose output is the secondary complex coherence function.
Then, the system applies 5-point median filtering to the secondary complex coherence function along the frequency dimension according to a ninth formula, which yields the filtered complex coherence function, that is, the complex coherence function used in the subsequent steps. The ninth formula is:
[Ninth formula: images not reproduced in this text]
The number of median filtering points is determined by staff through experiments and then input into the system. In subsequent calculations, the filtered complex coherence function is used. Combined with a smaller smoothing coefficient, the filtered complex coherence function tracks changes in environmental noise more quickly, which improves the noise reduction effect.
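The two filtering stages described above can be sketched as follows. The eighth and ninth formulas are not reproduced in this text, so the smoothing coefficient `alpha` and the choice to take the median of the real and imaginary parts separately are assumptions; the function names are hypothetical. The sketch needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np

def smooth_in_time(gamma_prev, gamma_now, alpha=0.7):
    """First-order recursive smoothing across frames (alpha is an assumed value)."""
    return alpha * gamma_prev + (1.0 - alpha) * gamma_now

def median_filter_freq(gamma, points=5):
    """Median filter along the frequency axis (last axis).

    Real and imaginary parts are filtered separately, which is an
    assumption: the source does not say how complex values are ordered.
    """
    half = points // 2
    pad = [(0, 0)] * (gamma.ndim - 1) + [(half, half)]

    def med(x):
        padded = np.pad(x, pad, mode="edge")  # repeat edge bins at the borders
        windows = np.lib.stride_tricks.sliding_window_view(padded, points, axis=-1)
        return np.median(windows, axis=-1)

    return med(gamma.real) + 1j * med(gamma.imag)
```

A single outlier bin is removed by the median stage, while the recursive stage controls how quickly the estimate follows frame-to-frame changes.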
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
s6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
and S6012, substituting the posterior signal-to-noise ratio into a third algorithm, and calculating to obtain the prior signal-to-noise ratio.
In this embodiment, the system uses a decision-directed method. It first substitutes the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio. The second algorithm is:
[Second algorithm: images not reproduced in this text]
whose output is the posterior signal-to-noise ratio.
The system then substitutes the posterior signal-to-noise ratio into a third algorithm to calculate the prior signal-to-noise ratio. The third algorithm is:
[Third algorithm: images not reproduced in this text]
in which one term is the weight at the previous moment.
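The second and third algorithms are not reproduced in this text. As an illustration only, the sketch below follows the textbook decision-directed estimator and interprets the "weight" from the previous moment as the previous frame's filter gain; that interpretation, the weighting factor `beta`, and the function names are assumptions.

```python
import numpy as np

def posterior_snr(y, noise_psd, eps=1e-10):
    """Posterior SNR: noisy power divided by the noise power spectrum."""
    return np.abs(y) ** 2 / np.maximum(noise_psd, eps)

def prior_snr_dd(gamma_now, gamma_prev, gain_prev, beta=0.98):
    """Decision-directed prior SNR (textbook form, assumed here).

    Blends the previous frame's clean-speech estimate, gain_prev**2 * gamma_prev,
    with the instantaneous estimate max(gamma_now - 1, 0).
    """
    instantaneous = np.maximum(gamma_now - 1.0, 0.0)
    return beta * gain_prev ** 2 * gamma_prev + (1.0 - beta) * instantaneous
```

A value of `beta` close to 1 gives a smooth estimate with little musical noise at the cost of slower reaction to speech onsets.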
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
S201, performing framing and windowing on the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing a fast Fourier transform on each frame of sound sub-signals to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals corresponding to the respective sound sub-signals.
In this embodiment, the system performs framing and windowing on the dual-channel time domain signal, that is, the sound signal, so as to obtain a plurality of frames of dual-channel time domain sub-signals, which is convenient for subsequent processing such as noise reduction on each frame of dual-channel time domain sub-signal, thereby achieving a better sound pickup effect. The system respectively carries out fast Fourier transform on each frame of double-channel time domain sub-signals, and each frame of double-channel time domain sub-signal is transformed to a frequency domain, so that double-channel frequency domain sub-signals respectively corresponding to each frame of double-channel time domain sub-signal are obtained, and the set of each frame of double-channel frequency domain sub-signal forms a double-channel frequency domain signal corresponding to a sound signal transformed to the frequency domain.
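The framing, windowing, and fast Fourier transform described above can be sketched as follows. The frame length, hop size, and square-root Hann window are assumptions (the source does not state them), and `stft_two_channel` is a hypothetical name.

```python
import numpy as np

def stft_two_channel(signal, frame_len=512, hop=256):
    """Frame, window, and FFT a two-channel time domain signal.

    signal: array of shape (num_samples, 2).
    Returns complex array of shape (num_frames, frame_len // 2 + 1, 2).
    """
    # Periodic square-root Hann window (assumed choice).
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    num_frames = 1 + (signal.shape[0] - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(num_frames)]
    )
    frames = frames * window[None, :, None]  # apply the window to both channels
    return np.fft.rfft(frames, axis=1)       # one spectrum per frame and channel
```

Each slice `spec[i]` is one dual-channel frequency domain sub-signal; the full array is the dual-channel frequency domain signal.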
Further, the step of generating a final audio signal by converting the second single-channel frequency-domain signal to the time domain includes:
s701, respectively carrying out inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;
and S702, overlapping and adding the second single-channel time domain sub-signals to obtain the final audio signal.
In this embodiment, because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the denoised second single-channel frequency domain signal is in fact the set of second single-channel frequency domain sub-signals, one for each denoised dual-channel frequency domain sub-signal. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal, converting it to the time domain to obtain the corresponding second single-channel time domain sub-signal. The system then overlap-adds the second single-channel time domain sub-signals to obtain the final audio signal, completing the pickup process.
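The inverse transform and overlap-add can be sketched as below. The frame length, the 50% overlap, and the square-root Hann synthesis window are assumptions chosen so that analysis with the same window followed by synthesis reconstructs the interior of the signal; `overlap_add` is a hypothetical name.

```python
import numpy as np

def overlap_add(spectra, frame_len=512, hop=256):
    """Inverse-FFT each frame and overlap-add the results.

    spectra: complex array of shape (num_frames, frame_len // 2 + 1).
    Returns a real signal of length hop * (num_frames - 1) + frame_len.
    """
    # Periodic square-root Hann synthesis window (assumed; pairs with the
    # same analysis window so the squared windows sum to 1 at 50% overlap).
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    frames = np.fft.irfft(spectra, n=frame_len, axis=1) * window
    out = np.zeros(hop * (spectra.shape[0] - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame  # add overlapping frames
    return out
```

The first and last half-frames are covered by only one window and are therefore attenuated; in a streaming system this appears as a short fade-in and fade-out.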
In the pickup method based on double microphones provided by this embodiment, the sound signal is received through two microphones and converted into a dual-channel frequency domain signal, and a fixed beam is applied to the dual-channel frequency domain signal to generate the first single-channel frequency domain signal. The first single-channel frequency domain signal is denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. The pickup process requires only two microphones, which reduces hardware manufacturing cost. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and update the noise spectrum, which improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.
Referring to fig. 2, an embodiment of the present application provides a sound pickup apparatus based on two microphones, including:
the acquisition module 1 is configured to acquire a sound signal, wherein the sound signal is a dual-channel time domain signal;
the first conversion module 2 is configured to convert the sound signal to a frequency domain to obtain a dual-channel frequency domain signal;
the generating module 3 is configured to perform fixed beam processing on the two-channel frequency domain data to generate a first single-channel frequency domain signal;
a calculating module 4, configured to calculate a speech existence probability of the first single-channel frequency-domain signal and a first noise power spectrum of the two-channel frequency-domain signal;
the updating module 5 is configured to update and calculate the first noise power spectrum according to the first single-channel frequency-domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency-domain signal;
the noise reduction module 6 is configured to perform noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and a second conversion module 7, configured to convert the second single-channel frequency-domain signal to a time domain, so as to generate a final audio signal.
In this embodiment, the sound pickup system receives a sound signal through two microphones, the sound signal being a dual-channel time domain signal. The system first performs framing and windowing on the dual-channel time domain signal to obtain a plurality of frames of dual-channel time domain sub-signals. It then performs a fast Fourier transform on each frame of dual-channel time domain sub-signals, transforming each frame to the frequency domain to obtain the corresponding dual-channel frequency domain sub-signals; the set of these sub-signals forms the dual-channel frequency domain signal corresponding to the sound signal. Next, the system applies a fixed beam to the dual-channel frequency domain signal, that is, it inputs each frame of dual-channel frequency domain sub-signals into a first formula to generate a first single-channel frequency domain signal. Specifically, the first formula is:
[First formula: images not reproduced in this text]
where the inputs are the short-time spectra of microphone 1 and microphone 2 at a given frame and frequency bin k, and the output is the first single-channel frequency domain signal.
After the system obtains the first single-channel frequency domain signal, it denoises that signal according to a preset algorithm to obtain a second single-channel frequency domain signal. The specific process is as follows. The system first calculates the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal through first-order recursive smoothing. From the self-spectral density and the cross-spectral density, it then calculates the complex coherence function of the dual-channel frequency domain signal, which represents the correlation between the two channels at each frequency. From the complex coherence function, the system calculates the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal. It then updates the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal. Specifically, the system uses the voice existence probability as a smoothing factor, inputs the first single-channel frequency domain signal and the voice existence probability into a fourth formula, and calculates the second noise power spectrum of the single-channel frequency domain signal. The fourth formula is:
[Fourth formula: images not reproduced in this text]
whose output is the updated second noise power spectrum. Specifically, the first noise power spectrum is a parameter calculated from the dual-channel frequency domain signal, and the second noise power spectrum is a parameter calculated from the single-channel frequency domain signal. Both signals are arrays: if, for example, the single-channel frequency domain signal is 256 × 1 and the dual-channel frequency domain signal is 256 × 2, the first noise power spectrum is calculated from the 256 × 2 input to give a 256 × 1 result, and the second noise power spectrum is calculated from the 256 × 1 input to give a 256 × 1 result. The system then calculates the prior signal-to-noise ratio of the first single-channel frequency domain signal from the first single-channel frequency domain signal and the second noise power spectrum, and calculates the frequency domain filter coefficient of the first single-channel frequency domain signal from the prior signal-to-noise ratio. Finally, the system filters the first single-channel frequency domain signal with the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. Because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the second single-channel frequency domain signal is in fact the set of second single-channel frequency domain sub-signals, one for each denoised dual-channel frequency domain sub-signal. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal.
The second single-channel time domain sub-signals are then overlap-added to obtain the final audio signal, completing the pickup process.
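The noise update in the paragraph above uses the voice existence probability as a smoothing factor. Since the fourth formula is not reproduced in this text, the sketch below assumes the common form in which a high probability keeps the old noise estimate and a low probability lets the current frame's power replace it; `update_noise_psd` is a hypothetical name.

```python
import numpy as np

def update_noise_psd(noise_psd_prev, y, speech_prob):
    """Assumed update: speech_prob acts as a per-bin smoothing factor.

    noise_psd_prev: previous noise power spectrum, shape (num_bins,).
    y:              first single-channel frequency domain frame, shape (num_bins,).
    speech_prob:    voice existence probability per bin, values in [0, 1].
    """
    # Where speech is likely, hold the estimate; where it is not, track |y|^2.
    return speech_prob * noise_psd_prev + (1.0 - speech_prob) * np.abs(y) ** 2
```

With 256 frequency bins, all three arguments and the result are 256-element arrays, matching the 256 × 1 example in the text.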
Further, the noise reduction module 6 includes:
a first calculating unit, configured to calculate a priori signal-to-noise ratio of the first single-channel frequency-domain signal according to the first single-channel frequency-domain signal and the second noise power spectrum;
the second calculation unit is used for calculating the frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and the filtering unit is used for filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
In this embodiment, the system inputs the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio, and then inputs the posterior signal-to-noise ratio into the third algorithm to calculate the prior signal-to-noise ratio of the first single-channel frequency domain signal. The system inputs the prior signal-to-noise ratio into a fifth formula to calculate the frequency domain filter coefficient of the first single-channel frequency domain signal. The fifth formula is:
[Fifth formula: images not reproduced in this text]
whose output is the frequency domain filter coefficient.
The system filters the first single-channel frequency domain signal with the frequency domain filter coefficient to obtain the second single-channel frequency domain signal. The denoised second single-channel frequency domain signal is:
[Formula image not reproduced in this text]
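The fifth formula is not reproduced in this text. One common way to turn a prior signal-to-noise ratio into a frequency domain filter coefficient is the Wiener gain, used here purely as an assumed stand-in; the gain floor and the function names are also hypothetical.

```python
import numpy as np

def wiener_gain(prior_snr, gain_floor=0.1):
    """Wiener gain xi / (1 + xi), limited from below by an assumed floor."""
    return np.maximum(prior_snr / (1.0 + prior_snr), gain_floor)

def apply_gain(y, gain):
    """Filter the first single-channel spectrum bin by bin."""
    return gain * y
```

The floor limits how far a bin can be attenuated, which trades residual noise against speech distortion.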
Further, the calculating module 4 includes:
the third calculating unit is used for respectively calculating the self-spectral density and the cross-spectral density of the two-channel frequency domain signal;
the fourth calculating unit is used for calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and the sixth calculating unit is used for respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
In this embodiment, the system applies first-order recursive smoothing to the dual-channel frequency domain signal to obtain its self-spectral density and cross-spectral density. The self-spectral density is calculated as:
[Self-spectral density formula: image not reproduced in this text]
and the cross-spectral density as:
[Cross-spectral density formula: image not reproduced in this text]
The symbols in these formulas (shown only as images in the source) denote, in order, the power spectral density function, the smoothing coefficient, the self-spectral density, and the cross-spectral density.
Then, the system substitutes the self-spectral density and the cross-spectral density into a second formula to calculate the complex coherence function of the dual-channel frequency domain signal, which represents the correlation between the two channels at each frequency. The second formula is:
[Second formula: images not reproduced in this text]
whose output is the complex coherence function.
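The density and coherence calculations above can be sketched as follows. The formulas themselves are not reproduced in this text, so the sketch uses the standard definitions: recursive averaging of |X1|², |X2|², and X1·conj(X2), and coherence as the cross-spectral density divided by the geometric mean of the two self-spectral densities. The smoothing coefficient value, the dictionary layout, and the function names are assumptions.

```python
import numpy as np

def update_spectral_densities(state, x1, x2, lam=0.68):
    """First-order recursive smoothing of self- and cross-spectral densities.

    state: dict with keys 'phi11', 'phi22', 'phi12', or None for the first frame.
    x1, x2: spectra of microphone 1 and microphone 2 for the current frame.
    """
    inst = {
        "phi11": np.abs(x1) ** 2,    # self-spectral density, channel 1
        "phi22": np.abs(x2) ** 2,    # self-spectral density, channel 2
        "phi12": x1 * np.conj(x2),   # cross-spectral density
    }
    if state is None:
        return inst
    return {k: lam * state[k] + (1.0 - lam) * inst[k] for k in inst}

def complex_coherence(state, eps=1e-10):
    """Cross-spectral density normalized by both self-spectral densities."""
    denom = np.sqrt(np.maximum(state["phi11"] * state["phi22"], eps))
    return state["phi12"] / denom
```

For two channels that differ only by a fixed phase shift, the coherence has magnitude 1 and carries that phase; for uncorrelated channels averaged over many frames, its magnitude falls toward 0.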
The system substitutes the complex coherence function into a first algorithm to calculate the CDR ratio of the first single-channel frequency domain signal, and then normalizes the CDR ratio to obtain the voice existence probability of the first single-channel frequency domain signal. The system also substitutes the complex coherence function into a third formula to calculate the first noise power spectrum of the dual-channel frequency domain signal. The third formula is:
[Third formula: images not reproduced in this text]
whose output is the first noise power spectrum.
Further, the sixth calculating unit includes:
the first calculating subunit is configured to substitute the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency domain signal;
and the normalizing subunit is used for carrying out normalization processing on the CDR ratio to obtain the voice existence probability.
In this embodiment, the system substitutes the complex coherence function into the sixth formula to calculate the CDR ratio. The sixth formula is:
[Sixth formula: images not reproduced in this text]
In the sixth formula, one term is the coherence function of the diffuse noise field, f is the signal frequency, d is the microphone spacing, and c is the speed of sound in air. After the CDR ratio is calculated, the system normalizes it by substituting it into a seventh formula, which yields the voice existence probability. The seventh formula is:
[Seventh formula: image not reproduced in this text]
where P is the voice existence probability.
Further, the fourth calculating unit includes:
the second calculating subunit is used for substituting the self-spectral density and the cross-spectral density into a preset formula to calculate and obtain an initial complex coherence function;
the recursion subunit is used for performing time dimension first-order recursion smoothing on the initial complex coherent function to obtain a secondary complex coherent function;
and the filtering subunit is used for performing 5-point median filtering processing on the secondary complex coherence function in a frequency dimension to obtain the complex coherence function.
In this embodiment, the initial complex coherence function that the system calculates from the self-spectral density and the cross-spectral density may contain considerable noise. To obtain a better noise reduction effect, the system filters the initial complex coherence function further. Specifically, the system applies first-order recursive smoothing in the time dimension by substituting the initial complex coherence function into an eighth formula, which yields a secondary complex coherence function. The eighth formula is:
[Eighth formula: images not reproduced in this text]
whose output is the secondary complex coherence function.
Then, the system applies 5-point median filtering to the secondary complex coherence function along the frequency dimension according to a ninth formula, which yields the filtered complex coherence function. The ninth formula is:
[Ninth formula: images not reproduced in this text]
The number of median filtering points is determined by staff through experiments and then input into the system. In subsequent calculations, the filtered complex coherence function is used. Combined with a smaller smoothing coefficient, the filtered complex coherence function tracks changes in environmental noise more quickly, which improves the noise reduction effect.
Further, the first computing unit includes:
the second calculating subunit is configured to substitute the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculate to obtain an a posteriori signal-to-noise ratio;
and the third calculation subunit is used for substituting the posterior signal-to-noise ratio into a third algorithm to calculate and obtain the prior signal-to-noise ratio.
In this embodiment, the system uses a decision-directed method. It first substitutes the first single-channel frequency domain signal and the second noise power spectrum into the second algorithm to calculate the posterior signal-to-noise ratio. The second algorithm is:
[Second algorithm: images not reproduced in this text]
whose output is the posterior signal-to-noise ratio.
The system then substitutes the posterior signal-to-noise ratio into a third algorithm to calculate the prior signal-to-noise ratio. The third algorithm is:
[Third algorithm: images not reproduced in this text]
in which one term is the weight at the previous moment.
Further, the first conversion module 2 includes:
the framing unit is used for framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
and the first conversion unit is used for respectively carrying out fast Fourier transform on the sound sub-signals of each frame to obtain the dual-channel frequency domain signals, wherein the dual-channel frequency domain signals are a set of dual-channel frequency domain sub-signals respectively corresponding to the sound sub-signals.
In this embodiment, the system performs framing and windowing on the dual-channel time domain signal, that is, the sound signal, so as to obtain a plurality of frames of dual-channel time domain sub-signals, which is convenient for subsequent processing such as noise reduction on each frame of dual-channel time domain sub-signal, thereby achieving a better sound pickup effect. The system respectively carries out fast Fourier transform on each frame of double-channel time domain sub-signals, and each frame of double-channel time domain sub-signal is transformed to a frequency domain, so that double-channel frequency domain sub-signals respectively corresponding to each frame of double-channel time domain sub-signal are obtained, and the set of each frame of double-channel frequency domain sub-signal forms a double-channel frequency domain signal corresponding to a sound signal transformed to the frequency domain.
Further, the second single-channel frequency domain signal is a set of second single-channel frequency domain sub-signals corresponding to each of the two-channel frequency domain sub-signals after noise reduction, and the second converting module 7 includes:
the second conversion unit is used for respectively performing inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain a second single-channel time domain sub-signal corresponding to each second single-channel frequency domain sub-signal;
and the superposition unit is used for performing superposition addition on each second single-channel time domain sub-signal to obtain the final audio signal.
In this embodiment, because the dual-channel frequency domain signal is the set of per-frame dual-channel frequency domain sub-signals, the denoised second single-channel frequency domain signal is in fact the set of second single-channel frequency domain sub-signals, one for each denoised dual-channel frequency domain sub-signal. The system performs an inverse Fourier transform on each second single-channel frequency domain sub-signal, converting it to the time domain to obtain the corresponding second single-channel time domain sub-signal. The system then overlap-adds the second single-channel time domain sub-signals to obtain the final audio signal, completing the pickup process.
The pickup device based on two microphones provided in this application receives the sound signal through two microphones, converts it into a dual-channel frequency domain signal, and applies a fixed beam to the dual-channel frequency domain signal to generate the first single-channel frequency domain signal. The first single-channel frequency domain signal is denoised according to a preset algorithm to obtain a second single-channel frequency domain signal. Finally, the second single-channel frequency domain signal is converted to the time domain to generate the final audio signal, completing the dual-microphone pickup process. The pickup process requires only two microphones, which reduces hardware manufacturing cost. During noise reduction, the preset algorithm uses the dual-microphone coherence function to calculate the voice existence probability and update the noise spectrum, which improves robustness to far-field reverberation and noise at a small computational cost and thereby improves the pickup effect.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as smoothing coefficients. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a sound pickup method based on double microphones.
The processor executes the steps of the sound pickup method based on the two microphones:
s1, acquiring a sound signal, wherein the sound signal is a two-channel time domain signal;
s2, converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
s3, making fixed wave beams for the two-channel frequency domain signals to generate first single-channel frequency domain signals;
s4, calculating the voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the two-channel frequency domain signal;
s5, updating and calculating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the single-channel frequency domain signal;
s6, carrying out noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and S7, converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
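Of the steps listed above, step S3 combines the two channels into one. The first formula is not reproduced in this text, so the sketch below assumes a delay-and-sum fixed beam: channel 2 is phase-aligned to channel 1 for an assumed steering delay and the two are averaged. The function name, the `delay_s` parameter, and its sign convention are hypothetical.

```python
import numpy as np

def fixed_beam(x1, x2, freqs_hz, delay_s=0.0):
    """Delay-and-sum fixed beam over one frame (assumed form of step S3).

    x1, x2:   spectra of microphone 1 and microphone 2, shape (num_bins,).
    freqs_hz: center frequency of each bin in Hz, shape (num_bins,).
    delay_s:  assumed arrival delay of the target at microphone 2 relative
              to microphone 1; 0.0 corresponds to a broadside target.
    """
    # A delay of delay_s multiplies the spectrum by exp(-j*2*pi*f*delay_s),
    # so the steering term below undoes it before averaging.
    steer = np.exp(1j * 2.0 * np.pi * freqs_hz * delay_s)
    return 0.5 * (x1 + steer * x2)
```

For a source in the steered direction the two aligned channels add coherently, while sound from other directions adds with mismatched phase and is partly attenuated.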
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
s601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
s602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
s603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
Further, the step of calculating the speech existence probability of the first single-channel frequency-domain signal and the first noise power spectrum of the two-channel frequency-domain signal includes:
s401, respectively calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
s402, calculating to obtain a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and S403, respectively calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal according to the complex coherence function.
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency domain signal;
S4032, normalizing the CDR ratio to obtain the voice existence probability.
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
s4021, substituting the self-spectral density and the cross-spectral density into a preset formula, and calculating to obtain an initial complex coherence function;
s4022, performing time dimension first-order recursive smoothing on the initial complex coherent function to obtain a secondary complex coherent function;
s4023, the secondary complex coherence function is subjected to 5-point median filtering of frequency dimension to obtain the complex coherence function.
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
s6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm, and calculating to obtain a posterior signal-to-noise ratio;
and S6012, substituting the posterior signal-to-noise ratio into a third algorithm, and calculating to obtain the prior signal-to-noise ratio.
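The two signal-to-noise ratios in S6011 and S6012 can be sketched as below. The "second algorithm" and "third algorithm" are not defined in this section; the sketch assumes the usual posterior SNR (signal power over noise power) and the widely used decision-directed estimate of the prior SNR, which also needs the previous frame's gain and posterior SNR. The parameter names and default values are hypothetical.

```python
import numpy as np

def posterior_snr(Y, noise_psd, eps=1e-12):
    """Assumed 'second algorithm': gamma = |Y|^2 / noise power, per frequency bin."""
    return np.abs(Y) ** 2 / np.maximum(noise_psd, eps)

def prior_snr_decision_directed(gamma, prev_gain, prev_gamma, alpha=0.98, xi_min=1e-3):
    """Assumed 'third algorithm': decision-directed prior SNR estimate.

    xi = alpha * G_prev^2 * gamma_prev + (1 - alpha) * max(gamma - 1, 0)
    """
    instantaneous = np.maximum(gamma - 1.0, 0.0)
    xi = alpha * prev_gain ** 2 * prev_gamma + (1.0 - alpha) * instantaneous
    return np.maximum(xi, xi_min)   # floor to avoid a zero prior SNR
```

For the first frame, where no previous gain exists, a caller could pass `prev_gain = 1` and `prev_gamma = gamma`; that initialization is likewise an assumption.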
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
S201, framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing a fast Fourier transform on each frame of the sound sub-signals to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals corresponding to the respective sound sub-signals.
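Steps S201 and S202 amount to a short-time Fourier transform of both channels. The sketch below assumes a frame length of 512 samples, a hop of 256 samples, and a square-root Hann window; none of these values is specified in this section.

```python
import numpy as np

def stft_two_channel(x, frame_len=512, hop=256):
    """x: array of shape (2, n_samples) with n_samples >= frame_len.

    Returns a complex array of shape (2, n_frames, frame_len // 2 + 1).
    """
    # Periodic square-root Hann window (assumed choice).
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # S201: split both channels into overlapping frames and apply the window.
    frames = np.stack(
        [x[:, i * hop:i * hop + frame_len] for i in range(n_frames)], axis=1
    )
    # S202: FFT of every windowed frame; rfft keeps the non-negative frequencies.
    return np.fft.rfft(frames * window, axis=-1)
```

For example, a 1024-sample two-channel input gives 3 frames of 257 frequency bins per channel with these defaults.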
Further, the step of converting the second single-channel frequency domain signal into a time domain to generate a final audio signal includes:
S701, performing an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal;
S702, overlap-adding the second single-channel time domain sub-signals to obtain the final audio signal.
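Steps S701 and S702 can be sketched as an inverse FFT followed by overlap-add. The frame length, hop size, and square-root Hann synthesis window are assumptions; they are chosen so that, when the same window is used for analysis with 50% overlap, the overlapped windows sum to one.

```python
import numpy as np

def istft_overlap_add(Y, frame_len=512, hop=256):
    """Y: complex array of shape (n_frames, frame_len // 2 + 1).

    Returns the time domain signal of length hop * (n_frames - 1) + frame_len.
    """
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])   # assumed synthesis window
    # S701: inverse FFT of every frame, then apply the synthesis window.
    frames = np.fft.irfft(Y, n=frame_len, axis=-1) * window
    # S702: overlap-add the time domain frames at multiples of the hop size.
    out = np.zeros(hop * (Y.shape[0] - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

Only samples covered by two overlapping frames are reconstructed exactly; the first and last `hop` samples are tapered by the window.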
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a dual-microphone-based sound pickup method, specifically comprising:
S1, acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
S2, converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
S3, performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
S4, calculating the voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
S5, updating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the first single-channel frequency domain signal;
S6, performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
S7, converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
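Two of the steps above, S3 and S5, are sketched here under stated assumptions. The section does not say which fixed beamformer is used, so a delay-and-sum beam toward a known direction is assumed. For S5, a recursive update is assumed in which the smoothing factor approaches one as the voice existence probability approaches one, so that the noise estimate is frozen during speech. The function names and the parameters `tdoa` and `alpha` are hypothetical.

```python
import numpy as np

def fixed_beam(X, freqs_hz, tdoa=0.0):
    """S3 (assumed delay-and-sum): X has shape (2, n_frames, n_bins).

    tdoa is the delay in seconds of channel 2 relative to channel 1 for the
    target direction; channel 2 is phase-advanced to align it with channel 1.
    """
    steer = np.exp(1j * 2.0 * np.pi * freqs_hz * tdoa)
    return 0.5 * (X[0] + X[1] * steer)

def update_noise_psd(noise_psd, Y, p_speech, alpha=0.9):
    """S5 (assumed update rule) for one frame.

    noise_psd: first noise power spectrum, Y: first single-channel frequency
    domain signal, p_speech: voice existence probability, all per frequency bin.
    """
    alpha_eff = alpha + (1.0 - alpha) * p_speech   # equals 1 when speech is certain
    return alpha_eff * noise_psd + (1.0 - alpha_eff) * np.abs(Y) ** 2
```

With `p_speech = 1` the noise estimate is left unchanged, and with `p_speech = 0` it moves toward the current frame power at the rate set by `alpha`.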
Further, the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal includes:
S601, calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
S602, calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
S603, filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain a second single-channel frequency domain signal.
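Steps S602 and S603 can be illustrated with a Wiener-type gain. The section does not name the filter, so the gain xi / (1 + xi) and the gain floor used below are assumptions for illustration only.

```python
import numpy as np

def wiener_filter(Y, xi, gain_floor=0.1):
    """Y: first single-channel frequency domain signal; xi: prior SNR per bin.

    Returns the filtered spectrum and the gain that was applied.
    """
    # S602: frequency domain filter coefficient (assumed Wiener gain with a floor).
    gain = np.maximum(xi / (1.0 + xi), gain_floor)
    # S603: apply the real-valued gain to the complex spectrum, keeping its phase.
    return gain * Y, gain
```

The floor limits the maximum attenuation, which is a common way to reduce musical-noise artifacts in bins where the prior SNR is close to zero.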
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal includes:
S401, calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
S402, calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
S403, calculating, according to the complex coherence function, the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal.
Further, the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function includes:
S4031, substituting the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency domain signal;
S4032, normalizing the CDR ratio to obtain the voice existence probability.
Further, the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density includes:
S4021, substituting the self-spectral density and the cross-spectral density into a preset formula to calculate an initial complex coherence function;
S4022, performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
S4023, performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
Further, the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum includes:
S6011, substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm to calculate a posterior signal-to-noise ratio;
S6012, substituting the posterior signal-to-noise ratio into a third algorithm to calculate the prior signal-to-noise ratio.
Further, the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal includes:
S201, framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
S202, performing a fast Fourier transform on each frame of the sound sub-signals to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals corresponding to the respective sound sub-signals.
Further, the step of converting the second single-channel frequency domain signal into a time domain to generate a final audio signal includes:
S701, performing an inverse Fourier transform on each second single-channel frequency domain sub-signal to obtain the corresponding second single-channel time domain sub-signal;
S702, overlap-adding the second single-channel time domain sub-signals to obtain the final audio signal.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. All equivalent structures and equivalent processes derived from the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of protection of the present application.

Claims (10)

1. A dual-microphone-based sound pickup method, characterized by comprising the following steps:
acquiring a sound signal, wherein the sound signal is a dual-channel time domain signal;
converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
performing fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
calculating a voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
updating the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the first single-channel frequency domain signal;
performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and converting the second single-channel frequency domain signal into a time domain to generate a final audio signal.
2. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of performing noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal comprises:
calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum;
calculating a frequency domain filter coefficient of the first single-channel frequency domain signal according to the prior signal-to-noise ratio;
and filtering the first single-channel frequency domain signal according to the frequency domain filter coefficient to obtain the second single-channel frequency domain signal.
3. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of calculating the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal comprises:
calculating the self-spectral density and the cross-spectral density of the dual-channel frequency domain signal;
calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density;
and calculating, according to the complex coherence function, the voice existence probability of the first single-channel frequency domain signal and the first noise power spectrum of the dual-channel frequency domain signal.
4. The dual-microphone-based sound pickup method as claimed in claim 3, wherein the step of calculating the voice existence probability of the first single-channel frequency domain signal according to the complex coherence function comprises:
substituting the complex coherence function into a first algorithm to calculate a CDR ratio of the first single-channel frequency domain signal;
and normalizing the CDR ratio to obtain the voice existence probability.
5. The dual-microphone-based sound pickup method as claimed in claim 3, wherein the step of calculating a complex coherence function of the dual-channel frequency domain signal according to the self-spectral density and the cross-spectral density comprises:
substituting the self-spectral density and the cross-spectral density into a preset formula to calculate an initial complex coherence function;
performing first-order recursive smoothing in the time dimension on the initial complex coherence function to obtain a secondary complex coherence function;
and performing 5-point median filtering in the frequency dimension on the secondary complex coherence function to obtain the complex coherence function.
6. The dual-microphone-based sound pickup method as claimed in claim 2, wherein the step of calculating the prior signal-to-noise ratio of the first single-channel frequency domain signal according to the first single-channel frequency domain signal and the second noise power spectrum comprises:
substituting the first single-channel frequency domain signal and the second noise power spectrum into a second algorithm to calculate a posterior signal-to-noise ratio;
and substituting the posterior signal-to-noise ratio into a third algorithm to calculate the prior signal-to-noise ratio.
7. The dual-microphone-based sound pickup method as claimed in claim 1, wherein the step of converting the sound signal into a frequency domain to obtain a dual-channel frequency domain signal comprises:
framing and windowing the sound signal to obtain a plurality of frames of sound sub-signals;
and performing a fast Fourier transform on each frame of the sound sub-signals to obtain the dual-channel frequency domain signal, wherein the dual-channel frequency domain signal is the set of dual-channel frequency domain sub-signals corresponding to the respective sound sub-signals.
8. A dual-microphone-based sound pickup apparatus, comprising:
an acquisition module, configured to acquire a sound signal, wherein the sound signal is a dual-channel time domain signal;
a first conversion module, configured to convert the sound signal into a frequency domain to obtain a dual-channel frequency domain signal;
a generating module, configured to perform fixed beamforming on the dual-channel frequency domain signal to generate a first single-channel frequency domain signal;
a calculation module, configured to calculate a voice existence probability of the first single-channel frequency domain signal and a first noise power spectrum of the dual-channel frequency domain signal;
an updating module, configured to update the first noise power spectrum according to the first single-channel frequency domain signal and the voice existence probability to obtain a second noise power spectrum of the first single-channel frequency domain signal;
a noise reduction module, configured to perform noise reduction processing on the first single-channel frequency domain signal according to the second noise power spectrum to obtain a second single-channel frequency domain signal;
and a second conversion module, configured to convert the second single-channel frequency domain signal into a time domain to generate a final audio signal.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010171449.XA 2020-03-12 2020-03-12 Pickup method and apparatus based on double microphones and computer device Active CN111048106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010171449.XA CN111048106B (en) 2020-03-12 2020-03-12 Pickup method and apparatus based on double microphones and computer device

Publications (2)

Publication Number Publication Date
CN111048106A (en) 2020-04-21
CN111048106B (en) 2020-06-16

Family

ID=70231145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010171449.XA Active CN111048106B (en) 2020-03-12 2020-03-12 Pickup method and apparatus based on double microphones and computer device

Country Status (1)

Country Link
CN (1) CN111048106B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8239196B1 (en) * 2011-07-28 2012-08-07 Google Inc. System and method for multi-channel multi-feature speech/noise classification for noise suppression
CN105206281A (en) * 2015-09-14 2015-12-30 胡旻波 Voice enhancement device based on distributed microphone array network
CN106448692A (en) * 2016-07-04 2017-02-22 Tcl集团股份有限公司 RETF reverberation elimination method and system optimized by use of voice existence probability
CN107301869A (en) * 2017-08-17 2017-10-27 珠海全志科技股份有限公司 Microphone array sound pick-up method, processor and its storage medium
CN108922554A (en) * 2018-06-04 2018-11-30 南京信息工程大学 The constant Wave beam forming voice enhancement algorithm of LCMV frequency based on logarithm Power estimation
CN109817209A (en) * 2019-01-16 2019-05-28 深圳市友杰智新科技有限公司 A kind of intelligent speech interactive system based on two-microphone array

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489753A (en) * 2020-06-24 2020-08-04 深圳市友杰智新科技有限公司 Anti-noise sound source positioning method and device and computer equipment
CN111986693A (en) * 2020-08-10 2020-11-24 北京小米松果电子有限公司 Audio signal processing method and device, terminal equipment and storage medium
CN112946576A (en) * 2020-12-10 2021-06-11 北京有竹居网络技术有限公司 Sound source positioning method and device and electronic equipment
CN112946576B (en) * 2020-12-10 2023-04-14 北京有竹居网络技术有限公司 Sound source positioning method and device and electronic equipment
CN113160846A (en) * 2021-04-22 2021-07-23 维沃移动通信有限公司 Noise suppression method and electronic device
CN113160846B (en) * 2021-04-22 2024-05-17 维沃移动通信有限公司 Noise suppression method and electronic equipment
CN113380266B (en) * 2021-05-28 2022-06-28 中国电子科技集团公司第三研究所 Miniature dual-microphone speech enhancement method and miniature dual-microphone
CN113380266A (en) * 2021-05-28 2021-09-10 中国电子科技集团公司第三研究所 Miniature double-microphone voice enhancement method and miniature double-microphone
CN113362808B (en) * 2021-06-02 2023-03-21 云知声智能科技股份有限公司 Target direction voice extraction method and device, electronic equipment and storage medium
CN113362808A (en) * 2021-06-02 2021-09-07 云知声智能科技股份有限公司 Target direction voice extraction method and device, electronic equipment and storage medium
CN115361617A (en) * 2022-08-15 2022-11-18 音曼(北京)科技有限公司 Non-blind area multi-microphone environmental noise suppression method
CN115132220A (en) * 2022-08-25 2022-09-30 深圳市友杰智新科技有限公司 Method, device, equipment and storage medium for restraining double-microphone awakening of television noise
CN115132220B (en) * 2022-08-25 2023-02-28 深圳市友杰智新科技有限公司 Method, device, equipment and storage medium for restraining double-microphone awakening of television noise

Also Published As

Publication number Publication date
CN111048106B (en) 2020-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant