CN110310650A - Speech enhancement algorithm based on a second-order differential microphone array - Google Patents
- Publication number: CN110310650A (application CN201910275383.6A)
- Authority
- CN
- China
- Prior art keywords: signal, speech, microphone, beam, beamforming
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The present invention proposes a speech enhancement algorithm based on a second-order differential microphone array, belonging to the field of speech signal processing. The method first builds a microphone array and captures three channels of the speaker's voice. A target-speech beamforming signal and a noise beamforming signal are extracted with a second-order differential algorithm and split into frames and frequency bands; any one of the three channels is likewise framed and band-split. The masking value of each time-frequency unit is then computed and smoothed, yielding the time-frequency unit value of the enhanced speech for each unit. Finally, an inverse Fourier transform, windowing, and overlap-add produce the enhanced signal corresponding to the speaker's voice. The method combines a beamforming algorithm with a computational auditory scene analysis (CASA) algorithm: the beamforming results are used only as estimates of the target-speech and noise energies, and the masking-value generation process in CASA is optimized so that the masking values are smoother and better suited to practical scenarios, giving a clearly audible enhancement in the final synthesized speech.
Description
Technical field
The invention belongs to the field of digital speech signal processing, and in particular relates to a speech enhancement algorithm based on a second-order differential microphone array.
Background art
With the development of electronic information technology, speech processing is widely used in voice interaction systems. The various noises present in the environment, however, degrade the quality of speech processing considerably, chiefly as a drop in recognition rate and a drop in intelligibility. Speech signal processing therefore requires a speech enhancement pre-processing stage that removes noise and interference from the speech, so as to raise the quality of subsequent processing.
A speech enhancement algorithm based on a second-order differential microphone array is well suited to achieving strongly directional speech denoising. It relies mainly on two components: a beamforming algorithm and a computational auditory scene analysis (CASA) algorithm.
Second-order differential microphone-array beamforming usually assumes that the direction of the target source is known; speech signals from other, non-target directions are attenuated to varying degrees, completing the speech enhancement in the specified direction, with stronger directivity than first-order algorithms. Suppose a sound wave of strength P0 is incident on the array from angle θ. At time t = 0, the first microphone receives E1 = P0, the second receives E2(ω, θ) = P0·e^{−jωd·cosθ/c}, and the third receives E3(ω, θ) = P0·e^{−jω(d + d′)·cosθ/c}, where ω is the angular frequency, d and d′ are the distances between adjacent microphones, and c is the speed of sound. The principle is shown in Fig. 1.
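The three received signals above can be checked numerically. The following sketch is an illustration rather than part of the patent; the microphone spacings and the 343 m/s speed of sound are assumed values, and a unit-strength plane wave is used:

```python
import numpy as np

def mic_signals(omega, theta, d, d_prime, c=343.0, P0=1.0):
    """Frequency-domain signals received by the three microphones for a
    plane wave of circular frequency omega arriving from angle theta
    (radians); the first microphone is the phase reference."""
    e1 = P0 + 0j
    e2 = P0 * np.exp(-1j * omega * d * np.cos(theta) / c)
    e3 = P0 * np.exp(-1j * omega * (d + d_prime) * np.cos(theta) / c)
    return e1, e2, e3

# At broadside (theta = 90 deg) all three microphones see the same phase.
e1, e2, e3 = mic_signals(omega=2 * np.pi * 1000, theta=np.pi / 2,
                         d=0.01, d_prime=0.01)
```

At endfire (θ = 0°) the phases differ by exactly ωd/c per spacing, which is the delay the differential stages below exploit.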
The three microphones are arranged in a straight line and adjacent pairs are chosen among them (every two adjacent microphones form one pair, giving two first-order differential microphone pairs). Within each pair, one signal is delayed by τ1 (the first delay parameter) and subtracted from the other, yielding one first-order delayed difference signal per pair. One of these two first-order difference signals is then delayed by τ2 (the second delay parameter) and subtracted from the other, forming the second-order difference; the output signal is transformed to the frequency domain for processing. Following the analysis of the first-order differential microphone array, the derivation proceeds as follows.
Delaying the signal of the second microphone by τ1 and subtracting it from that of the first microphone gives
E12(ω, θ) = P0(1 − e^{−jω(τ1 + d·cosθ/c)}) ≈ jP0ω(τ1 + d·cosθ/c)   (1-2)
Similarly, delaying the signal of the third microphone by τ1′ and subtracting it from that of the second microphone gives
E23(ω, θ) ≈ jP0ω(τ1′ + d′·cosθ/c)·e^{−jωd·cosθ/c}   (1-4)
As the expression for E12 shows, in a first-order differential microphone array the angular-frequency factor in the delayed-and-subtracted signal gives different frequency components of the signal different gains; to keep the differenced signal undistorted, a low-pass filter with frequency response 1/ω must be added to cancel the effect of the ω factor on the frequency components. In the second-order differential microphone array, the second-order result is computed further as
E123(ω, θ) = E12(ω, θ) − e^{−jωτ2}·E23(ω, θ) ≈ −P0ω²(τ1 + d·cosθ/c)(τ2 + d·cosθ/c)
where τ1 = τ1′ and d = d′ are taken so that the gain of each frequency component of the delayed-and-subtracted signal does not change with frequency and ω again appears only as a factor in the expression for E123(ω, θ). Passing the second-order difference signal through a low-pass filter with frequency response 1/ω² gives
E(θ) ≈ −P0(τ1 + d·cosθ/c)(τ2 + d·cosθ/c)
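Under the same plane-wave model, the two delay-and-subtract stages and the 1/ω² equalization can be sketched as follows. This is an illustration with assumed spacing and speed of sound, taking τ1′ = τ1 and d′ = d as in the derivation:

```python
import numpy as np

def second_order_output(omega, theta, d, tau1, tau2, c=343.0, P0=1.0):
    """Two cascaded delay-and-subtract stages followed by 1/omega**2
    equalization; returns the equalized second-order output for a plane
    wave from angle theta (radians), with tau1' = tau1 and d' = d."""
    t = d * np.cos(theta) / c                       # inter-mic travel time
    e1 = P0 + 0j
    e2 = P0 * np.exp(-1j * omega * t)
    e3 = P0 * np.exp(-1j * omega * 2 * t)           # equal spacing d' = d
    e12 = e1 - np.exp(-1j * omega * tau1) * e2      # first first-order pair
    e23 = e2 - np.exp(-1j * omega * tau1) * e3      # second first-order pair
    e123 = e12 - np.exp(-1j * omega * tau2) * e23   # second-order difference
    return e123 / omega**2                          # 1/omega^2 equalization
```

For small ω the magnitude approaches P0·(τ1 + d·cosθ/c)(τ2 + d·cosθ/c), matching the approximation above.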
A computational auditory scene analysis (CASA) algorithm simulates, by computer, the human ear's perception and processing of sound. The core of the algorithm is how to extract the information of a single source from the speech fragments in the decomposed time-frequency units. In view of the auditory masking property of the human ear, the quantity CASA computes is the ideal ratio mask (IRM), whose expression is
IRM(m, c) = x(m, c) / (x(m, c) + n(m, c))
where x(m, c) denotes the target-speech estimate in frame m, band c, and n(m, c) denotes the noise estimate in frame m, band c. For each time-frequency unit, the computer "judges" how much of it belongs to the source we want to hear and weights it with an appropriate masking value; finally, reassembling all masked time-frequency units into speech recovers all the information of the target source.
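A minimal sketch of the IRM computation just described, assuming the common energy-domain form x/(x + n):

```python
import numpy as np

def ideal_ratio_mask(x, n):
    """IRM per time-frequency unit: the fraction of the unit's energy
    that the estimates attribute to the target, x(m,c) / (x(m,c) + n(m,c)).
    A small constant guards against all-zero units."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    return x / (x + n + 1e-12)
```

A unit whose target estimate is three times its noise estimate keeps 0.75 of its value; a unit with no estimated target energy is zeroed out.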
The above methods have the following shortcomings:
1. The beamforming algorithm can efficiently exploit the time differences between the signals collected by the different microphones to denoise directionally, but the result is proportional to the frequency term ω². Without a compensating filtering operation the output is severely distorted as frequency varies, and the compensating filter is complex and difficult to realize in hardware.
2. In the CASA algorithm the IRM calculation is rather rigid. First, the estimates of target-speech and noise energy may be inaccurate, which can affect the estimated masking values. Second, in practical applications the target speech and the noise often come from different directions, may differ greatly in level, and are unevenly distributed in time, so weighting every time-frequency unit directly by its IRM value rarely achieves optimal performance.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the prior art by proposing a speech enhancement algorithm based on a second-order differential microphone array. The method novelly combines a beamforming algorithm with a CASA algorithm: the beamforming results are used only as estimates of the target-speech and noise energies, which removes the influence of the frequency term, and the masking-value generation process in CASA is optimized and post-processed so that the masking values are smoother and better suited to practical scenarios, giving a clearly audible enhancement in the final synthesized speech.
The present invention proposes a speech enhancement algorithm based on a second-order differential microphone array, characterized by comprising the following steps:
1) Construct a microphone array of 3 or 4 microphones. With 3 microphones, the 3 microphones are placed in a line, and the distance from the 1st to the 2nd microphone equals the distance from the 2nd to the 3rd. With 4 microphones, the 4 microphones are placed in a line, the distance from the 1st to the 2nd microphone equals the distance from the 2nd to the 3rd, and the distance from the 1st to the 3rd microphone equals the distance from the 3rd to the 4th.
2) Using the array built in step 1), choose 3 microphones and capture the speaker's voice in real time as 3 signal channels. During capture, the line of microphones is aimed at the speaker; along the line of the array, 0° and 180° directions are defined, where the 0° direction, from which the speaker's sound arrives, is denoted the enhancement direction, and the opposite 180° direction is denoted the main suppression direction.
If the array has 4 microphones, either microphones 1, 2, and 3 or microphones 1, 3, and 4 are selected to capture the speech signals.
3) From the 3 channels captured in step 2), extract the target-speech beamforming signal and the noise beamforming signal with the second-order differential algorithm: steering the beam toward 0° yields the target-speech beamforming signal, and steering the beam toward 180° yields the noise beamforming signal. The target-speech beamforming signal serves as the target-speech energy estimate, and the noise beamforming signal serves as the noise energy estimate.
4) Computation and smoothing of the masking values; the specific steps are as follows:
4.1) Frame and band-split the target-speech beamforming signal from step 3) to obtain S(λ, μ); frame and band-split the noise beamforming signal from step 3) to obtain N(λ, μ); pick any one of the 3 channels from step 2) and frame and band-split it to obtain Y(λ, μ). Here S(λ, μ) denotes the speech energy in band μ of frame λ of the target-speech beamforming signal, N(λ, μ) the noise energy in band μ of frame λ of the noise beamforming signal, and Y(λ, μ) the energy in band μ of frame λ of the selected channel.
Framing and band-splitting works as follows: for each signal, 0.02 s forms one frame with a 0.01 s frame shift; a fast Fourier transform is applied to each frame and each frame is divided into M frequency bands, so that each signal is divided into (number of frames) × (number of bands) time-frequency units.
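The framing and band-splitting of step 4.1) can be sketched as follows. A one-sided FFT is assumed, so M is tied to the frame length, and the 16 kHz sampling rate in the example is an assumed value:

```python
import numpy as np

def frame_split(signal, fs, frame_sec=0.02, hop_sec=0.01):
    """Split a signal into 0.02 s frames with a 0.01 s shift, window
    each frame, and FFT it.  Returns an (n_frames, M) complex array of
    time-frequency units, where M = frame_len // 2 + 1 one-sided bands."""
    frame_len = int(frame_sec * fs)
    hop = int(hop_sec * fs)
    win = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.fft.rfft(win * signal[l * hop : l * hop + frame_len])
                     for l in range(n_frames)])

# A 1 kHz tone at fs = 16 kHz concentrates its energy in band 20
# (1000 * 320 / 16000) of every frame.
fs = 16000
t = np.arange(fs) / fs
units = frame_split(np.sin(2 * np.pi * 1000 * t), fs)
```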
4.2) Compute the masking value: the masking function maps the ratio of target-speech energy to noise energy onto the masking value, where G(λ, μ) denotes the masking value in band μ of frame λ.
4.3) Smooth the masking values.
First, compute the signal enhancement ratio δ(λ) for each frame, where M is the total number of bands.
Then compute the smoothing filter H(μ), where round is the rounding function and N(λ) is an intermediate variable.
The masking values are smoothed as in formula (5):
G_PF(λ, μ) = |G(λ, μ)| * H(μ)   (5)
where G_PF(λ, μ) denotes the smoothed masking value in band μ of frame λ and * denotes convolution.
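Formula (5) convolves each frame's masking values across frequency with the smoothing filter. Since the patent's expressions for δ(λ) and H(μ) are not reproduced here, the sketch below substitutes an assumed normalized moving average whose length N(λ) grows as the frame's enhancement ratio δ(λ) falls; only the convolution structure of formula (5) is taken from the text:

```python
import numpy as np

def smooth_masks(G, delta, max_len=9):
    """Convolve each frame's |G(lambda, mu)| across frequency with a
    filter H.  Here H is an assumed normalized moving average of odd
    length N(lambda) that grows as the frame's enhancement ratio
    delta(lambda) falls, so low-SNR frames are smoothed (and, at the
    band edges, attenuated) more."""
    G_pf = np.empty(G.shape, dtype=float)
    for lam in range(G.shape[0]):
        n = 1 + 2 * int(round((1.0 - delta[lam]) * (max_len - 1) / 2))
        h = np.ones(n) / n                   # stand-in for H(mu)
        G_pf[lam] = np.convolve(np.abs(G[lam]), h, mode='same')
    return G_pf
```

A frame with δ = 1 (clean) passes through unchanged (N = 1), while a frame with δ near 0 is averaged over its neighboring bands.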
5) Speech synthesis.
5.1) Multiply each time-frequency unit value Y(λ, μ) of the signal selected in step 4.1) by the unit's smoothed masking value G_PF(λ, μ) to obtain the time-frequency unit value of the enhanced speech for that unit.
5.2) Speech synthesis: apply the inverse Fourier transform to each enhanced time-frequency frame, then window and overlap-add to obtain the enhanced signal corresponding to the speaker's voice captured in step 2).
Features and beneficial effects of the present invention:
1) The method innovatively combines a beamforming algorithm with a CASA algorithm: the beamforming results are used only as estimates of the target-speech and noise energies, which removes the influence of the frequency term, and the masking-value generation process in CASA is optimized and post-processed so that the masking values are smoother and better suited to practical scenarios.
2) The invention removes noise from a fixed direction exceptionally well and has excellent SNR performance in the main-lobe direction.
3) By correcting the masking values, the invention resolves the musical-noise problem that CASA may introduce into the synthesized speech when the masking values are discontinuous.
4) Through the linear three-microphone array structure, the invention exploits spatial information more fully; compared with a traditional two-microphone array structure and first-order beamforming, it has superior directivity, with markedly stronger attenuation away from the main-lobe direction.
5) The invention can be used in the microphone modules of devices such as mobile phones, computers, conference phones, and in-vehicle communication systems, and therefore has considerable practical value.
Description of the drawings
Fig. 1 is a schematic diagram of the linear-array three-microphone second-order beamforming algorithm.
Fig. 2 is a schematic diagram of target-speech and noise estimation in the present invention.
Fig. 3 is the measured directivity pattern for τ1 = 0.8d/c, τ2 = 0 in the embodiment of the present invention.
Fig. 4 is a schematic diagram of the masking function constructed by the present invention.
Specific embodiment
The present invention proposes a speech enhancement algorithm based on a second-order differential microphone array, described in further detail below with reference to the accompanying drawings and a specific embodiment.
The main distinction of this method from the prior art is that it combines the two algorithms of beamforming and computational auditory scene analysis: the main lobe of the beamformer is used only to estimate the target speech and the noise; masking values are then obtained from the masking function and used to weight each time-frequency unit; and a speech-enhancement post-processing module is added afterwards to raise the auditory comfort of the output speech.
The present invention proposes a speech enhancement algorithm based on a second-order differential microphone array, comprising the following steps:
1) Construct a microphone array; the number of microphones in the array is 3 or 4, with no special requirement on the model. With 3 microphones, they are arranged in a line such that the distance from the first to the second microphone equals the distance from the second to the third. With 4 microphones, they are arranged in a line such that the distance from the first to the second microphone equals the distance from the second to the third, while the distance from the first to the third microphone equals the distance from the third to the fourth. (This embodiment uses 4 MEMS microphones arranged in a line with successive spacings of 1 cm, 1 cm, and 2 cm.)
2) Using the array built in step 1), choose three microphones and capture the speaker's voice in real time as three signal channels. During capture, the line of microphones is aimed at the speaker; the duration of the captured signal is unlimited. Along the line of the array, 0° and 180° directions are defined: the direction from which the speaker's sound arrives (0°) is the enhancement direction, and the direction exactly opposite the speaker (180°) is the main suppression direction.
If the array has 4 microphones, either the signals captured by the first, second, and third microphones or those captured by the first, third, and fourth microphones may be selected.
3) Using the three channels captured in step 2), apply the second-order differential algorithm twice to extract the target-speech beamforming signal and the noise beamforming signal.
Fig. 2 is a schematic diagram of target-speech and noise estimation in the present invention. As shown in Fig. 2, the beam is first steered toward the target speech at 0°; in this embodiment, the first delay parameter is set to τ1 = 0.8d/c and the second delay parameter to τ2 = 0, which yields the beam on the right of Fig. 2 as the target-speech beam. The beam is then steered toward 180°; in this embodiment, τ1 = −0.8d/c and τ2 = 0, which yields the beam on the left of Fig. 2 as the noise beam.
The beamforming signal with its main lobe at 0° serves as the estimate of the target-speech energy, and the beamforming signal with its main lobe at 180° serves as the estimate of the noise energy.
Note that in practice the target speech and the noise source may not lie exactly opposite each other, but the principal direction (0°) of the microphone line can still be fixed as the enhancement direction, with the opposite direction (180°) as the main suppression direction. In practical use, with the array's principal direction aimed at the target source, noise arriving from any direction off the main lobe is suppressed to a varying degree: the further a noise's direction deviates from the principal direction, the stronger the suppression it experiences, with the exact dependence of suppression on angle determined by the delay parameters.
In this embodiment, the delay parameters were chosen as follows.
With τ2 in Fig. 1 set to 0, the way the four lobes of the second-order algorithm evolve is quite elegant: the 0° main-lobe maximum is fixed by τ2, with three side lobes in the other directions. When τ1 takes its maximum value d/c, the 180° side lobe is eliminated and only the two lateral side lobes remain; as τ1 decreases, the 180° side lobe gradually emerges and "absorbs" the envelopes of the main lobe and the other two side lobes, so the main lobe narrows steadily until, at τ1 = 0, the other two side lobes vanish entirely and the main lobe is at its narrowest. From this behavior we found that the two directivity requirements on the algorithm's feature-extraction module conflict: on the one hand the main lobe should be as narrow as possible, which calls for a τ1 as small as possible; on the other hand the beamformer output in the direction opposite the main lobe should be as small as possible, filtering out as much of the speech energy in that direction as it can, which calls for a τ1 as large as possible. The choice must therefore be weighed against the actual application. In the experiments, τ1 = 0.8d/c and τ2 = 0; the resulting directivity pattern of the algorithm is shown in Fig. 3. It can be seen that the beamformer extracts the speech in the main-lobe direction and suppresses signals from the side-lobe directions well.
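The directivity just described can be reproduced numerically. The sketch below evaluates the magnitude of the equalized second-order output over angle for τ1 = 0.8d/c, τ2 = 0; the spacing, speed of sound, and 1 kHz evaluation frequency are assumed values. It shows the 0° maximum, the deep null at 90° from τ2 = 0, and the near-null around 143° where cosθ = −0.8:

```python
import numpy as np

def beam_pattern(theta_deg, d=0.01, c=343.0, f=1000.0):
    """Magnitude of the equalized second-order output versus arrival
    angle for tau1 = 0.8 d/c, tau2 = 0 (exact complex evaluation of the
    two cascaded delay-and-subtract factors)."""
    omega = 2 * np.pi * f
    tau1, tau2 = 0.8 * d / c, 0.0
    t = d * np.cos(np.radians(theta_deg)) / c
    e123 = ((1 - np.exp(-1j * omega * (tau1 + t)))
            * (1 - np.exp(-1j * omega * (tau2 + t))))
    return np.abs(e123) / omega**2

pattern = beam_pattern(np.arange(181))
```

The pattern peaks at 0° and leaves only partial output at 180°, consistent with the trade-off on τ1 discussed above.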
4) Computation and smoothing of the masking values; the specific steps are as follows:
4.1) Frame and band-split the target-speech beamforming signal from step 3) to obtain S(λ, μ); frame and band-split the noise beamforming signal to obtain N(λ, μ); pick any one of the three channels from step 2) (a noisy speech signal) and frame and band-split it to obtain Y(λ, μ). Here S(λ, μ) denotes the speech energy in band μ of frame λ of the target-speech beamforming signal, N(λ, μ) the noise energy in band μ of frame λ of the noise beamforming signal, and Y(λ, μ) the energy in band μ of frame λ of the selected noisy speech signal.
Framing and band-splitting works as follows: for each beamforming signal or speech signal, the time-frequency units are computed by taking 0.02 s as one frame with a 0.01 s frame shift, applying a fast Fourier transform to each frame, and dividing each frame into M frequency bands (M is usually 64, 128, or 256; 64 bands in this embodiment), so that each beamforming or speech signal is divided into (number of frames) × (number of bands) time-frequency units.
4.2) Compute the masking value.
When constructing the masking function, the present invention refers to actual speech characteristics so as to reach the best masking effect. The function is shown in Fig. 4: the abscissa is the ratio of target-speech energy to noise energy, and the ordinate is the masking value of the time-frequency unit, G(λ, μ), the masking value in band μ of frame λ. The masking function takes the point where the target-speech-to-noise energy ratio equals 1 as its break point, fitting a convex cubic function below it and a convex ideal-ratio function above it. It thus retains speech well in time-frequency units dominated by strong speech, while suppressing units dominated by weak speech ever faster as their noise energy grows, striking a balance between the SNR of the synthesized speech and auditory comfort.
4.3) Smooth the masking values.
The time-frequency masking values computed jointly from the beamforming features and CASA are first used to estimate the SNR distribution over time. Unlike per-unit SNR estimation, time-domain SNR estimation computes the overall SNR of each frame across all bands and then compares the SNR relationships between frames, yielding the SNR distribution of the input speech over different periods. Since in practice only the noisy mixture is available, with neither clean speech nor pure noise, the SNR cannot be computed directly; instead, a signal enhancement ratio δ(λ) is computed in the time domain for each frame. Here λ denotes the frame index, μ the band index, and M the total number of bands. The formula gives the ratio of the total energy after masking to the total energy before masking over all bands of a frame; this serves as the frame's signal enhancement ratio and hence as an estimate of how much of the current frame of the original signal is not noise. Since every masking value G(λ, μ) lies in (0, 1), the signal enhancement ratio is also a number between 0 and 1: the closer its value is to 1, the closer the frame of the original signal is to a clean signal, and the closer to 0, the closer the frame is to pure noise.
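A sketch of the per-frame enhancement ratio δ(λ). Since the patent's formula is not reproduced here, this assumes "energy" means squared magnitude, so the masked energy of a unit is (G·|Y|)²:

```python
import numpy as np

def enhancement_ratio(G, Y):
    """Per-frame signal enhancement ratio delta(lambda): total energy
    after masking over total energy before masking, summed across all
    M bands of each frame.  Energy is taken as squared magnitude."""
    num = np.sum((np.abs(G) * np.abs(Y)) ** 2, axis=1)
    den = np.sum(np.abs(Y) ** 2, axis=1) + 1e-12
    return num / den
```

With all masks at 1, a frame scores δ = 1 (treated as clean); uniform masks below 1 give a value between 0 and 1, as the text requires.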
Once the SNR distribution of the noisy speech signal over time has been obtained, the masking values from CASA can be further corrected. The present invention adopts a correction method based on a smoothing filter, which with the parameters given may yield a significant improvement. The smoothing filter H(μ) is defined in terms of round, the rounding function, and the intermediate variable N(λ) used to compute it. The filter keeps all values between 0 and 1; it attenuates the masking values on low-SNR frames considerably while leaving those on high-SNR frames almost unchanged. The masking values are smoothed as in formula (5):
G_PF(λ, μ) = |G(λ, μ)| * H(μ)   (5)
where G_PF(λ, μ) denotes the smoothed masking value in band μ of frame λ and * denotes convolution.
5) Speech synthesis.
5.1) Multiply each time-frequency unit value Y(λ, μ) of the noisy speech signal selected in step 4.1) by the unit's smoothed masking value G_PF(λ, μ) to obtain the time-frequency unit value of the enhanced speech for that unit.
5.2) Speech synthesis: apply the inverse Fourier transform to each enhanced time-frequency frame, then window and overlap-add to obtain the enhanced signal corresponding to the speaker's voice captured in step 2). The overlap parameters match those used for framing, with an overlap length of half the frame length. A Hanning window is used here; its time-domain expression is
w(n) = 0.5(1 − cos(2πn/(N + 1)))   (6)
where N is the window length, equal to the frame length, and n, running from 1 to N, is the window function's independent variable.
Table 1 compares the final experimental results of this embodiment: the SNR improvement in the main-lobe direction versus the conventional first-order algorithm, with and without masking-value smoothing.
Table 1: SNR in the main-lobe direction under noise environments of varying strength
As the table shows, the present invention accomplishes speech enhancement in the main-lobe direction well: relative to the first-order algorithm and to the second-order algorithm without masking-value smoothing, smoothing the masking values raises the maximum enhancement SNR in the main-lobe direction by more than 10 dB.
Claims (1)
1. a kind of voice enhancement algorithm based on second-order differential microphone array, which comprises the following steps:
1) microphone array being made of 3 or 4 microphones is constructed;If microphone is 3,3 microphones form a line,
And the 1st microphone is to distance and the 2nd microphone being equidistant to the 3rd microphone of the 2nd microphone;If microphone
It is 4, then 4 microphones form a line, and the 1st microphone is to the distance and the 2nd microphone to the 3rd of the 2nd microphone
The distance of a microphone is identical, the distance of the 1st microphone to the 3rd microphone and the 3rd microphone to the 4th microphone
It is equidistant;
2) using the microphone array of step 1) building, choosing 3 microphones, acquisition speaker's voice obtains 3 road languages in real time respectively
Sound signal;When acquiring signal, the adjusting to a line speaker that microphone is arranged distinguishes the rectilinear direction arranged along microphone
It is set as 0 ° of angle and 180 ° of angular direction, the direction for the sound that wherein speaker issues is that 0 ° of direction is denoted as enhancing direction, with speaker
Making a sound opposite direction is that 180 ° of directions are denoted as main inhibition direction;
Wherein, if microphone array is classified as 4 microphones, the 1st, 2,3 microphone acquisition voice signal of selection, or selection the 1st,
3,4 microphones acquire voice signal;
3) the 3 road voice signals obtained using step 2), the Wave beam forming of target voice is extracted using second-order differential algorithm respectively
Signal and noise Wave beam forming signal;Wherein, 0 ° of direction of beam direction direction is obtained into target voice Wave beam forming signal, by wave
Shu Fangxiang is directed toward 180 ° of directions and obtains noise Wave beam forming signal;Using target voice Wave beam forming signal as target voice energy
Estimation, using noise Wave beam forming signal as estimation of noise energy;
4) calculating of masking value and smooth;Specific step is as follows:
4.1) Perform framing and sub-band splitting on the target-voice beamforming signal obtained in step 3) to obtain S(λ, μ), perform framing and sub-band splitting on the noise beamforming signal obtained in step 3) to obtain N(λ, μ), and select any one of the 3 voice signals obtained in step 2) and perform framing and sub-band splitting on it to obtain Y(λ, μ); here S(λ, μ) denotes the speech energy of the target-voice beamforming signal on the μ-th band of the λ-th frame, N(λ, μ) denotes the noise energy of the noise beamforming signal on the μ-th band of the λ-th frame, and Y(λ, μ) denotes the energy of the selected voice signal on the μ-th band of the λ-th frame;
The framing and sub-band splitting is performed as follows: for each signal, take 0.02 seconds as 1 frame with a frame shift of 0.01 seconds, then apply a fast Fourier transform (FFT) to each frame and divide it into M frequency bands, so that each signal is split into a number of time-frequency units equal to the number of frames multiplied by the number of bands;
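The framing and sub-band splitting above can be sketched as follows. The 0.02 s frame length and 0.01 s shift come from the text; the Hann analysis window is an assumption, since the patent does not name one, and the function name is illustrative.

```python
import numpy as np

def frame_and_split_bands(x, fs, frame_len_s=0.02, hop_s=0.01):
    """Split a signal into 0.02 s frames with a 0.01 s shift, window each
    frame, and FFT it, giving a (n_frames, M) grid of time-frequency units
    as in step 4.1). A Hann window is assumed here."""
    frame_len = int(round(frame_len_s * fs))
    hop = int(round(hop_s * fs))
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    # real FFT: M = frame_len // 2 + 1 frequency bands per frame
    return np.fft.rfft(frames, axis=1)
```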
4.2) Calculate the masking value; the masking function is expressed as follows:
Wherein, G(λ, μ) denotes the masking value on the μ-th band of the λ-th frame;
4.3) Smooth the masking value;
First, compute the signal enhancement ratio δ(λ) for each frame; its expression is as follows:
Wherein, M is the total number of frequency bands;
Then compute the smoothing filter; its expression is as follows:
Wherein, round(·) is the rounding function and N(λ) is an intermediate variable;
The masking value is smoothed as shown in formula (5):
G_PF(λ, μ) = |G(λ, μ)| * H(μ)   (5)
Wherein, G_PF(λ, μ) denotes the smoothed masking value on the μ-th band of the λ-th frame, and * denotes the convolution operation;
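Formula (5) can be sketched as a convolution across frequency within one frame. Note the expression for H(μ) is not reproduced in this extract, so a length-`n_taps` moving average is assumed here purely for illustration, with `n_taps` standing in for the N(λ) produced by the omitted round(·) formula.

```python
import numpy as np

def smooth_mask(G_frame, n_taps):
    """Smooth one frame of masking values across frequency, formula (5):
    G_PF(lam, mu) = |G(lam, mu)| * H(mu), where * is convolution.
    H is ASSUMED to be a length-n_taps moving average; the patent's actual
    expression for H is not shown in this extract."""
    H = np.ones(n_taps) / n_taps               # assumed smoothing kernel
    return np.convolve(np.abs(G_frame), H, mode='same')
```

A longer kernel (larger N(λ)) smooths more aggressively across bands, which is the usual lever for suppressing musical noise in mask-based enhancement.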
5) Speech synthesis;
5.1) Multiply each time-frequency unit value Y(λ, μ) of the voice signal selected in step 4.1) by the corresponding smoothed masking value G_PF(λ, μ) to obtain the time-frequency unit value of the enhanced voice for that unit;
5.2) Speech synthesis:
Apply an inverse Fourier transform to the time-frequency units of each enhanced frame, then window and overlap-add the frames to obtain the enhanced signal corresponding to the speaker's voice acquired in step 2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910275383.6A CN110310650A (en) | 2019-04-08 | 2019-04-08 | A kind of voice enhancement algorithm based on second-order differential microphone array |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110310650A true CN110310650A (en) | 2019-10-08 |
Family
ID=68074420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910275383.6A Pending CN110310650A (en) | 2019-04-08 | 2019-04-08 | A kind of voice enhancement algorithm based on second-order differential microphone array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310650A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080312918A1 (en) * | 2007-06-18 | 2008-12-18 | Samsung Electronics Co., Ltd. | Voice performance evaluation system and method for long-distance voice recognition |
CN101976565A (en) * | 2010-07-09 | 2011-02-16 | 瑞声声学科技(深圳)有限公司 | Dual-microphone-based speech enhancement device and method |
CN102347028A (en) * | 2011-07-14 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | Double-microphone speech enhancer and speech enhancement method thereof |
CN108389586A (en) * | 2017-05-17 | 2018-08-10 | 宁波桑德纳电子科技有限公司 | A kind of long-range audio collecting device, monitoring device and long-range collection sound method |
CN108806708A (en) * | 2018-06-13 | 2018-11-13 | 中国电子科技集团公司第三研究所 | Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model |
Non-Patent Citations (2)
Title |
---|
Gu Junlong, et al.: "Study of Speech Enhancement Based on the Second-Order Differential Microphone Array", 2018 2nd International Conference on Imaging, Signal Processing and Communication * |
Thomas Esch, et al.: "Efficient musical noise suppression for speech enhancement systems", International Conference on Acoustics, Speech and Signal Processing * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110992963A (en) * | 2019-12-10 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Network communication method, device, computer equipment and storage medium |
CN110992963B (en) * | 2019-12-10 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Network communication method, device, computer equipment and storage medium |
CN111863003A (en) * | 2020-07-24 | 2020-10-30 | 苏州思必驰信息科技有限公司 | Voice data enhancement method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8654990B2 (en) | Multiple microphone based directional sound filter | |
CN107221336B (en) | Device and method for enhancing target voice | |
CN102164328B (en) | Audio input system used in home environment based on microphone array | |
JP3484112B2 (en) | Noise component suppression processing apparatus and noise component suppression processing method | |
CN107479030B (en) | Frequency division and improved generalized cross-correlation based binaural time delay estimation method | |
AU2010346387B2 (en) | Device and method for direction dependent spatial noise reduction | |
Lotter et al. | Dual-channel speech enhancement by superdirective beamforming | |
KR101060301B1 (en) | Method and apparatus for adjusting mismatch of device or signal in sensor array | |
US8965003B2 (en) | Signal processing using spatial filter | |
CN103907152B (en) | The method and system suppressing for audio signal noise | |
WO2015196729A1 (en) | Microphone array speech enhancement method and device | |
US9532149B2 (en) | Method of signal processing in a hearing aid system and a hearing aid system | |
WO1995008248A1 (en) | Noise reduction system for binaural hearing aid | |
JP2013543987A (en) | System, method, apparatus and computer readable medium for far-field multi-source tracking and separation | |
AU2011334840A1 (en) | Apparatus and method for spatially selective sound acquisition by acoustic triangulation | |
CN102204281A (en) | A system and method for producing a directional output signal | |
CN110827847B (en) | Microphone array voice denoising and enhancing method with low signal-to-noise ratio and remarkable growth | |
US11381909B2 (en) | Method and apparatus for forming differential beam, method and apparatus for processing signal, and chip | |
CN110310650A (en) | A kind of voice enhancement algorithm based on second-order differential microphone array | |
CN103945291A (en) | Method and device for achieving orientation voice transmission through two microphones | |
US20230319469A1 (en) | Suppressing Spatial Noise in Multi-Microphone Devices | |
Madhu et al. | Localisation-based, situation-adaptive mask generation for source separation | |
CN206728234U (en) | A kind of long-range sound collector of audio/video linkage | |
Jingzhou et al. | End-fire microphone array based on phase difference enhancement algorithm | |
Lotter et al. | A stereo input-output superdirective beamformer for dual channel noise reduction. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20191008 |
|