CN101685638B - Method and device for enhancing voice signals - Google Patents

Method and device for enhancing voice signals

Info

Publication number
CN101685638B
CN101685638B CN2008101987725A CN200810198772A CN101685638B CN 101685638 B CN101685638 B CN 101685638B CN 2008101987725 A CN2008101987725 A CN 2008101987725A CN 200810198772 A CN200810198772 A CN 200810198772A CN 101685638 B CN101685638 B CN 101685638B
Authority
CN
China
Prior art keywords
voice signal
signal
ratio
channel
noise ratio
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101987725A
Other languages
Chinese (zh)
Other versions
CN101685638A (en)
Inventor
杨毅
张清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN2008101987725A
Publication of CN101685638A
Application granted
Publication of CN101685638B
Legal status: Expired - Fee Related

Abstract

The invention discloses a method and a device for enhancing voice signals. The method comprises: obtaining the prior signal-to-noise ratio (SNR) and the posterior SNR of each voice signal in multiple voice signals; obtaining signal parameters of the multiple voice signals according to model parameters of the statistical model of the multiple voice signals; obtaining a jointly modeled weighting coefficient for each voice signal according to the signal parameters and the prior SNR and posterior SNR of each voice signal; weighting each of the multiple voice signals with its weighting coefficient to obtain weighted voice signals; and combining the weighted voice signals to obtain an enhanced voice signal. The technical scheme of the embodiment of the invention makes full use of the correlation among the multiple signals to raise the SNR of the weighted output signal.

Description

A voice signal enhancement method and device
Technical field
The present invention relates to the field of voice signal processing, and in particular to a voice signal enhancement method and device.
Background
Real voice communication mostly takes place in noisy environments. For example, mobile communication in a factory is affected by the roar of machinery, and voice communication in a train driver's cab is disturbed by motor operation and the clatter of the rails. In such environments, the input voice signal needs to undergo voice signal enhancement in order to recover speech that is as clean as possible from the noisy speech signal, improving voice quality, clarity and intelligibility.
Depending on the number of microphones that pick up the voice signal, speech enhancement is divided into two types: single-channel and multi-channel. A single-channel speech enhancement system needs only one microphone and has low hardware requirements and low algorithmic complexity, but its denoising performance is limited.
Single-channel speech enhancement based on a statistical model mainly uses frequency-domain weighting. For example, the time-domain voice signal is Fourier transformed to obtain a frequency-domain voice signal; an SNR estimate of the frequency-domain signal is then obtained and a weight is calculated from this estimate; the frequency-domain voice signal is weighted with the calculated weight and then inverse Fourier transformed, yielding the enhanced time-domain voice signal.
A multi-channel speech enhancement system uses a microphone array to capture a multi-channel voice signal, which contains rich spatial and temporal information and therefore leaves more room for performance improvement. In microphone array beamforming, which is based on signal and array processing theory, the array consists of a group of microphones arranged in a certain geometry. Compared with traditional single-microphone speech enhancement algorithms, a microphone array has spatial directivity: it can extract the useful signal from a specific direction and suppress noise to some extent. Beamforming is a technique that extracts the target signal from interference by controlling the direction and shape of the beam. Delay-and-sum beamforming is a basic beamforming method that delays and sums the signals of the individual microphones; its principle is shown in Fig. 1.
Here S(t) is the sound source signal, Yn(t) is the signal received by the n-th microphone, Wn is the weight of the n-th microphone, and Z(t) is the array output. The desired signal is obtained by multiplying the signal received by each microphone by its weight, applying the corresponding delay, and summing the results.
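The delay-and-sum principle can be illustrated with a minimal Python sketch. It assumes integer sample delays that are already known; the function name `delay_and_sum` and the toy data are hypothetical and are not taken from the patent.

```python
def delay_and_sum(channels, delays, weights):
    """Z[t] = sum_n W_n * Y_n[t + d_n] for integer sample delays d_n.

    channels: list of equal-length sample lists Y_n
    delays:   per-channel arrival delay in samples (assumed known here)
    weights:  per-channel weight W_n
    """
    length = len(channels[0])
    output = [0.0] * length
    for samples, delay, weight in zip(channels, delays, weights):
        for t in range(length):
            src = t + delay  # advance the channel to undo its arrival delay
            if 0 <= src < length:
                output[t] += weight * samples[src]
    return output


# Toy example: one source pulse reaching three microphones 0, 1 and 2 samples late.
source = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
delays = [0, 1, 2]
channels = [[0.0] * d + source[: len(source) - d] for d in delays]
aligned = delay_and_sum(channels, delays, [1 / 3] * 3)
```

With equal weights of 1/3 the aligned copies add coherently, so `aligned` reproduces the source pulse, whereas uncorrelated noise on the channels would be averaged down.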
In the course of realizing the present invention, the inventors found that in the speech enhancement techniques of existing multi-channel speech enhancement systems, the weight of each input channel is calculated independently of the signals of the other channels. The correlation between the channel signals is not taken into account, so the output SNR after weighting remains low.
Summary of the invention
The technical problem to be solved by the present invention is to provide a voice signal enhancement method and device that make full use of the correlation between channel signals, so that the enhanced signal has a higher signal-to-noise ratio (SNR).
To this end, in one aspect, embodiments of the invention provide a voice signal enhancement method, comprising: obtaining the prior SNR and the posterior SNR of each channel's voice signal in a multi-channel voice signal; obtaining signal parameters of the multi-channel voice signal according to model parameters of the statistical model of the multi-channel voice signal; obtaining the jointly modeled weighting coefficient of each channel's voice signal according to the signal parameters and the prior SNR and posterior SNR of each channel's voice signal, based on minimum mean square error estimation and the signal delay of each channel; weighting each channel's voice signal with its corresponding weighting coefficient to obtain a weighted multi-channel voice signal; and synthesizing the weighted multi-channel voice signal to obtain the enhanced voice signal.
In another aspect, embodiments of the invention provide a voice signal enhancement device, comprising: an SNR acquiring unit, configured to obtain the prior SNR and the posterior SNR of each channel's voice signal in a multi-channel voice signal; a parameter acquiring unit, configured to obtain signal parameters of the multi-channel voice signal according to model parameters of the statistical model of the multi-channel voice signal; a coefficient calculation unit, configured to obtain the jointly modeled weighting coefficient of each channel's voice signal according to the signal parameters and the prior SNR and posterior SNR of each channel's voice signal; a weighting unit, configured to weight each channel's voice signal with its corresponding weighting coefficient to obtain a weighted multi-channel voice signal; and a synthesis unit, configured to synthesize the weighted multi-channel voice signal to obtain the enhanced voice signal;
wherein the coefficient calculation unit is further configured to obtain the jointly modeled weighting coefficient of each channel's voice signal according to the signal parameters and the prior SNR and posterior SNR of each channel's voice signal, based on minimum mean square error estimation and the signal delay of each channel.
In the technical scheme provided by the embodiments of the invention, the prior SNR and posterior SNR of every input channel are all taken into account when each channel's weighting coefficient is obtained. The multi-channel input signal is thus jointly modeled, the correlation between the channels is fully used, and the SNR of the weighted output signal is improved.
Description of drawings
To illustrate the technical schemes of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of delay-and-sum beamforming in existing microphone array beamforming technology;
Fig. 2 is a schematic flowchart of a specific embodiment of the voice signal enhancement method of the present invention;
Fig. 3 is a schematic flowchart of another specific embodiment of the voice signal enhancement method of the present invention;
Fig. 4 is a schematic diagram of the composition of a specific embodiment of the voice signal enhancement device of the present invention;
Fig. 5 is a schematic diagram of the composition of another specific embodiment of the voice signal enhancement device of the present invention.
Embodiments
The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiments of the invention are described below with reference to the accompanying drawings. Fig. 2 is a schematic flowchart of a specific embodiment of the voice signal enhancement method of the present invention. The flow comprises:
201. Obtain the prior SNR and the posterior SNR of each channel's voice signal in the multi-channel voice signal.
202. Obtain the signal parameters of the multi-channel voice signal according to the model parameters of the statistical model of the multi-channel voice signal.
The statistical model of the multi-channel voice signal can be a speech model that is more accurate than the traditional Gaussian statistical model (also called the Gauss model), such as the super-Gaussian model, the Laplace model or the augmented Gaussian model. The model parameters of these models reflect the signal characteristics of the multi-channel voice signal better than the traditional Gauss model. The model parameters can include the amplitude of the multi-channel voice signal, the multi-channel speech energy spectral density and the probability density of the multi-channel voice signal amplitude, and can also include other model parameters such as Gamma function information.
Based on the actual multi-channel voice signal and the conditions of the actual speech system, the signal parameters of the multi-channel voice signal in the current environment can be obtained from the model parameters, for example the parameters μ and v of the super-Gaussian model.
The super-Gaussian model can include, but is not limited to, the following example:
$$p(A) = \frac{\mu^{v+1}}{\Gamma(v+1)}\,\frac{A^{v}}{\sigma_S^{v+1}}\exp\left\{-\mu\frac{A}{\sigma_S}\right\}$$
The Laplace model can include, but is not limited to, the following example:
$$p(A) = \frac{1}{\sigma_S}\exp\left\{-\frac{2A}{\sigma_S}\right\}$$
Here the model parameters that characterize the multi-channel voice signal include: A, the amplitude of the voice signal; $\sigma_S^2$, the speech energy spectral density; Γ, the Gamma function; μ and v, the parameters of the super-Gaussian model; and p(A), the probability density of A.
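As a quick sanity check of the super-Gaussian amplitude density, the following Python sketch evaluates p(A) and integrates it numerically over A ≥ 0. It assumes the density is read as μ^(v+1)/Γ(v+1) · A^v/σ_S^(v+1) · exp(−μA/σ_S); the parameter values are hypothetical and chosen only for illustration.

```python
import math


def super_gaussian_pdf(a, sigma_s, mu, v):
    """Amplitude density p(A) of the super-Gaussian model for A >= 0."""
    norm = mu ** (v + 1) / math.gamma(v + 1)
    return norm * a ** v / sigma_s ** (v + 1) * math.exp(-mu * a / sigma_s)


def integrate(pdf, upper, steps=200_000):
    """Midpoint-rule integral of pdf over [0, upper]."""
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h


# Hypothetical parameter values, not taken from the patent.
sigma_s, mu, v = 1.0, 1.74, 0.126
total = integrate(lambda a: super_gaussian_pdf(a, sigma_s, mu, v), upper=40.0)
```

Under this reading the density is a Gamma distribution in A with shape v + 1 and scale σ_S/μ, so `total` should come out close to 1.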
203. Obtain the jointly modeled weighting coefficient of each channel's voice signal according to the model parameters and the prior SNR and posterior SNR of each channel's voice signal.
The weighting coefficient can be derived from the weighting coefficient of the single-channel statistical model. For example, for the super-Gaussian model, the single-channel weighting coefficient obtained by minimum mean square error estimation is:
$$G = u + \sqrt{u^{2} + \frac{v - 1/2}{2\gamma}}$$
where G is the single-channel weighting coefficient,
$$u = \frac{1}{2} - \frac{\mu}{4\sqrt{\gamma\xi}}$$
μ and v are the parameters of the super-Gaussian model, and ξ and γ are the estimated prior SNR and posterior SNR.
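A minimal Python sketch of the single-channel weighting coefficient follows. It assumes u = 1/2 − μ/(4√(γξ)), which is the value that makes the single-channel coefficient agree with the jointly modeled coefficient for one channel; the clamp on the radicand and all numeric values are additions made for this sketch, not part of the patent.

```python
import math


def single_channel_gain(xi, gamma, mu, v):
    """G = u + sqrt(u^2 + (v - 1/2) / (2 * gamma)).

    xi:    estimated prior SNR
    gamma: estimated posterior SNR
    mu, v: super-Gaussian model parameters
    """
    u = 0.5 - mu / (4.0 * math.sqrt(gamma * xi))  # assumed definition of u
    radicand = u * u + (v - 0.5) / (2.0 * gamma)
    # Guard added for this sketch: avoid a negative radicand when v < 1/2.
    return u + math.sqrt(max(radicand, 0.0))


# Hypothetical values for illustration only.
gain = single_channel_gain(xi=4.0, gamma=5.0, mu=2.0, v=1.0)
```

For these inputs the gain is about 0.836, so the noisy spectral amplitude in that bin would be attenuated by roughly 16 percent.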
Further taking the signal delay of each channel into account, the jointly modeled weighting coefficient of each channel's voice signal can be obtained. For example, the jointly modeled weighting coefficient of the m-th channel's voice signal under the super-Gaussian model is:
$$G_m = \frac{\sqrt{\xi_m/\gamma_m}}{2\sum_{i=1}^{M}\xi_i}\left[\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2} + \sqrt{\left(\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2}\right)^{2} + (2\upsilon - M)\sum_{i=1}^{M}\xi_i}\,\right]$$
where M is the total number of voice signal channels in the joint modeling, $G_m$ is the weighting coefficient of the m-th channel's voice signal, $\xi_m$ is the prior SNR of the m-th channel's voice signal, $\gamma_m$ is the posterior SNR of the m-th channel's voice signal, and μ and υ are the parameters of the super-Gaussian model.
$\xi_i$ and $\gamma_i$ are the prior SNR and posterior SNR of the corresponding i-th channel's voice signal. In this example i ranges from 1 to M; that is, the estimates ($\xi_i$ and $\gamma_i$) of all channels 1 to M are used to calculate the weighting coefficient of the m-th channel's voice signal. The correlation among the M channels is thereby taken into account, which realizes the joint modeling.
Letting m take each value from 1 to M yields the weighting coefficients of all input channels.
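The joint weighting coefficients can be computed for all channels in one pass, since the bracketed term and the sum of prior SNRs are shared by every channel. The Python sketch below assumes the square roots are placed as in the single-channel formula (the OCR text of the patent does not preserve radical signs); the clamp on the radicand and the example values are hypothetical additions.

```python
import math


def joint_gains(xi, gamma, mu, v):
    """Jointly modeled weights G_1..G_M.

    xi, gamma: per-channel prior and posterior SNRs (lists of length M)
    mu, v:     super-Gaussian model parameters shared by all channels
    """
    m_total = len(xi)
    sum_xi = sum(xi)
    s = sum(math.sqrt(x * g) for x, g in zip(xi, gamma)) - mu / 2.0
    radicand = s * s + (2.0 * v - m_total) * sum_xi
    root = math.sqrt(max(radicand, 0.0))  # clamp is a guard added for this sketch
    shared = (s + root) / (2.0 * sum_xi)  # term common to every channel
    return [math.sqrt(x / g) * shared for x, g in zip(xi, gamma)]


# Hypothetical SNR estimates for M = 4 microphones at one frequency bin.
gains = joint_gains([4.0, 2.0, 1.0, 3.0], [5.0, 3.0, 2.0, 4.0], mu=2.0, v=1.0)
single = joint_gains([4.0], [5.0], mu=2.0, v=1.0)[0]
```

Each channel's weight depends on the SNR estimates of all channels through `shared`, which is how the joint modeling enters; with a single channel the expression reduces to the single-channel coefficient.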
204. Weight each channel of the input multi-channel voice signal with its weighting coefficient to obtain the weighted multi-channel voice signal.
205. Synthesize the weighted multi-channel voice signal to obtain the enhanced voice signal. The synthesis method includes adaptive beamforming in the time domain or frequency domain, or delay-and-sum beamforming. Synthesis yields at least one enhanced voice signal; usually only one output channel is needed, but in practice different syntheses may also be applied to the channels to obtain more than one enhanced voice signal.
If frequency-domain adaptive beamforming is used, then in step 201 the input multi-channel time-domain voice signal can first be converted channel by channel from the time domain to the frequency domain to obtain a multi-channel frequency-domain voice signal, and the prior and posterior SNRs of the multi-channel frequency-domain voice signal are then obtained. All subsequent steps then operate on frequency-domain voice signals. In step 205, frequency-domain adaptive beamforming is applied to the weighted multi-channel frequency-domain voice signal to obtain at least one enhanced frequency-domain voice signal, which is then converted back to the time domain to obtain the enhanced time-domain voice signal.
In a specific implementation, because the voice signal is produced in real time, the incoming voice signal needs to be divided into frames and windowed. Each of the resulting overlapping speech frames is enhanced separately, and the results are finally combined according to the actual overlap between frames to obtain the enhanced voice signal in real time.
Fig. 3 is a schematic flowchart of another specific embodiment of the voice signal enhancement method of the present invention. The flow comprises:
301. Framing and windowing: the system input is a time-domain noisy speech signal entering M microphones, for example M = 4. In this example, framing and windowing can use a half-overlapping Hanning (Hann) window with a frame length of 512 points. After framing and windowing, the time-domain noisy speech signal is output frame by frame as time-domain speech frames of 512 points each, giving M channels of time-domain speech frames.
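The framing step for one channel can be sketched in Python as follows. The periodic form of the Hann window and the helper names are choices made for this sketch; the patent only specifies a half-overlapping Hann window with 512-point frames.

```python
import math


def hann_window(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def split_into_frames(samples, frame_len=512):
    """Cut samples into half-overlapping frames and window each frame."""
    hop = frame_len // 2
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# Toy single-channel input; a real system would do this for each of the M microphones.
signal = [math.sin(0.01 * n) for n in range(2048)]
frames = split_into_frames(signal)
```

A 2048-sample input with a hop of 256 samples gives 7 frames of 512 points; trailing samples that do not fill a whole frame are dropped in this sketch.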
302. Fourier transform (FFT): apply the FFT to the M channels of time-domain speech frames to convert the time-domain signals into frequency-domain signals.
303. Estimate the SNR of each of the M channels of frequency-domain speech frames. SNR estimation comprises calculating the noise energy spectral density and the speech energy spectral density of each channel, and obtaining the prior SNR and posterior SNR of the i-th channel from its noise and speech energy spectral densities; as i takes the values 1 to M, the prior and posterior SNRs of all M channels of frequency-domain speech frames are obtained.
The specific method is as follows:
1. From the m-th channel's frequency-domain speech frame, obtain the estimated noise energy spectral density $\hat{\sigma}_N^2(k)$ and the estimated speech energy spectral density $\hat{\sigma}_S^2(k)$.
2. Obtain the posterior SNR $\gamma(k)$ according to the formula in equation image GDA0000080109700000063, where R(k) is the discrete Fourier transform (DFT) amplitude of the noisy speech signal.
Obtain the prior SNR $\hat{\xi}(k)$ according to the following formula:
$$\hat{\xi}(k) = \alpha_{snr}\frac{\hat{A}^{2}(k)}{\hat{\sigma}_N^{2}(k)} + (1 - \alpha_{snr})\,F[\gamma(k) - 1]$$
where A(k) is the DFT amplitude of the voice signal (remaining definitions: equation image GDA0000080109700000067).
3. Let m take each value from 1 to M to obtain the prior SNR and posterior SNR of each of the M channels.
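The SNR estimation for one channel and one frame can be sketched as below. Three points are assumptions of this sketch rather than statements from the patent, whose defining equations survive only as images: the posterior SNR is taken as R²(k) divided by the estimated noise energy spectral density, F[x] is taken as max(x, 0), and Â²(k) is taken from the previously enhanced frame.

```python
def estimate_snrs(noisy_power, noise_psd, prev_clean_power, alpha_snr=0.98):
    """Per-bin posterior SNR gamma(k) and prior SNR xi(k) for one channel.

    noisy_power:      R^2(k) of the current frame
    noise_psd:        estimated noise energy spectral density per bin
    prev_clean_power: A_hat^2(k), assumed to come from the previous enhanced frame
    alpha_snr:        smoothing factor (0.98 is a hypothetical default)
    """
    gammas, xis = [], []
    for r2, n2, a2 in zip(noisy_power, noise_psd, prev_clean_power):
        gamma = r2 / n2                    # assumed definition of the posterior SNR
        rectified = max(gamma - 1.0, 0.0)  # assumed form of F[.]
        xi = alpha_snr * a2 / n2 + (1.0 - alpha_snr) * rectified
        gammas.append(gamma)
        xis.append(xi)
    return gammas, xis


# Two-bin toy example with alpha_snr = 0.5 so the arithmetic is easy to follow.
gammas, xis = estimate_snrs([4.0, 0.5], [1.0, 1.0], [2.0, 0.0], alpha_snr=0.5)
```

In the first bin the prior SNR is 0.5 · 2 + 0.5 · 3 = 2.5; in the second bin the posterior SNR is below 1, so the rectified term vanishes and the prior SNR is 0.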
304. Obtain the signal parameters of the multi-channel voice signal according to the model parameters of the statistical model of the multi-channel voice signal.
According to the central limit theorem, whatever the probability density function (PDF) of the time-domain samples, the distribution of the DFT coefficients of speech and noise converges to a Gaussian PDF. This assumes that the samples are statistically independent, or that the correlation between samples is short compared with the frame length. The Gauss model of the voice signal is:
$$p(A) = \frac{2A}{\sigma_S^{2}}\exp\left\{-\frac{A^{2}}{\sigma_S^{2}}\right\}$$
where A is the amplitude of the speech and $\sigma_S^2$ is the speech energy spectral density.
In practice, however, correlated noise or reverberation is usually present, so the Gauss model cannot accurately describe the statistical properties of the noisy signal. This embodiment therefore uses the super-Gaussian model as the statistical model of the multi-channel voice signal to obtain the signal parameters of the multi-channel voice signal.
The model parameters of the super-Gaussian model apply to all channels of the voice signal. They are not calculated from any single channel but are obtained from actual conditions, for example as empirical values for the current environment.
This step has no fixed order relative to steps 301 to 303, as long as the signal parameters are available before step 305.
305. Calculate the weighting coefficients.
Taking the relative delay of each channel into account, the jointly modeled weighting coefficient of each channel's voice signal is:
$$G_m = \frac{\sqrt{\xi_m/\gamma_m}}{2\sum_{i=1}^{M}\xi_i}\left[\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2} + \sqrt{\left(\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2}\right)^{2} + (2\upsilon - M)\sum_{i=1}^{M}\xi_i}\,\right]$$
where $G_m$ is the weighting coefficient of the m-th channel's voice signal, $\xi_m$ is the prior SNR of the m-th channel's voice signal, $\gamma_m$ is the posterior SNR of the m-th channel's voice signal, M is the total number of voice signal channels, μ and υ are the parameters of the super-Gaussian model, and $\xi_i$ and $\gamma_i$ are the estimated prior SNR and posterior SNR of the corresponding i-th channel, with i taking the values 1 to M.
In this way, the jointly modeled weighting coefficients $G_m$, for m from 1 to M, are calculated from the above formula.
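As a consistency check, the joint coefficient can be reduced to the single-channel case. The short derivation below sets M = 1 and writes ξ and γ for the only channel; it assumes the square roots in the joint formula are placed as in the single-channel expression, which the flattened source text does not show explicitly.

```latex
\begin{aligned}
G_1 &= \frac{\sqrt{\xi/\gamma}}{2\xi}
       \left[\sqrt{\xi\gamma}-\frac{\mu}{2}
       +\sqrt{\left(\sqrt{\xi\gamma}-\frac{\mu}{2}\right)^{2}+(2\upsilon-1)\,\xi}\,\right] \\
    &= \frac{1}{2\sqrt{\xi\gamma}}\left(\sqrt{\xi\gamma}-\frac{\mu}{2}\right)
       +\sqrt{\frac{\left(\sqrt{\xi\gamma}-\frac{\mu}{2}\right)^{2}}{4\xi\gamma}
       +\frac{(2\upsilon-1)\,\xi}{4\xi\gamma}} \\
    &= u+\sqrt{u^{2}+\frac{\upsilon-1/2}{2\gamma}},
       \qquad u=\frac{1}{2}-\frac{\mu}{4\sqrt{\gamma\xi}} .
\end{aligned}
```

Under that reading, the joint formula with one channel coincides with the single-channel weighting coefficient of the super-Gaussian model, which suggests the multi-channel expression is a direct generalization in which the sums over i pool the SNR estimates of all microphones.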
306. Weight the M channels of frequency-domain speech frames according to $G_m$, where m takes the values 1 to M, to obtain the M weighted frequency-domain speech frames.
307. Apply adaptive beamforming to the M weighted frequency-domain speech frames and output one frequency-domain signal. Adaptive beamforming methods include methods based on the frequency-domain minimum mean square error, methods based on the eigenspace, methods based on a microphone array, and so on.
The microphone-array-based adaptive beamforming method comprises:
Suppose the array has M microphones. The frequency-domain spatial correlation matrix of the received signal is expressed as:
$$R(f) = E[x(f)\,x^{H}(f)]$$
where x(f) is the signal vector received by the array at frequency f.
Choose a series of frequencies $f_i$, i = 1, 2, ..., D, where D is the total number of frequencies. At each frequency, the adaptive beamforming method selects as the optimal weight vector for that frequency the weight vector w(f) that solves
$$\min_{w(f)} w^{H}(f)\,R(f)\,w(f)$$
subject to the constraint
$$w^{H}(f)\,a(f) = 1$$
where a(f) is the direction vector of the voice signal at that frequency.
The channels are then weighted and summed in the frequency domain, which yields the one output frequency-domain signal.
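Minimizing w^H(f) R(f) w(f) subject to w^H(f) a(f) = 1 has the standard closed-form solution w = R⁻¹a / (a^H R⁻¹ a), commonly known as the MVDR or Capon beamformer (a name the patent itself does not use). The Python sketch below computes it for one frequency bin with a small Gaussian elimination; the matrix, the direction vector and all function names are hypothetical illustrations.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs for a small complex system by Gaussian elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0j] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mvdr_weights(r, a):
    """w = R^{-1} a / (a^H R^{-1} a), which satisfies w^H a = 1."""
    r_inv_a = solve(r, a)
    denom = sum(ai.conjugate() * yi for ai, yi in zip(a, r_inv_a))
    return [yi / denom for yi in r_inv_a]


def output_power(w, r):
    """Real part of w^H R w, the quantity being minimized."""
    n = len(w)
    return sum(w[i].conjugate() * r[i][j] * w[j] for i in range(n) for j in range(n)).real


# Hypothetical 2-microphone example at one frequency bin.
r = [[2.0 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 1.0 + 0j]]  # Hermitian correlation matrix
a = [1.0 + 0j, 0.0 + 1.0j]                            # direction vector
w = mvdr_weights(r, a)
constraint = sum(wi.conjugate() * ai for wi, ai in zip(w, a))  # w^H a
plain = [0.5 + 0j, 0.0 + 0.5j]                        # a / (a^H a), also meets the constraint
```

Both `w` and `plain` satisfy the constraint, but `w` gives the lower output power, which is the sense in which it is optimal. In practice R(f) would be estimated by averaging x(f)x^H(f) over frames rather than given directly.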
308. Apply the inverse Fourier transform (IFFT) to the output frequency-domain signal to obtain one time-domain signal.
For a multi-frame signal, the frames overlap as a result of framing and windowing, so the enhanced frames can be combined into one time-domain voice signal by suitable overlap-add processing.
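Overlap-add can be sketched in a few lines of Python. The example assumes a periodic Hann analysis window at 50 percent overlap and no modification of the frames, so that the only effect visible is the reconstruction itself; the small frame length and the test signal are hypothetical.

```python
import math


def overlap_add(frames, hop):
    """Sum frames back into one signal, placing frame i at offset i * hop."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * hop + n] += value
    return out


# Analysis with a periodic Hann window at 50% overlap (small frame for illustration).
frame_len, hop = 8, 4
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len) for n in range(frame_len)]
signal = [float(n % 5) for n in range(32)]
frames = [
    [signal[start + n] * window[n] for n in range(frame_len)]
    for start in range(0, len(signal) - frame_len + 1, hop)
]
rebuilt = overlap_add(frames, hop)
```

Because two Hann windows shifted by half a frame sum to 1, `rebuilt` matches `signal` everywhere except in the first and last half frame, where only one window contributes.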
The technical scheme provided by the above embodiment improves estimation accuracy because a more accurate speech model replaces the traditional Gaussian statistical model. The M input channels are jointly modeled to obtain the weighting coefficients, which makes full use of the correlation among the M channels and raises the output SNR after weighting. In addition, because frequency-domain coefficients are more concentrated in certain regions than time-domain coefficients, and the convergence of an adaptive algorithm is reflected in how closely it approaches these expected values, the frequency-domain adaptive beamforming algorithm can achieve faster convergence.
Correspondingly, the present invention also provides a specific embodiment of a voice signal enhancement device. As shown in Fig. 4, the device comprises:
SNR acquiring unit 10, configured to obtain the prior SNR and the posterior SNR of each channel's voice signal in the multi-channel voice signal. As shown in Fig. 5, SNR acquiring unit 10 can comprise: a first time-frequency conversion module 101, configured to convert each channel of the input multi-channel time-domain voice signal to the frequency domain to obtain a multi-channel frequency-domain voice signal; and an SNR acquisition module 102, configured to obtain the prior SNR and the posterior SNR of the multi-channel frequency-domain voice signal. The first time-frequency conversion module 101 is also configured to frame and window the input multi-channel time-domain voice signal to obtain multi-channel time-domain speech frames, and to convert each of the multi-channel time-domain speech frames to the frequency domain to obtain multi-channel frequency-domain speech frames.
Parameter acquiring unit 20, configured to obtain the signal parameters of the multi-channel voice signal according to the model parameters of the statistical model of the multi-channel voice signal. The statistical model of the multi-channel voice signal can be a speech model that is more accurate than the traditional Gaussian statistical model (also called the Gauss model), such as the super-Gaussian model, the Laplace model or the augmented Gaussian model. The model parameters of these models reflect the signal characteristics of the multi-channel voice signal better than the traditional Gauss model. The model parameters can include the amplitude of the multi-channel voice signal, the multi-channel speech energy spectral density and the probability density of the multi-channel voice signal amplitude, and can also include other model parameters such as Gamma function information.
The super-Gaussian model can include, but is not limited to, the following example:
$$p(A) = \frac{\mu^{v+1}}{\Gamma(v+1)}\,\frac{A^{v}}{\sigma_S^{v+1}}\exp\left\{-\mu\frac{A}{\sigma_S}\right\}$$
The Laplace model can include, but is not limited to, the following example:
$$p(A) = \frac{1}{\sigma_S}\exp\left\{-\frac{2A}{\sigma_S}\right\}$$
Here the model parameters that characterize the multi-channel voice signal include: A, the amplitude of the voice signal; $\sigma_S^2$, the speech energy spectral density; Γ, the Gamma function; μ and v, the parameters of the super-Gaussian model; and p(A), the probability density of A.
Coefficient calculation unit 30, configured to obtain the jointly modeled weighting coefficient of each channel's voice signal according to the model parameters and the prior SNR and posterior SNR of each channel's voice signal. More specifically, it can obtain the jointly modeled weighting coefficient of each channel's voice signal according to the signal parameters and the prior SNR and posterior SNR of each channel's voice signal, based on minimum mean square error estimation and the signal delay of each channel.
As shown in Fig. 5, coefficient calculation unit 30 comprises: a value module 301, configured to let m take each value from 1 to M; and an m-th channel computing module 302, configured to obtain, for the value of m set by value module 301, the jointly modeled weighting coefficient of the m-th channel's voice signal according to the following formula:
$$G_m = \frac{\sqrt{\xi_m/\gamma_m}}{2\sum_{i=1}^{M}\xi_i}\left[\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2} + \sqrt{\left(\sum_{i=1}^{M}\sqrt{\xi_i\gamma_i} - \frac{\mu}{2}\right)^{2} + (2\upsilon - M)\sum_{i=1}^{M}\xi_i}\,\right]$$
where M is the total number of voice signal channels in the joint modeling, $G_m$ is the weighting coefficient of the m-th channel's voice signal, $\xi_m$ is the prior SNR of the m-th channel's voice signal, $\gamma_m$ is the posterior SNR of the m-th channel's voice signal, $\xi_i$ and $\gamma_i$ are the prior SNR and posterior SNR of the corresponding i-th channel's voice signal, and μ and υ are the parameters of the super-Gaussian model.
Weighting unit 40, configured to weight each channel's voice signal with its corresponding weighting coefficient to obtain the weighted multi-channel voice signal;
Synthesis unit 50, configured to synthesize the weighted multi-channel voice signal to obtain the enhanced voice signal. The synthesis method includes adaptive beamforming in the time domain or frequency domain, or delay-and-sum beamforming. Synthesis yields at least one enhanced voice signal; usually only one output channel is needed, but in practice different syntheses may also be applied to the channels to obtain more than one enhanced voice signal.
If frequency-domain adaptive beamforming is used, then as shown in Fig. 5, synthesis unit 50 can comprise: a beamforming module 501, configured to apply frequency-domain adaptive beamforming to the weighted multi-channel frequency-domain voice signal to obtain at least one enhanced frequency-domain voice signal (for the specific processing, see the description relating to Fig. 3); and a second time-frequency conversion module 502, configured to convert the enhanced frequency-domain voice signal to the time domain to obtain the enhanced time-domain voice signal.
The technical scheme provided by this embodiment improves estimation accuracy because a more accurate speech model replaces the traditional Gaussian statistical model. The M input channels are jointly modeled to obtain the weighting coefficients, which makes full use of the correlation among the M channels and raises the output SNR after weighting. In addition, because frequency-domain coefficients are more concentrated in certain regions than time-domain coefficients, and the convergence of an adaptive algorithm is reflected in how closely it approaches these expected values, the frequency-domain adaptive beamforming algorithm can achieve faster convergence.
The device embodiment described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the essence of the above technical solution, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the method described in each embodiment or in certain parts of an embodiment.
The embodiments described above do not limit the scope of protection of this technical solution. Any modification, equivalent replacement, or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of this technical solution.

Claims (14)

1. A voice signal enhancement method, characterized in that the method comprises:
obtaining the a priori signal-to-noise ratio (SNR) and the a posteriori SNR of each channel of a multi-channel voice signal;
obtaining signal parameters of the multi-channel voice signal according to model parameters of a statistical model of the multi-channel voice signal;
obtaining the jointly modeled weighting coefficient of each channel according to the signal parameters and the a priori SNR and a posteriori SNR of each channel, based on minimum mean square error estimation and the signal delay of each channel;
weighting each channel of the multi-channel voice signal with its corresponding weighting coefficient to obtain a weighted multi-channel voice signal;
combining the weighted multi-channel voice signal to obtain an enhanced voice signal.
2. the method for claim 1, it is characterized in that, described statistical model comprises this model of superelevation or laplace model or augmentation Gauss model, and the model parameter of described statistical model comprises parameter in this Model parameter of superelevation or the laplace model or the parameter in the augmentation Gauss model.
3. The method of claim 2, characterized in that the statistical model is a super-Gaussian model, which can be expressed as:
$$p(A) = \frac{\mu^{v+1}}{\Gamma(v+1)} \, \frac{A^{v}}{\sigma_S^{\,v+1}} \exp\left\{ -\mu \frac{A}{\sigma_S} \right\}$$
where A is the amplitude of the voice signal, $\sigma_S^2$ is the speech energy spectral density, Γ is the Gamma function, μ and v are the parameters of the super-Gaussian model, and p(A) is the probability density of A.
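The density in this claim can be evaluated directly. The sketch below assumes the formula is read as p(A) = μ^(v+1)/Γ(v+1) · A^v/σ_S^(v+1) · exp(−μA/σ_S); the function names and the numeric parameter values used in the check are illustrative assumptions, not values taken from the patent. The helper `integrate_pdf` numerically checks that the density integrates to approximately 1.

```python
import math


def super_gaussian_pdf(a: float, sigma_s: float, mu: float, v: float) -> float:
    """Density p(A) of a speech spectral amplitude A >= 0 (assumed reading)."""
    if a < 0 or sigma_s <= 0:
        raise ValueError("need a >= 0 and sigma_s > 0")
    norm = mu ** (v + 1) / math.gamma(v + 1)
    return norm * a ** v / sigma_s ** (v + 1) * math.exp(-mu * a / sigma_s)


def integrate_pdf(sigma_s: float, mu: float, v: float,
                  upper: float = 40.0, steps: int = 100_000) -> float:
    """Midpoint-rule integral of p(A) over [0, upper * sigma_s]."""
    h = upper * sigma_s / steps
    return h * sum(super_gaussian_pdf((k + 0.5) * h, sigma_s, mu, v)
                   for k in range(steps))
```

With v = 1 and μ = 2, for example, the density reduces to 4A·exp(−2A) for σ_S = 1, which makes the normalization easy to verify by hand.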
4. The method of any one of claims 1 to 3, characterized in that obtaining the jointly modeled weighting coefficient of each channel according to the model parameters and the a priori SNR and a posteriori SNR of each channel comprises:
obtaining, according to the super-Gaussian model, the jointly modeled weighting coefficient of the m-th channel as:
$$G_m = \frac{\sqrt{\xi_m / \gamma_m}}{2 \sum_{i=1}^{M} \xi_i} \left[ \sum_{i=1}^{M} \sqrt{\xi_i \gamma_i} - \frac{\mu}{2} + \sqrt{ \left( \sum_{i=1}^{M} \sqrt{\xi_i \gamma_i} - \frac{\mu}{2} \right)^{2} + (2\upsilon - M) \sum_{i=1}^{M} \xi_i } \, \right]$$
where M is the total number of channels in the joint modeling, $G_m$ is the weighting coefficient of the m-th channel, $\xi_m$ is the a priori SNR of the m-th channel, $\gamma_m$ is the a posteriori SNR of the m-th channel, $\xi_i$ and $\gamma_i$ are the a priori SNR and the a posteriori SNR of the i-th channel, and μ and υ are the signal parameters obtained from the model parameters of the statistical model of the multi-channel voice signal;
m takes each value from 1 to M in turn, giving the weighting coefficient of every input channel.
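A minimal sketch of how the weighting coefficients of this claim might be computed for one frequency bin follows. It assumes the expression is read with square roots over the products ξ_i·γ_i, over the ratio ξ_m/γ_m, and over the bracketed discriminant, since the extracted text does not preserve radical signs. The function name `joint_weights` and the clamping of a negative discriminant to zero are assumptions of this example, not part of the claim.

```python
import math
from typing import List, Sequence


def joint_weights(xi: Sequence[float], gamma: Sequence[float],
                  mu: float, v: float) -> List[float]:
    """Return [G_1, ..., G_M] for one frequency bin.

    xi[i] and gamma[i] are the a priori and a posteriori SNR of channel i;
    mu and v are the signal parameters of the super-Gaussian model.
    """
    m_count = len(xi)
    if m_count == 0 or m_count != len(gamma):
        raise ValueError("xi and gamma must be non-empty and equally long")
    xi_sum = sum(xi)
    # Term shared by all channels: sum of sqrt(xi_i * gamma_i) minus mu / 2.
    b = sum(math.sqrt(x * g) for x, g in zip(xi, gamma)) - mu / 2.0
    disc = b * b + (2.0 * v - m_count) * xi_sum
    # Assumption: clamp a negative discriminant (possible when 2v < M).
    root = math.sqrt(max(disc, 0.0))
    common = (b + root) / (2.0 * xi_sum)
    # Each channel scales the shared term by sqrt(xi_m / gamma_m).
    return [math.sqrt(x / g) * common for x, g in zip(xi, gamma)]
```

Because the bracketed term depends on all M channels, each coefficient reflects the SNRs of every input rather than only its own channel, which is the sense in which the channels are modeled jointly.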
5. The method of claim 4, characterized in that obtaining the a priori SNR and the a posteriori SNR of each channel from the input multi-channel voice signal comprises:
applying a time-frequency transform to each channel of the input multi-channel time-domain voice signal to obtain a multi-channel frequency-domain voice signal;
obtaining the a priori SNR and the a posteriori SNR of the multi-channel frequency-domain voice signal.
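The claim does not state how the two SNRs are estimated. One widely used possibility, shown here purely as an assumption, is to take the a posteriori SNR as the ratio of noisy power to noise power and to estimate the a priori SNR with the decision-directed recursion. The function name `snr_estimates`, the smoothing factor `alpha`, and the floor `xi_floor` are hypothetical.

```python
from typing import Tuple


def snr_estimates(noisy_power: float, noise_power: float,
                  prev_clean_power: float, alpha: float = 0.98,
                  xi_floor: float = 1e-3) -> Tuple[float, float]:
    """Return (xi, gamma) for one frequency bin of one channel.

    noisy_power:      |X|^2 of the current frame
    noise_power:      estimated noise power in this bin (must be > 0)
    prev_clean_power: |S_hat|^2 estimated for the previous frame
    """
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    # A posteriori SNR: observed power relative to the noise power.
    gamma = noisy_power / noise_power
    # Decision-directed a priori SNR (an assumed estimator, not the patent's).
    xi = (alpha * prev_clean_power / noise_power
          + (1.0 - alpha) * max(gamma - 1.0, 0.0))
    return max(xi, xi_floor), gamma
```

The function would be called once per channel and per frequency bin, and the resulting pairs would then feed the weighting-coefficient computation.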
6. The method of claim 5, characterized in that combining the weighted multi-channel voice signal to obtain the enhanced voice signal comprises:
applying frequency-domain adaptive beamforming to the weighted multi-channel frequency-domain voice signal to obtain at least one enhanced frequency-domain voice signal;
applying a time-frequency transform to the enhanced frequency-domain voice signal to obtain an enhanced time-domain voice signal.
7. The method of claim 5, characterized in that:
before the time-frequency transform is applied to each channel of the multi-channel time-domain voice signal to obtain the multi-channel frequency-domain voice signal, the method further comprises framing and windowing the input multi-channel time-domain voice signal to obtain multi-channel time-domain speech frames;
applying the time-frequency transform to each channel of the input multi-channel time-domain voice signal to obtain the multi-channel frequency-domain voice signal is: applying the time-frequency transform to each of the multi-channel time-domain speech frames to obtain multi-channel frequency-domain speech frames.
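The framing, windowing, and transform steps of this claim can be sketched for a single channel as follows. The Hann window, the frame length, the hop size, and the naive DFT are illustrative assumptions; the claim names neither a window nor a particular transform, and a practical implementation would normally use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def frame_and_window(signal: Sequence[float], frame_len: int,
                     hop: int) -> List[List[float]]:
    """Split one channel into overlapping frames and apply a Hann window."""
    # Periodic Hann window (an assumed choice of window).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n]
                       for n in range(frame_len)])
    return frames


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform of one windowed frame."""
    n_len = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len)]
```

Applying `frame_and_window` and then `dft` to every channel yields the multi-channel frequency-domain speech frames on which the SNRs and weighting coefficients are computed.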
8. A voice signal enhancement device, characterized in that the device comprises:
an SNR acquiring unit, configured to obtain the a priori SNR and the a posteriori SNR of each channel of a multi-channel voice signal;
a parameter acquiring unit, configured to obtain signal parameters of the multi-channel voice signal according to model parameters of a statistical model of the multi-channel voice signal;
a coefficient calculation unit, configured to obtain the jointly modeled weighting coefficient of each channel according to the signal parameters and the a priori SNR and a posteriori SNR of each channel;
a weighting unit, configured to weight each channel of the multi-channel voice signal with its corresponding weighting coefficient to obtain a weighted multi-channel voice signal;
a synthesis unit, configured to combine the weighted multi-channel voice signal to obtain an enhanced voice signal;
wherein the coefficient calculation unit is further configured to obtain the jointly modeled weighting coefficient of each channel according to the signal parameters and the a priori SNR and a posteriori SNR of each channel, based on minimum mean square error estimation and the signal delay of each channel.
9. The device of claim 8, characterized in that the parameter acquiring unit is configured to obtain the signal parameters of the multi-channel voice signal according to the model parameters of a super-Gaussian model, the parameters of a Laplacian model, or the parameters of an augmented Gaussian model of the multi-channel voice signal.
10. The device of claim 9, characterized in that the parameter acquiring unit is configured to obtain the signal parameters of the multi-channel voice signal according to the parameters of a super-Gaussian model of the multi-channel voice signal, and the super-Gaussian model can be expressed as:
$$p(A) = \frac{\mu^{v+1}}{\Gamma(v+1)} \, \frac{A^{v}}{\sigma_S^{\,v+1}} \exp\left\{ -\mu \frac{A}{\sigma_S} \right\}$$
where A is the amplitude of the voice signal, $\sigma_S^2$ is the speech energy spectral density, Γ is the Gamma function, μ and v are the parameters of the super-Gaussian model, and p(A) is the probability density of A.
11. The device of any one of claims 8 to 10, characterized in that the coefficient calculation unit comprises:
a value module, configured to set m to each value from 1 to M in turn;
an m-th channel calculation module, configured to obtain, for the value of m set by the value module and according to the super-Gaussian model, the jointly modeled weighting coefficient of the m-th channel as:
$$G_m = \frac{\sqrt{\xi_m / \gamma_m}}{2 \sum_{i=1}^{M} \xi_i} \left[ \sum_{i=1}^{M} \sqrt{\xi_i \gamma_i} - \frac{\mu}{2} + \sqrt{ \left( \sum_{i=1}^{M} \sqrt{\xi_i \gamma_i} - \frac{\mu}{2} \right)^{2} + (2\upsilon - M) \sum_{i=1}^{M} \xi_i } \, \right]$$
where M is the total number of channels in the joint modeling, $G_m$ is the weighting coefficient of the m-th channel, $\xi_m$ is the a priori SNR of the m-th channel, $\gamma_m$ is the a posteriori SNR of the m-th channel, $\xi_i$ and $\gamma_i$ are the a priori SNR and the a posteriori SNR of the i-th channel, and μ and υ are the signal parameters obtained from the model parameters of the statistical model of the multi-channel voice signal.
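As a consistency check on the weighting-coefficient expression, setting M = 1 collapses the sums to a single channel. The following worked reduction is a sketch that assumes the expression is read with square roots over the product ξγ, over the ratio ξ/γ, and over the bracketed discriminant; the shorthand u is introduced here only for illustration.

```latex
\begin{aligned}
G_1 &= \frac{\sqrt{\xi/\gamma}}{2\xi}
       \left[ \sqrt{\xi\gamma} - \frac{\mu}{2}
       + \sqrt{\left(\sqrt{\xi\gamma} - \frac{\mu}{2}\right)^{2} + (2\upsilon - 1)\,\xi} \right] \\
    &= \frac{1}{2\sqrt{\xi\gamma}}
       \left[ \sqrt{\xi\gamma} - \frac{\mu}{2}
       + \sqrt{\left(\sqrt{\xi\gamma} - \frac{\mu}{2}\right)^{2} + (2\upsilon - 1)\,\xi} \right] \\
    &= u + \sqrt{u^{2} + \frac{2\upsilon - 1}{4\gamma}},
       \qquad u = \frac{1}{2} - \frac{\mu}{4\sqrt{\xi\gamma}} .
\end{aligned}
```

Under this reading, the single-channel coefficient depends only on that channel's two SNRs and the two model parameters, and the multi-channel expression differs from it by pooling the SNRs of all M channels.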
12. The device of claim 11, characterized in that the SNR acquiring unit comprises:
a first time-frequency conversion module, configured to apply a time-frequency transform to each channel of the input multi-channel time-domain voice signal to obtain a multi-channel frequency-domain voice signal;
an SNR acquisition module, configured to obtain the a priori SNR and the a posteriori SNR of the multi-channel frequency-domain voice signal.
13. The device of claim 11, characterized in that the first time-frequency conversion module is further configured to frame and window the input multi-channel time-domain voice signal to obtain multi-channel time-domain speech frames, and to apply a time-frequency transform to each of the multi-channel time-domain speech frames to obtain multi-channel frequency-domain speech frames.
14. The device of claim 11, characterized in that the synthesis unit comprises:
a beamforming module, configured to apply frequency-domain adaptive beamforming to the weighted multi-channel frequency-domain voice signal to obtain at least one enhanced frequency-domain voice signal;
a second time-frequency conversion module, configured to apply a time-frequency transform to the enhanced frequency-domain voice signal to obtain an enhanced time-domain voice signal.
CN2008101987725A 2008-09-25 2008-09-25 Method and device for enhancing voice signals Expired - Fee Related CN101685638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101987725A CN101685638B (en) 2008-09-25 2008-09-25 Method and device for enhancing voice signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101987725A CN101685638B (en) 2008-09-25 2008-09-25 Method and device for enhancing voice signals

Publications (2)

Publication Number Publication Date
CN101685638A (en) 2010-03-31
CN101685638B (en) 2011-12-21

Family

ID=42048758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101987725A Expired - Fee Related CN101685638B (en) 2008-09-25 2008-09-25 Method and device for enhancing voice signals

Country Status (1)

Country Link
CN (1) CN101685638B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568491B (en) * 2010-12-14 2015-01-07 联芯科技有限公司 Noise suppression method and equipment
GB2493327B (en) 2011-07-05 2018-06-06 Skype Processing audio signals
GB2495472B (en) 2011-09-30 2019-07-03 Skype Processing audio signals
GB2495128B (en) 2011-09-30 2018-04-04 Skype Processing signals
GB2495129B (en) 2011-09-30 2017-07-19 Skype Processing signals
GB2496660B (en) 2011-11-18 2014-06-04 Skype Processing audio signals
GB201120392D0 (en) 2011-11-25 2012-01-11 Skype Ltd Processing signals
GB2497343B (en) 2011-12-08 2014-11-26 Skype Processing audio signals
CN103915099B (en) * 2012-12-29 2016-12-28 北京百度网讯科技有限公司 Voice fundamental periodicity detection methods and device
CN109767783B (en) 2019-02-15 2021-02-02 深圳市汇顶科技股份有限公司 Voice enhancement method, device, equipment and storage medium
CN109767781A (en) * 2019-03-06 2019-05-17 哈尔滨工业大学(深圳) Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning
CN110517703B (en) * 2019-08-15 2021-12-07 北京小米移动软件有限公司 Sound collection method, device and medium
CN110580911B (en) * 2019-09-02 2020-04-21 青岛科技大学 Beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences
CN110808058B (en) * 2019-11-11 2022-06-21 广州国音智能科技有限公司 Voice enhancement method, device, equipment and readable storage medium
CN111681649B (en) * 2020-05-25 2023-05-02 重庆邮电大学 Speech recognition method, interaction system and achievement management system comprising system
WO2022016406A1 (en) * 2020-07-22 2022-01-27 北京小米移动软件有限公司 Information transmission method and apparatus, and communication device
CN111986693A (en) * 2020-08-10 2020-11-24 北京小米松果电子有限公司 Audio signal processing method and device, terminal equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1149536C (en) * 1998-06-22 2004-05-12 Dspc技术有限公司 Noise suppressor having weighted gain smoothing

Also Published As

Publication number Publication date
CN101685638A (en) 2010-03-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 2011-12-21; termination date: 2019-09-25)
