CN103811020A - Smart voice processing method - Google Patents

Smart voice processing method

Info

Publication number
CN103811020A
CN103811020A (application CN201410081493.6A; granted as CN103811020B)
Authority
CN
China
Prior art keywords
sound
sound source
represent
microphone
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410081493.6A
Other languages
Chinese (zh)
Other versions
CN103811020B (en)
Inventor
王�义
魏阳杰
陈瑶
关楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201410081493.6A priority Critical patent/CN103811020B/en
Publication of CN103811020A publication Critical patent/CN103811020A/en
Application granted granted Critical
Publication of CN103811020B publication Critical patent/CN103811020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a smart voice processing method and belongs to the technical field of information processing. By building a speaker voice model library, the method intelligently identifies the identities of multiple talkers in a multi-talker voice environment and separates the mixed voices to obtain each talker's individual voice; according to the user's demands, it amplifies the voices of the talkers the user wants to hear and eliminates the voices of the talkers the user does not want. Unlike a traditional hearing aid, the method automatically provides the user with the needed voices according to individual demand, reduces interference from non-target voices beyond noise alone, and is personalized, interactive, and intelligent.

Description

An intelligent voice processing method
Technical field
The invention belongs to the technical field of information processing, and specifically relates to an intelligent voice processing method.
Background art
According to the latest assessment data released by the World Health Organization (WHO) in 2013, 360 million people worldwide currently have some degree of hearing impairment, accounting for 5% of the global population. Hearing-aid products can effectively compensate for the hearing loss of people who are deaf or hard of hearing and improve their quality of life and work. However, current research on hearing-assistance technology still concentrates on two aspects, noise suppression and amplification of source loudness, and rarely involves modeling based on voice characteristics or automatic separation of multiple sound sources. In complex real-world scenes, for example a party where several speakers talk at the same time, possibly over background sounds such as music, a hearing-assistance system cannot isolate the target voice of interest from the mixed speech input; a simple loudness-amplification function then only increases the user's listening burden, or even causes harm, and cannot deliver intelligible speech input and understanding. Therefore, in view of these technical deficiencies of current hearing-assistance systems, designing a novel hearing-assistance system with a specific-sound recognition function that is more intelligent and personalized is of great significance.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an intelligent voice processing method, so that the user obtains clean target speech, received and amplified according to his or her own needs, making the hearing-assistance system intelligent, interactive, and personalized.
An intelligent voice processing method comprises the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters.
The detailed process is as follows:
Step 1-1: collect sample voice segments, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a Gaussian mixture model (GMM).
The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the final GMM parameters, completing the training of the feature parameters. A training sketch follows.
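As a concrete illustration of steps 1-1 and 1-2, the following is a minimal Python sketch assuming the librosa and scikit-learn libraries; the file path, the 13 MFCCs, and the diagonal-covariance choice are illustrative assumptions (the embodiment below does use a 16-component GMM).

```python
import librosa
from sklearn.mixture import GaussianMixture

def train_speaker_model(wav_path, n_components=16, n_mfcc=13):
    # Step 1-1: load and discretize the voice segment, then extract MFCCs
    # as the voice feature parameters X (one feature vector per frame).
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (T, n_mfcc)
    # Step 1-2: k-means initialization followed by EM estimation, matching
    # the patent's k-means + expectation-maximization training scheme.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',
                          init_params='kmeans',
                          max_iter=200)
    gmm.fit(mfcc)
    return gmm  # parameter set G = {p_i, mu_i, Sigma_i}
```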
Step 2: use a microphone array composed of M microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: use the array of M microphones to capture the mixed audio signal of the test environment, discretize the captured signal, and obtain the amplitude of each sample point.
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/M) Σ_{m=1}^{M} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: perform an eigenvalue decomposition of the estimated covariance matrix, sort the eigenvalues in descending order, and count the eigenvalues greater than a threshold; this count is the number of sound sources.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix. A sketch of steps 2-3 to 2-5 follows.
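The following is a minimal sketch of steps 2-3 to 2-5, assuming the microphone samples are stacked into an (M, L) NumPy array. Equation (2) as printed sums outer products over the microphones; the sketch forms the M×M spatial covariance, the interpretation consistent with the embodiment, where a 4-microphone array yields four eigenvalues.

```python
import numpy as np

def count_sources(X, threshold=1e-7):
    # X: (M, L) array, one row of sample amplitudes per microphone.
    M = X.shape[0]
    # Step 2-3: covariance matrix estimate of the mixed audio signal
    # (M x M spatial form, consistent with the embodiment's 4 eigenvalues).
    R_xx = (X @ X.conj().T) / X.shape[1]
    # Step 2-4: eigendecomposition; eigenvalues above the threshold count as
    # sources (the patent takes the threshold between 1e-16 and 1e-2).
    eigvals, eigvecs = np.linalg.eigh(R_xx)   # ascending order
    n_sources = int(np.sum(eigvals > threshold))
    # Step 2-5: the remaining M - n_sources eigenvectors form the noise
    # matrix V_u used by the angular spectrum of step 2-6.
    V_u = eigvecs[:, :M - n_sources]
    return n_sources, V_u
```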
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), with α_m(θ) = e^{j k d_m cos(φ_m − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal; d_m denotes the distance between the m-th microphone and the array center;
φ_m denotes the orientation angle of the m-th microphone with respect to the array center;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source. A spectrum-evaluation sketch follows.
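A minimal sketch of steps 2-6 to 2-8 follows, evaluating the angular spectrum of equation (3) on a 1° grid and picking the largest peaks; the circular four-microphone geometry and its default distance and orientation angles are taken from the embodiment and are otherwise assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def angular_spectrum(V_u, wavelength, d=0.02,
                     phi_deg=(0.0, 180.0, 90.0, 270.0)):
    # Steering-vector parameters: k = 2*pi/lambda, microphone orientation
    # angles phi_m and a common center distance d (embodiment values).
    k = 2.0 * np.pi / wavelength
    phi = np.deg2rad(np.asarray(phi_deg))
    P = np.empty(360)
    for deg in range(360):
        theta = np.deg2rad(deg)
        # alpha_m(theta) = exp(j k d_m cos(phi_m - theta))
        alpha = np.exp(1j * k * d * np.cos(phi - theta))
        # Eq. (3): P(theta) = 1 / (alpha^H V_u V_u^H alpha)
        denom = alpha.conj() @ V_u @ V_u.conj().T @ alpha
        P[deg] = 1.0 / np.abs(denom)
    return P

def pick_directions(P, n_sources):
    # Steps 2-7/2-8: the n_sources highest peaks give the arrival directions.
    peaks, _ = find_peaks(P)
    top = peaks[np.argsort(P[peaks])[::-1][:n_sources]]
    return sorted(int(a) for a in top)        # angles in degrees
```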
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{N} 0.5 Σ_{m=1}^{M} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources;
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) is given by equation (5) and the vertical sound-pressure gradient p_y(t) by equation (6); both formulas appear only as images in the original and are formed analogously to (4) from the horizontally and vertically opposed microphones, respectively.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it derive the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array. A sketch of steps 4 and 5 follows.
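The following is a minimal sketch of steps 4 and 5, assuming p_w(t), p_x(t), and p_y(t) are available as time-domain arrays; the STFT parameters are illustrative assumptions, and equation (8) is evaluated with atan2 so that the direction covers the full circle.

```python
import numpy as np
from scipy.signal import stft

def intensity_direction(p_w, p_x, p_y, fs=12500, nperseg=512):
    # Step 4: short-time Fourier transforms of the array pressure and its
    # horizontal/vertical gradients.
    _, _, Pw = stft(p_w, fs=fs, nperseg=nperseg)
    _, _, Px = stft(p_x, fs=fs, nperseg=nperseg)
    _, _, Py = stft(p_y, fs=fs, nperseg=nperseg)
    # Step 5 / eq. (8): gamma = atan2(Re{p_w* p_y}, Re{p_w* p_x}).
    gamma = np.arctan2(np.real(np.conj(Pw) * Py),
                       np.real(np.conj(Pw) * Px))
    # Wrap to [0, 2*pi) so directions match the 0-360 degree convention;
    # Pw is returned as well for the separation sketch in step 7.
    return np.mod(gamma, 2.0 * np.pi), Pw
```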
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows (a fitting sketch appears after step 6-4):
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the von Mises mixture obeyed by the intensity-vector direction of the voices.

The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component (in the embodiment, the beam arrival direction of the n-th source), I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, ..., N   (11)

Step 6-2: initialize the model parameters to obtain the initial function parameter set.
Step 6-3: starting from the initial model parameters, estimate the parameters of the von Mises mixture model with the EM algorithm.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
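Steps 6-1 to 6-4 can be sketched as the following EM loop. The mean directions are fixed at the arrival directions found in step 2-8 (as the embodiment does), and the moment-based update for the concentration k_n is a common approximation, an assumption rather than the patent's exact re-estimation formula.

```python
import numpy as np
from scipy.stats import vonmises

def fit_vonmises_mixture(gamma, mu, n_iter=100, tol=0.1):
    gamma = np.ravel(gamma)                  # pooled directions, radians
    mu = np.asarray(mu, dtype=float)         # fixed mean directions (radians)
    N = len(mu)
    alpha = np.full(N, 1.0 / N)              # weights, summing to 1
    kappa = np.full(N, 5.0)                  # concentration parameters
    old_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities alpha_n * f(theta; k_n).
        dens = np.stack([alpha[n] * vonmises.pdf(gamma, kappa[n], loc=mu[n])
                         for n in range(N)])
        total = dens.sum(axis=0) + 1e-12
        ll = np.sum(np.log(total))
        if ll - old_ll < tol:                # likelihood-difference stopping rule
            break
        old_ll = ll
        r = dens / total                     # responsibilities, shape (N, samples)
        # M-step: re-estimate weights and concentrations.
        alpha = r.mean(axis=1)
        for n in range(N):
            # Mean resultant length around the fixed mean direction mu[n].
            R = np.sum(r[n] * np.cos(gamma - mu[n])) / np.sum(r[n])
            R = np.clip(R, 1e-6, 1.0 - 1e-6)
            kappa[n] = (2 * R - R ** 3) / (1 - R ** 2)  # standard approximation
    return alpha, kappa
```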
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t). A separation sketch follows.
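A minimal sketch of step 7, reusing Pw and gamma from the step-4/5 sketch: each directivity function of equation (12) acts as a soft time-frequency weight on the array pressure, and the inverse STFT returns each source to the time domain.

```python
import numpy as np
from scipy.signal import istft
from scipy.stats import vonmises

def separate_sources(Pw, gamma, alpha, kappa, mu, fs=12500, nperseg=512):
    sources = []
    for a, k, m in zip(alpha, kappa, mu):
        # Eq. (12): I_n(theta; omega, t) = alpha_n * f(theta; k_n), evaluated
        # at the observed direction gamma(omega, t) as a soft mask.
        mask = a * vonmises.pdf(gamma, k, loc=m)
        # Eq. (13) followed by the inverse transform back to the time domain.
        _, s_n = istft(Pw * mask, fs=fs, nperseg=nperseg)
        sources.append(s_n)                  # time-domain s~_n(t)
    return sources
```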
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; select the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.

The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker. A scoring sketch follows.
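A minimal sketch of step 8, reusing the MFCC front end of the step-1 sketch; gmm_c stands for the trained model G_c of the user-specified speaker, and summing the per-frame log-likelihoods gives C(X̃_n) of equation (14).

```python
import numpy as np
import librosa

def select_target(sources, fs, gmm_c):
    scores = []
    for s_n in sources:
        # MFCCs of the separated voice, as in step 1-1.
        mfcc = librosa.feature.mfcc(y=np.asarray(s_n, dtype=float),
                                    sr=fs, n_mfcc=13).T
        # Eq. (14): C = log P(X~_n | G_c), summed over frames.
        scores.append(gmm_c.score_samples(mfcc).sum())
    target = int(np.argmax(scores))          # source with maximum match
    return target, scores
```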
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
The threshold in step 2-4 takes values in the range 10^{-2} to 10^{-16}.
In step 6-1, α_n is taken as a random number in (0, 1) satisfying Σ_{n=1}^{N} α_n = 1, and k_n is taken as a random number in (1, 700).
Advantages of the invention:
By building a speaker voice model library, the intelligent voice processing method of the present invention intelligently recognizes the identities of multiple speakers in a multi-speaker voice environment, separates the mixed voices to obtain each speaker's individual voice, and, according to the user's request, amplifies the voices of the speakers the user wants to hear while eliminating the voices of the speakers the user does not want. Unlike a traditional hearing aid, the method automatically provides the user with the required sound according to individual demand, reduces the interference of non-target voices beyond noise alone, and embodies personalization, interactivity, and intelligence.
Brief description of the drawings
Fig. 1 is the flow chart of the intelligent voice processing method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the modeling source data of an embodiment of the present invention, where panel (a) shows the first person's voice data, panel (b) the second person's voice data, and panel (c) the third person's voice data;
Fig. 3 is a schematic diagram of the source data used for mixing in an embodiment of the present invention, where panels (a), (b), and (c) show the data of the first, second, and third sound sources, respectively;
Fig. 4 is a schematic diagram of the microphone array of an embodiment of the present invention;
Fig. 5 is a schematic diagram of the data received by the four microphones of an embodiment of the present invention, where panels (a) to (d) show the mixed-sound signals received by microphones 1 to 4, respectively;
Fig. 6 is a schematic diagram of the data received by the four microphones after sampling, where panels (a) to (d) show the sampled mixed-sound signals of microphones 1 to 4, respectively;
Fig. 7 is the spatial spectrum estimate of the mixed signal of an embodiment of the present invention;
Fig. 8 is the probability density map of the mixed-sound intensity-vector direction of an embodiment of the present invention;
Fig. 9 is a schematic diagram of the von Mises mixture fitted by maximum-likelihood estimation in an embodiment of the present invention;
Fig. 10 compares the ideal voices with the voices obtained after separation in an embodiment of the present invention, where panels (a), (c), and (e) show the original signals of sources 1 to 3, and panels (b), (d), and (f) show the corresponding separated signals.
Detailed description of the embodiments
An embodiment of the present invention is described further below with reference to the drawings.
In the embodiment of the present invention, the system is divided into two main modules: a voice modeling module and a dynamic real-time voice processing module. The voice modeling module builds the speakers' voice models; the dynamic real-time processing module performs, under a complex voice environment, direction localization and separation of the mixed voices, and recognition and extraction of the mixed voices (i.e., extraction and amplification of the target sound and masking of the remaining sounds).
An intelligent voice processing method, whose flow chart is shown in Fig. 1, comprises the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters. The detailed process is as follows:
Step 1-1: record sample voice segments in a quiet indoor environment, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a GMM.
In the embodiment of the present invention, the recorder bundled with Windows is used to record three people's voices, two segments per person: one segment is used for sound separation and recognition, the other for speaker voice modeling; the first person's source is set as the target sound source. As shown in panels (a) to (c) of Fig. 2, one voice segment of each of the three people is taken, a GMM is built for it, and the resulting model parameters are stored in the model library.
The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I.
In this example, the GMM consists of 16 single Gaussian components. Sixteen random vectors are generated as cluster centers, each of length equal to the number of speech frames; the feature parameters of each frame are assigned to one of the 16 centers by the minimum-distance criterion, the center of each cluster is then recomputed and taken as the new cluster center, and this repeats until the algorithm converges. The resulting cluster centers are the initial GMM mean parameters μ_i^0; the covariance of the feature parameters gives the initial Σ_i^0; and the initial weights p_i^0 are all equal (1/16).
The EM algorithm is then used to estimate the model. Its principle is to maximize the probability of the observations: setting to zero the derivatives of the model function with respect to p_i^0, μ_i^0, Σ_i^0 yields re-estimates of p_i, μ_i, Σ_i, iterating until the algorithm converges; this completes the training of the feature parameters.
Step 2: use a microphone array composed of 4 microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: capture the test-environment audio signal with the 4-microphone array, discretize the captured mixed audio signal, and obtain the amplitude of each sample point.
In the embodiment of the present invention, as shown in panels (a) to (c) of Fig. 3, another voice segment of each of the three people is taken as the source data for the mixed audio. The array formed by the 4 microphones is shown in Fig. 4: microphones 1 and 2 are placed symmetrically on either side of the array center along the horizontal direction, and microphones 3 and 4 symmetrically along the vertical direction. The mixed data received by the 4 microphones are shown in panels (a) to (d) of Fig. 5; the received voices are discretized at 12500 Hz and the amplitude of each sample point is determined, as shown in panels (a) to (d) of Fig. 6. A loading sketch follows.
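A minimal loading sketch for steps 2-1/2-2 of the embodiment: read the four microphone recordings, discretize them at 12500 Hz, and stack the sample amplitudes into the (M, L) data matrix expected by the covariance sketch above. The file names are illustrative assumptions.

```python
import numpy as np
import librosa

def load_array_signals(paths=('mic1.wav', 'mic2.wav', 'mic3.wav', 'mic4.wav'),
                       fs=12500):
    # Step 2-1: read and discretize each microphone channel at 12500 Hz.
    channels = [librosa.load(p, sr=fs)[0] for p in paths]
    # Step 2-2: align lengths and stack the amplitudes into an (M, L) matrix.
    L = min(len(c) for c in channels)
    X = np.stack([c[:L] for c in channels])  # shape (4, L)
    return X, fs
```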
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the 4 microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/4) Σ_{m=1}^{4} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: in this example, the eigenvalue decomposition of the estimated covariance matrix gives the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]; sorting them in descending order and comparing them with the threshold 10^{-7} leaves 3 eigenvalues above the threshold, so the number of sound sources is 3.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix.
In the embodiment of the present invention, the 3 largest eigenvalues (equal in number to the sources) and their eigenvectors are taken as the signal subspace; the remaining 4 − 3 = 1 eigenvalue and its eigenvector are taken as the noise subspace, i.e. the number of noise components is 1. From the elements corresponding to the noise eigenvalue, the noise matrix is obtained:

V_u = [−0.1218 − 0.4761i, −0.1564 + 0.4659i, −0.5070 − 0.0374i, −0.5084]
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.
As shown in Fig. 4, the distance from each microphone to the array center is 0.02 m. In the embodiment of the present invention, the wavelength of the mixed audio signal is 30000; microphone 1 has orientation angle 0° with respect to the array center, microphone 2 has 180°, microphone 3 has 90°, and microphone 4 has 270°.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), with α_1(θ) = e^{j k 0.02 cos(0° − θ)}, α_2(θ) = e^{j k 0.02 cos(180° − θ)}, α_3(θ) = e^{j k 0.02 cos(90° − θ)}, α_4(θ) = e^{j k 0.02 cos(270° − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source.
As shown in Fig. 7, from the waveform of the angular spectrum function P(θ), the beam arrival directions of the 3 sources present in the mixed sound are found to be [50°, 200°, 300°].
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{3} 0.5 Σ_{m=1}^{4} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources (here 3);
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) and the vertical sound-pressure gradient p_y(t) are given by equations (5) and (6), which appear only as images in the original.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it draw the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array.
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows:
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the mixture.
In the embodiment of the present invention, the distribution probability density map of γ(ω, t) is shown in Fig. 8. Given the number of sources and the angles obtained above, the von Mises mixture matching this probability density consists of 3 single von Mises distributions whose center angles are [50°, 200°, 300°].
The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component, I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, 2, 3   (11)
Step 6-2: initialize the model parameters to obtain the initial function parameter set.
In the embodiment of the present invention, α is initialized to [1/3, 1/3, 1/3] and k to [8, 6, 3].
Step 6-3: from the initial model parameters, build the initial von Mises mixture distribution function and estimate the parameters of the mixture with the EM algorithm. Its principle is to maximize the probability of the observations: setting the derivatives of the model function with respect to the parameters α and k to zero yields re-estimates of α and k.
Substituting γ(ω, t) into g(θ) and taking the logarithm gives the initial log-likelihood value −3.0249e+004. Computing each current single von Mises component's share of the mixture yields the re-estimated weights α = [0.2267, 0.2817, 0.4516], and the re-estimation rule for k obtained by differentiation yields k = [5.1498, 4.0061, 3.1277]. The new log-likelihood value is then −2.9887e+004; the difference between the new and old likelihood values, 362.3362, is much larger than the chosen threshold 0.1, so the new likelihood value replaces the old one and the step is repeated with the newly obtained parameter estimates. The algorithm is considered converged once the difference between new and old likelihood values falls below the threshold. In this example, the final weights are α = [0.2689, 0.2811, 0.4500] and k = [4.3508, 3.3601, 2.8332], which give a von Mises mixture fitting the intensity-vector direction distribution; the fitted mixture is shown in Fig. 9.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t).
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; take the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.
In the embodiment of the present invention, the first person is assumed to be the target source. The log matching probabilities of the three separated voices against the target voice model are [2.0850, −2.8807, −3.5084] × 10^4; separated sound No. 1 gives the maximum match, so the target source is found.
The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker.
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
In the embodiment of the present invention, the directivity function of each source is finally obtained from the estimated von Mises mixture parameters, and further separation recovers the original sounds. As shown in panels (a) to (f) of Fig. 10, comparing the ideal data with the data obtained after separation shows high similarity.

Claims (3)

1. An intelligent voice processing method, characterized by comprising the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters.
The detailed process is as follows:
Step 1-1: collect sample voice segments, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a Gaussian mixture model (GMM).

The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the final GMM parameters, completing the training of the feature parameters.
Step 2: use a microphone array composed of M microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: use the array of M microphones to capture the mixed audio signal of the test environment, discretize the captured signal, and obtain the amplitude of each sample point.
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/M) Σ_{m=1}^{M} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: perform an eigenvalue decomposition of the estimated covariance matrix, sort the eigenvalues in descending order, and count the eigenvalues greater than a threshold; this count is the number of sound sources.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix.
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), with α_m(θ) = e^{j k d_m cos(φ_m − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal; d_m denotes the distance between the m-th microphone and the array center;
φ_m denotes the orientation angle of the m-th microphone with respect to the array center;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source.
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{N} 0.5 Σ_{m=1}^{M} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources;
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) is given by equation (5) and the vertical sound-pressure gradient p_y(t) by equation (6); both appear only as images in the original.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it derive the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array.
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows:
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the von Mises mixture obeyed by the intensity-vector direction of the voices.

The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component, I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, ..., N   (11)

Step 6-2: initialize the model parameters to obtain the initial function parameter set.
Step 6-3: starting from the initial model parameters, estimate the parameters of the von Mises mixture model with the EM algorithm.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t).
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; select the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.

The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker.
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
2. The intelligent voice processing method according to claim 1, characterized in that the threshold in step 2-4 takes values in the range 10^{-2} to 10^{-16}.
3. The intelligent voice processing method according to claim 1, characterized in that α_n in step 6-1 is taken as a random number in (0, 1) satisfying Σ_{n=1}^{N} α_n = 1, and k_n is taken as a random number in (1, 700).
CN201410081493.6A 2014-03-05 2014-03-05 Intelligent voice processing method Active CN103811020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Publications (2)

Publication Number Publication Date
CN103811020A true CN103811020A (en) 2014-05-21
CN103811020B CN103811020B (en) 2016-06-22

Family

ID=50707692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410081493.6A Active CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Country Status (1)

Country Link
CN (1) CN103811020B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1653519A (en) * 2002-03-20 2005-08-10 高通股份有限公司 Method for robust voice recognition by analyzing redundant features of source signal
US20090150146A1 (en) * 2007-12-11 2009-06-11 Electronics & Telecommunications Research Institute Microphone array based speech recognition system and target speech extracting method of the system
JP2012211768A (en) * 2011-03-30 2012-11-01 Advanced Telecommunication Research Institute International Sound source positioning apparatus
CN103426434A (en) * 2012-05-04 2013-12-04 索尼电脑娱乐公司 Source separation by independent component analysis in conjunction with source direction information

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200813B (en) * 2014-07-01 2017-05-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN104200813A (en) * 2014-07-01 2014-12-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN105609099A (en) * 2015-12-25 2016-05-25 重庆邮电大学 Speech recognition pretreatment method based on human auditory characteristic
CN105933820A (en) * 2016-04-28 2016-09-07 冠捷显示科技(中国)有限公司 Automatic positioning method of external wireless sound boxes
CN106205610A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 A kind of voice information identification method and equipment
CN106205610B (en) * 2016-06-29 2019-11-26 联想(北京)有限公司 A kind of voice information identification method and equipment
CN106128472A (en) * 2016-07-12 2016-11-16 乐视控股(北京)有限公司 The processing method and processing device of singer's sound
CN106448722A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Sound recording method, device and system
CN106448722B (en) * 2016-09-14 2019-01-18 讯飞智元信息科技有限公司 The way of recording, device and system
CN108630193A (en) * 2017-03-21 2018-10-09 北京嘀嘀无限科技发展有限公司 Audio recognition method and device
CN108630193B (en) * 2017-03-21 2020-10-02 北京嘀嘀无限科技发展有限公司 Voice recognition method and device
CN107220021A (en) * 2017-05-16 2017-09-29 北京小鸟看看科技有限公司 Phonetic entry recognition methods, device and headset equipment
CN107274895A (en) * 2017-08-18 2017-10-20 京东方科技集团股份有限公司 A kind of speech recognition apparatus and method
CN107274895B (en) * 2017-08-18 2020-04-17 京东方科技集团股份有限公司 Voice recognition device and method
CN107527626A (en) * 2017-08-30 2017-12-29 北京嘉楠捷思信息技术有限公司 Audio identification system
CN108198569A (en) * 2017-12-28 2018-06-22 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN108198569B (en) * 2017-12-28 2021-07-16 北京搜狗科技发展有限公司 Audio processing method, device and equipment and readable storage medium
CN108520756A (en) * 2018-03-20 2018-09-11 北京时代拓灵科技有限公司 A kind of method and device of speaker's speech Separation
CN108520756B (en) * 2018-03-20 2020-09-01 北京时代拓灵科技有限公司 Method and device for separating speaker voice
CN110310642A (en) * 2018-03-20 2019-10-08 阿里巴巴集团控股有限公司 Method of speech processing, system, client, equipment and storage medium
CN108694950B (en) * 2018-05-16 2021-10-01 清华大学 Speaker confirmation method based on deep hybrid model
CN108694950A (en) * 2018-05-16 2018-10-23 清华大学 A kind of method for identifying speaker based on depth mixed model
CN108766459A (en) * 2018-06-13 2018-11-06 北京联合大学 Target speaker method of estimation and system in a kind of mixing of multi-person speech
CN108766459B (en) * 2018-06-13 2020-07-17 北京联合大学 Target speaker estimation method and system in multi-user voice mixing
CN108735227A (en) * 2018-06-22 2018-11-02 北京三听科技有限公司 A kind of voice signal for being picked up to microphone array carries out the method and system of Sound seperation
CN108735227B (en) * 2018-06-22 2020-05-19 北京三听科技有限公司 Method and system for separating sound source of voice signal picked up by microphone array
CN110867191A (en) * 2018-08-28 2020-03-06 洞见未来科技股份有限公司 Voice processing method, information device and computer program product
CN109505741A (en) * 2018-12-20 2019-03-22 浙江大学 A kind of wind-driven generator blade breakage detection method and device based on rectangular microphone array
CN110335626A (en) * 2019-07-09 2019-10-15 北京字节跳动网络技术有限公司 Age recognition methods and device, storage medium based on audio
CN110288996A (en) * 2019-07-22 2019-09-27 厦门钛尚人工智能科技有限公司 A kind of speech recognition equipment and audio recognition method
CN112289335A (en) * 2019-07-24 2021-01-29 阿里巴巴集团控股有限公司 Voice signal processing method and device and pickup equipment
WO2021012734A1 (en) * 2019-07-25 2021-01-28 深圳壹账通智能科技有限公司 Audio separation method and apparatus, electronic device and computer-readable storage medium
CN110706688A (en) * 2019-11-11 2020-01-17 广州国音智能科技有限公司 Method, system, terminal and readable storage medium for constructing voice recognition model
CN110706688B (en) * 2019-11-11 2022-06-17 广州国音智能科技有限公司 Method, system, terminal and readable storage medium for constructing voice recognition model
CN111028857A (en) * 2019-12-27 2020-04-17 苏州蛙声科技有限公司 Method and system for reducing noise of multi-channel audio and video conference based on deep learning
CN111028857B (en) * 2019-12-27 2024-01-19 宁波蛙声科技有限公司 Method and system for reducing noise of multichannel audio-video conference based on deep learning
CN111816185A (en) * 2020-07-07 2020-10-23 广东工业大学 Method and device for identifying speaker in mixed voice
CN111696570B (en) * 2020-08-17 2020-11-24 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN111696570A (en) * 2020-08-17 2020-09-22 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN111899756A (en) * 2020-09-29 2020-11-06 北京清微智能科技有限公司 Single-channel voice separation method and device
CN114242072A (en) * 2021-12-21 2022-03-25 上海帝图信息科技有限公司 Voice recognition system for intelligent robot
CN114613385A (en) * 2022-05-07 2022-06-10 广州易而达科技股份有限公司 Far-field voice noise reduction method, cloud server and audio acquisition equipment
CN115240689A (en) * 2022-09-15 2022-10-25 深圳市水世界信息有限公司 Target sound determination method, device, computer equipment and medium
CN115240689B (en) * 2022-09-15 2022-12-02 深圳市水世界信息有限公司 Target sound determination method, target sound determination device, computer equipment and medium

Also Published As

Publication number Publication date
CN103811020B (en) 2016-06-22

Similar Documents

Publication Publication Date Title
CN103811020A (en) Smart voice processing method
Zhang et al. Deep learning based binaural speech separation in reverberant environments
CN110797043B (en) Conference voice real-time transcription method and system
CN105405439B (en) Speech playing method and device
CN107919133A (en) For the speech-enhancement system and sound enhancement method of destination object
CN102565759B (en) Binaural sound source localization method based on sub-band signal to noise ratio estimation
CN102438189B (en) Dual-channel acoustic signal-based sound source localization method
CN106373589B (en) A kind of ears mixing voice separation method based on iteration structure
CN111429939B (en) Sound signal separation method of double sound sources and pickup
CN104464750A (en) Voice separation method based on binaural sound source localization
CN105869651A (en) Two-channel beam forming speech enhancement method based on noise mixed coherence
CN106653048B (en) Single channel sound separation method based on voice model
CN109859749A (en) A kind of voice signal recognition methods and device
CN109935226A (en) A kind of far field speech recognition enhancing system and method based on deep neural network
CN107507625A (en) Sound source distance determines method and device
CN106031196A (en) Signal-processing device, method, and program
Dadvar et al. Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
CN113593601A (en) Audio-visual multi-modal voice separation method based on deep learning
CN105609099A (en) Speech recognition pretreatment method based on human auditory characteristic
Shamsoddini et al. A sound segregation algorithm for reverberant conditions
Talagala et al. Binaural localization of speech sources in the median plane using cepstral HRTF extraction
Krijnders et al. Tone-fit and MFCC scene classification compared to human recognition
Shabtai et al. The effect of reverberation on the performance of cepstral mean subtraction in speaker verification
CN107578784B (en) Method and device for extracting target source from audio
Venkatesan et al. Full sound source localization of binaural signals

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant