CN103811020A - Smart voice processing method - Google Patents

Smart voice processing method

Info

Publication number
CN103811020A
CN103811020A (application CN201410081493.6A; granted as CN103811020B)
Authority
CN
China
Prior art keywords
sound
sound source
represent
microphone
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410081493.6A
Other languages
Chinese (zh)
Other versions
CN103811020B (en)
Inventor
王�义
魏阳杰
陈瑶
关楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201410081493.6A priority Critical patent/CN103811020B/en
Publication of CN103811020A publication Critical patent/CN103811020A/en
Application granted granted Critical
Publication of CN103811020B publication Critical patent/CN103811020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a smart voice processing method and belongs to the technical field of information processing. By building a speaker voice model library, the method intelligently identifies the identities of multiple talkers in a multi-talker voice environment and separates the mixed voices to obtain each talker's individual voice; according to the user's demands, it amplifies the voices of the talkers the user wants to hear and eliminates the voices of the talkers the user does not want. Unlike a traditional hearing aid, the method automatically provides the user with the needed voices according to individual demand, reduces interference from non-target voices beyond noise alone, and is personalized, interactive, and intelligent.

Description

An intelligent voice processing method
Technical field
The invention belongs to the technical field of information processing, and specifically relates to an intelligent voice processing method.
Background art
According to the latest assessment data released by the World Health Organization (WHO) in 2013, 360 million people worldwide currently have some degree of hearing impairment, accounting for 5% of the global population. Hearing-aid products can effectively compensate for the hearing loss of people who are deaf or hard of hearing and improve their quality of life and work. However, current research on hearing-assistance technology still concentrates on two aspects, noise suppression and amplification of source loudness, and rarely involves modeling based on voice characteristics or automatic separation of multiple sound sources. In complex real-world scenes, for example a party where several speakers talk at the same time, possibly over background sounds such as music, a hearing-assistance system cannot isolate the target voice of interest from the mixed speech input; a simple loudness-amplification function then only increases the user's listening burden, or even causes harm, and cannot deliver intelligible speech input and understanding. Therefore, in view of these technical deficiencies of current hearing-assistance systems, designing a novel hearing-assistance system with a specific-sound recognition function that is more intelligent and personalized is of great significance.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an intelligent voice processing method, so that the user obtains clean target speech, received and amplified according to his or her own needs, making the hearing-assistance system intelligent, interactive, and personalized.
An intelligent voice processing method comprises the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters.
The detailed process is as follows:
Step 1-1: collect sample voice segments, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a Gaussian mixture model (GMM).
The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the final GMM parameters, completing the training of the feature parameters. A training sketch follows.
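As a concrete illustration of steps 1-1 and 1-2, the following is a minimal Python sketch assuming the librosa and scikit-learn libraries; the file path, the 13 MFCCs, and the diagonal-covariance choice are illustrative assumptions (the embodiment below does use a 16-component GMM).

```python
import librosa
from sklearn.mixture import GaussianMixture

def train_speaker_model(wav_path, n_components=16, n_mfcc=13):
    # Step 1-1: load and discretize the voice segment, then extract MFCCs
    # as the voice feature parameters X (one feature vector per frame).
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (T, n_mfcc)
    # Step 1-2: k-means initialization followed by EM estimation, matching
    # the patent's k-means + expectation-maximization training scheme.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',
                          init_params='kmeans',
                          max_iter=200)
    gmm.fit(mfcc)
    return gmm  # parameter set G = {p_i, mu_i, Sigma_i}
```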
Step 2: use a microphone array composed of M microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: use the array of M microphones to capture the mixed audio signal of the test environment, discretize the captured signal, and obtain the amplitude of each sample point.
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/M) Σ_{m=1}^{M} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: perform an eigenvalue decomposition of the estimated covariance matrix, sort the eigenvalues in descending order, and count the eigenvalues greater than a threshold; this count is the number of sound sources.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix. A sketch of steps 2-3 to 2-5 follows.
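The following is a minimal sketch of steps 2-3 to 2-5, assuming the microphone samples are stacked into an (M, L) NumPy array. Equation (2) as printed sums outer products over the microphones; the sketch forms the M×M spatial covariance, the interpretation consistent with the embodiment, where a 4-microphone array yields four eigenvalues.

```python
import numpy as np

def count_sources(X, threshold=1e-7):
    # X: (M, L) array, one row of sample amplitudes per microphone.
    M = X.shape[0]
    # Step 2-3: covariance matrix estimate of the mixed audio signal
    # (M x M spatial form, consistent with the embodiment's 4 eigenvalues).
    R_xx = (X @ X.conj().T) / X.shape[1]
    # Step 2-4: eigendecomposition; eigenvalues above the threshold count as
    # sources (the patent takes the threshold between 1e-16 and 1e-2).
    eigvals, eigvecs = np.linalg.eigh(R_xx)   # ascending order
    n_sources = int(np.sum(eigvals > threshold))
    # Step 2-5: the remaining M - n_sources eigenvectors form the noise
    # matrix V_u used by the angular spectrum of step 2-6.
    V_u = eigvecs[:, :M - n_sources]
    return n_sources, V_u
```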
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), with α_m(θ) = e^{j k d_m cos(φ_m − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal; d_m denotes the distance between the m-th microphone and the array center;
φ_m denotes the orientation angle of the m-th microphone with respect to the array center;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source. A spectrum-evaluation sketch follows.
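A minimal sketch of steps 2-6 to 2-8 follows, evaluating the angular spectrum of equation (3) on a 1° grid and picking the largest peaks; the circular four-microphone geometry and its default distance and orientation angles are taken from the embodiment and are otherwise assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def angular_spectrum(V_u, wavelength, d=0.02,
                     phi_deg=(0.0, 180.0, 90.0, 270.0)):
    # Steering-vector parameters: k = 2*pi/lambda, microphone orientation
    # angles phi_m and a common center distance d (embodiment values).
    k = 2.0 * np.pi / wavelength
    phi = np.deg2rad(np.asarray(phi_deg))
    P = np.empty(360)
    for deg in range(360):
        theta = np.deg2rad(deg)
        # alpha_m(theta) = exp(j k d_m cos(phi_m - theta))
        alpha = np.exp(1j * k * d * np.cos(phi - theta))
        # Eq. (3): P(theta) = 1 / (alpha^H V_u V_u^H alpha)
        denom = alpha.conj() @ V_u @ V_u.conj().T @ alpha
        P[deg] = 1.0 / np.abs(denom)
    return P

def pick_directions(P, n_sources):
    # Steps 2-7/2-8: the n_sources highest peaks give the arrival directions.
    peaks, _ = find_peaks(P)
    top = peaks[np.argsort(P[peaks])[::-1][:n_sources]]
    return sorted(int(a) for a in top)        # angles in degrees
```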
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{N} 0.5 Σ_{m=1}^{M} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources;
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) is given by equation (5) and the vertical sound-pressure gradient p_y(t) by equation (6); both formulas appear only as images in the original and are formed analogously to (4) from the horizontally and vertically opposed microphones, respectively.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it derive the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array. A sketch of steps 4 and 5 follows.
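The following is a minimal sketch of steps 4 and 5, assuming p_w(t), p_x(t), and p_y(t) are available as time-domain arrays; the STFT parameters are illustrative assumptions, and equation (8) is evaluated with atan2 so that the direction covers the full circle.

```python
import numpy as np
from scipy.signal import stft

def intensity_direction(p_w, p_x, p_y, fs=12500, nperseg=512):
    # Step 4: short-time Fourier transforms of the array pressure and its
    # horizontal/vertical gradients.
    _, _, Pw = stft(p_w, fs=fs, nperseg=nperseg)
    _, _, Px = stft(p_x, fs=fs, nperseg=nperseg)
    _, _, Py = stft(p_y, fs=fs, nperseg=nperseg)
    # Step 5 / eq. (8): gamma = atan2(Re{p_w* p_y}, Re{p_w* p_x}).
    gamma = np.arctan2(np.real(np.conj(Pw) * Py),
                       np.real(np.conj(Pw) * Px))
    # Wrap to [0, 2*pi) so directions match the 0-360 degree convention;
    # Pw is returned as well for the separation sketch in step 7.
    return np.mod(gamma, 2.0 * np.pi), Pw
```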
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows (a fitting sketch appears after step 6-4):
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the von Mises mixture obeyed by the intensity-vector direction of the voices.

The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component (in the embodiment, the beam arrival direction of the n-th source), I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, ..., N   (11)

Step 6-2: initialize the model parameters to obtain the initial function parameter set.
Step 6-3: starting from the initial model parameters, estimate the parameters of the von Mises mixture model with the EM algorithm.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
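Steps 6-1 to 6-4 can be sketched as the following EM loop. The mean directions are fixed at the arrival directions found in step 2-8 (as the embodiment does), and the moment-based update for the concentration k_n is a common approximation, an assumption rather than the patent's exact re-estimation formula.

```python
import numpy as np
from scipy.stats import vonmises

def fit_vonmises_mixture(gamma, mu, n_iter=100, tol=0.1):
    gamma = np.ravel(gamma)                  # pooled directions, radians
    mu = np.asarray(mu, dtype=float)         # fixed mean directions (radians)
    N = len(mu)
    alpha = np.full(N, 1.0 / N)              # weights, summing to 1
    kappa = np.full(N, 5.0)                  # concentration parameters
    old_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities alpha_n * f(theta; k_n).
        dens = np.stack([alpha[n] * vonmises.pdf(gamma, kappa[n], loc=mu[n])
                         for n in range(N)])
        total = dens.sum(axis=0) + 1e-12
        ll = np.sum(np.log(total))
        if ll - old_ll < tol:                # likelihood-difference stopping rule
            break
        old_ll = ll
        r = dens / total                     # responsibilities, shape (N, samples)
        # M-step: re-estimate weights and concentrations.
        alpha = r.mean(axis=1)
        for n in range(N):
            # Mean resultant length around the fixed mean direction mu[n].
            R = np.sum(r[n] * np.cos(gamma - mu[n])) / np.sum(r[n])
            R = np.clip(R, 1e-6, 1.0 - 1e-6)
            kappa[n] = (2 * R - R ** 3) / (1 - R ** 2)  # standard approximation
    return alpha, kappa
```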
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t). A separation sketch follows.
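A minimal sketch of step 7, reusing Pw and gamma from the step-4/5 sketch: each directivity function of equation (12) acts as a soft time-frequency weight on the array pressure, and the inverse STFT returns each source to the time domain.

```python
import numpy as np
from scipy.signal import istft
from scipy.stats import vonmises

def separate_sources(Pw, gamma, alpha, kappa, mu, fs=12500, nperseg=512):
    sources = []
    for a, k, m in zip(alpha, kappa, mu):
        # Eq. (12): I_n(theta; omega, t) = alpha_n * f(theta; k_n), evaluated
        # at the observed direction gamma(omega, t) as a soft mask.
        mask = a * vonmises.pdf(gamma, k, loc=m)
        # Eq. (13) followed by the inverse transform back to the time domain.
        _, s_n = istft(Pw * mask, fs=fs, nperseg=nperseg)
        sources.append(s_n)                  # time-domain s~_n(t)
    return sources
```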
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; select the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.

The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker. A scoring sketch follows.
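A minimal sketch of step 8, reusing the MFCC front end of the step-1 sketch; gmm_c stands for the trained model G_c of the user-specified speaker, and summing the per-frame log-likelihoods gives C(X̃_n) of equation (14).

```python
import numpy as np
import librosa

def select_target(sources, fs, gmm_c):
    scores = []
    for s_n in sources:
        # MFCCs of the separated voice, as in step 1-1.
        mfcc = librosa.feature.mfcc(y=np.asarray(s_n, dtype=float),
                                    sr=fs, n_mfcc=13).T
        # Eq. (14): C = log P(X~_n | G_c), summed over frames.
        scores.append(gmm_c.score_samples(mfcc).sum())
    target = int(np.argmax(scores))          # source with maximum match
    return target, scores
```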
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
The threshold in step 2-4 takes values in the range 10^{-2} to 10^{-16}.
In step 6-1, α_n is taken as a random number in (0, 1) satisfying Σ_{n=1}^{N} α_n = 1, and k_n is taken as a random number in (1, 700).
Advantages of the invention:
By building a speaker voice model library, the intelligent voice processing method of the present invention intelligently recognizes the identities of multiple speakers in a multi-speaker voice environment, separates the mixed voices to obtain each speaker's individual voice, and, according to the user's request, amplifies the voices of the speakers the user wants to hear while eliminating the voices of the speakers the user does not want. Unlike a traditional hearing aid, the method automatically provides the user with the required sound according to individual demand, reduces the interference of non-target voices beyond noise alone, and embodies personalization, interactivity, and intelligence.
Brief description of the drawings
Fig. 1 is the flow chart of the intelligent voice processing method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the modeling source data of an embodiment of the present invention, where panel (a) shows the first person's voice data, panel (b) the second person's voice data, and panel (c) the third person's voice data;
Fig. 3 is a schematic diagram of the source data used for mixing in an embodiment of the present invention, where panels (a), (b), and (c) show the data of the first, second, and third sound sources, respectively;
Fig. 4 is a schematic diagram of the microphone array of an embodiment of the present invention;
Fig. 5 is a schematic diagram of the data received by the four microphones of an embodiment of the present invention, where panels (a) to (d) show the mixed-sound signals received by microphones 1 to 4, respectively;
Fig. 6 is a schematic diagram of the data received by the four microphones after sampling, where panels (a) to (d) show the sampled mixed-sound signals of microphones 1 to 4, respectively;
Fig. 7 is the spatial spectrum estimate of the mixed signal of an embodiment of the present invention;
Fig. 8 is the probability density map of the mixed-sound intensity-vector direction of an embodiment of the present invention;
Fig. 9 is a schematic diagram of the von Mises mixture fitted by maximum-likelihood estimation in an embodiment of the present invention;
Fig. 10 compares the ideal voices with the voices obtained after separation in an embodiment of the present invention, where panels (a), (c), and (e) show the original signals of sources 1 to 3, and panels (b), (d), and (f) show the corresponding separated signals.
Detailed description of the embodiments
An embodiment of the present invention is described further below with reference to the drawings.
In the embodiment of the present invention, the system is divided into two main modules: a voice modeling module and a dynamic real-time voice processing module. The voice modeling module builds the speakers' voice models; the dynamic real-time processing module performs, under a complex voice environment, direction localization and separation of the mixed voices, and recognition and extraction of the mixed voices (i.e., extraction and amplification of the target sound and masking of the remaining sounds).
An intelligent voice processing method, whose flow chart is shown in Fig. 1, comprises the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters. The detailed process is as follows:
Step 1-1: record sample voice segments in a quiet indoor environment, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a GMM.
In the embodiment of the present invention, the recorder bundled with Windows is used to record three people's voices, two segments per person: one segment is used for sound separation and recognition, the other for speaker voice modeling; the first person's source is set as the target sound source. As shown in panels (a) to (c) of Fig. 2, one voice segment of each of the three people is taken, a GMM is built for it, and the resulting model parameters are stored in the model library.
The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I.
In this example, the GMM consists of 16 single Gaussian components. Sixteen random vectors are generated as cluster centers, each of length equal to the number of speech frames; the feature parameters of each frame are assigned to one of the 16 centers by the minimum-distance criterion, the center of each cluster is then recomputed and taken as the new cluster center, and this repeats until the algorithm converges. The resulting cluster centers are the initial GMM mean parameters μ_i^0; the covariance of the feature parameters gives the initial Σ_i^0; and the initial weights p_i^0 are all equal (1/16).
The EM algorithm is then used to estimate the model. Its principle is to maximize the probability of the observations: setting to zero the derivatives of the model function with respect to p_i^0, μ_i^0, Σ_i^0 yields re-estimates of p_i, μ_i, Σ_i, iterating until the algorithm converges; this completes the training of the feature parameters.
Step 2: use a microphone array composed of 4 microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: capture the test-environment audio signal with the 4-microphone array, discretize the captured mixed audio signal, and obtain the amplitude of each sample point.
In the embodiment of the present invention, as shown in panels (a) to (c) of Fig. 3, another voice segment of each of the three people is taken as the source data for the mixed audio. The array formed by the 4 microphones is shown in Fig. 4: microphones 1 and 2 are placed symmetrically on either side of the array center along the horizontal direction, and microphones 3 and 4 symmetrically along the vertical direction. The mixed data received by the 4 microphones are shown in panels (a) to (d) of Fig. 5; the received voices are discretized at 12500 Hz and the amplitude of each sample point is determined, as shown in panels (a) to (d) of Fig. 6. A loading sketch follows.
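A minimal loading sketch for steps 2-1/2-2 of the embodiment: read the four microphone recordings, discretize them at 12500 Hz, and stack the sample amplitudes into the (M, L) data matrix expected by the covariance sketch above. The file names are illustrative assumptions.

```python
import numpy as np
import librosa

def load_array_signals(paths=('mic1.wav', 'mic2.wav', 'mic3.wav', 'mic4.wav'),
                       fs=12500):
    # Step 2-1: read and discretize each microphone channel at 12500 Hz.
    channels = [librosa.load(p, sr=fs)[0] for p in paths]
    # Step 2-2: align lengths and stack the amplitudes into an (M, L) matrix.
    L = min(len(c) for c in channels)
    X = np.stack([c[:L] for c in channels])  # shape (4, L)
    return X, fs
```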
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the 4 microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/4) Σ_{m=1}^{4} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: in this example, the eigenvalue decomposition of the estimated covariance matrix gives the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]; sorting them in descending order and comparing them with the threshold 10^{-7} leaves 3 eigenvalues above the threshold, so the number of sound sources is 3.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix.
In the embodiment of the present invention, the 3 largest eigenvalues (equal in number to the sources) and their eigenvectors are taken as the signal subspace; the remaining 4 − 3 = 1 eigenvalue and its eigenvector are taken as the noise subspace, i.e. the number of noise components is 1. From the elements corresponding to the noise eigenvalue, the noise matrix is obtained:

V_u = [−0.1218 − 0.4761i, −0.1564 + 0.4659i, −0.5070 − 0.0374i, −0.5084]
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.
As shown in Fig. 4, the distance from each microphone to the array center is 0.02 m. In the embodiment of the present invention, the wavelength of the mixed audio signal is 30000; microphone 1 has orientation angle 0° with respect to the array center, microphone 2 has 180°, microphone 3 has 90°, and microphone 4 has 270°.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), with α_1(θ) = e^{j k 0.02 cos(0° − θ)}, α_2(θ) = e^{j k 0.02 cos(180° − θ)}, α_3(θ) = e^{j k 0.02 cos(90° − θ)}, α_4(θ) = e^{j k 0.02 cos(270° − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source.
As shown in Fig. 7, from the waveform of the angular spectrum function P(θ), the beam arrival directions of the 3 sources present in the mixed sound are found to be [50°, 200°, 300°].
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{3} 0.5 Σ_{m=1}^{4} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources (here 3);
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) and the vertical sound-pressure gradient p_y(t) are given by equations (5) and (6), which appear only as images in the original.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it draw the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array.
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows:
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the mixture.
In the embodiment of the present invention, the distribution probability density map of γ(ω, t) is shown in Fig. 8. Given the number of sources and the angles obtained above, the von Mises mixture matching this probability density consists of 3 single von Mises distributions whose center angles are [50°, 200°, 300°].
The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component, I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, 2, 3   (11)
Step 6-2: initialize the model parameters to obtain the initial function parameter set.
In the embodiment of the present invention, α is initialized to [1/3, 1/3, 1/3] and k to [8, 6, 3].
Step 6-3: from the initial model parameters, build the initial von Mises mixture distribution function and estimate the parameters of the mixture with the EM algorithm. Its principle is to maximize the probability of the observations: setting the derivatives of the model function with respect to the parameters α and k to zero yields re-estimates of α and k.
Substituting γ(ω, t) into g(θ) and taking the logarithm gives the initial log-likelihood value −3.0249e+004. Computing each current single von Mises component's share of the mixture yields the re-estimated weights α = [0.2267, 0.2817, 0.4516], and the re-estimation rule for k obtained by differentiation yields k = [5.1498, 4.0061, 3.1277]. The new log-likelihood value is then −2.9887e+004; the difference between the new and old likelihood values, 362.3362, is much larger than the chosen threshold 0.1, so the new likelihood value replaces the old one and the step is repeated with the newly obtained parameter estimates. The algorithm is considered converged once the difference between new and old likelihood values falls below the threshold. In this example, the final weights are α = [0.2689, 0.2811, 0.4500] and k = [4.3508, 3.3601, 2.8332], which give a von Mises mixture fitting the intensity-vector direction distribution; the fitted mixture is shown in Fig. 9.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t).
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; take the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.
In the embodiment of the present invention, the first person is assumed to be the target source. The log matching probabilities of the three separated voices against the target voice model are [2.0850, −2.8807, −3.5084] × 10^4; separated sound No. 1 gives the maximum match, so the target source is found.
The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker.
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
In the embodiment of the present invention, the directivity function of each source is finally obtained from the estimated von Mises mixture parameters, and further separation recovers the original sounds. As shown in panels (a) to (f) of Fig. 10, comparing the ideal data with the data obtained after separation shows high similarity.

Claims (3)

1. An intelligent voice processing method, characterized by comprising the following steps:
Step 1: collect sample voice segments to build a sample voice library, extract features from the sample voices to obtain characteristic parameters, and train the characteristic parameters.
The detailed process is as follows:
Step 1-1: collect sample voice segments, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCCs) of the voice signals as the voice feature parameters, and build a Gaussian mixture model (GMM).

The model formula is as follows:

p(X|G) = Σ_{i=1}^{I} p_i b_i(X)   (1)

where p(X|G) denotes the probability of the sample voice feature parameters X under the model with parameter set G;
G denotes the GMM parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian components in the GMM;
p_i denotes the weight coefficient of the i-th single Gaussian component, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian component;
Σ_i denotes the covariance matrix of the i-th single Gaussian component;
X denotes the sample voice feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) denotes the density function of the i-th single Gaussian component, b_i(X) = N(μ_i, Σ_i), where N(·) denotes a Gaussian density.
Step 1-2: train the GMM with the voice feature parameters.
Apply the k-means clustering algorithm to the voice feature parameters to obtain the initial GMM parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the final GMM parameters, completing the training of the feature parameters.
Step 2: use a microphone array composed of M microphones to capture the test-environment audio signal, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source on the array.
The detailed process is as follows:
Step 2-1: use the array of M microphones to capture the mixed audio signal of the test environment, discretize the captured signal, and obtain the amplitude of each sample point.
Step 2-2: arrange the sample amplitudes into a matrix for each microphone; each mixed-audio matrix has one column, its number of rows equals the number of sample points, and its elements are the sample amplitudes.
Step 2-3: from the mixed-audio matrices collected by the microphones and the number of microphones, obtain an estimate of the covariance matrix of the test-environment mixed audio signal.

The covariance matrix estimate is:

R_xx = (1/M) Σ_{m=1}^{M} X(m) X^H(m)   (2)

where R_xx denotes the estimated covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed-audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate (Hermitian) transpose of X(m).
Step 2-4: perform an eigenvalue decomposition of the estimated covariance matrix, sort the eigenvalues in descending order, and count the eigenvalues greater than a threshold; this count is the number of sound sources.
Step 2-5: subtract the number of sources from the number of microphones to obtain the number of noise components, and form the corresponding noise matrix.
Step 2-6: from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array center, and the beam arrival direction of a source, obtain the steering vector of the microphone array; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector.

The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))   (3)

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), with α_m(θ) = e^{j k d_m cos(φ_m − θ)};
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal; d_m denotes the distance between the m-th microphone and the array center;
φ_m denotes the orientation angle of the m-th microphone with respect to the array center;
θ denotes the beam arrival direction of a source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix, and V_u^H its conjugate transpose.
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, select its peaks in descending order of height; the number of peaks selected equals the number of sound sources.
Step 2-8: determine the angle corresponding to each selected peak to obtain the beam arrival direction of each source.
Step 3: from the audio signals of the sources and the transfer relation between sources and microphones, obtain the array sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient received by the microphone array.

The array sound-pressure formula is:

p_w(t) = Σ_{n=1}^{N} 0.5 Σ_{m=1}^{M} h_mn(t) s_n(t)   (4)

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sources;
t denotes time;
s_n(t) denotes the audio signal of the n-th source;
h_mn(t) denotes the transfer term between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array center caused by the sound wave at time t, α_m(θ_n(t)) denotes the steering vector of the m-th microphone toward the n-th source at time t, and θ_n(t) denotes the beam arrival direction of the n-th source at time t.

The horizontal sound-pressure gradient p_x(t) is given by equation (5) and the vertical sound-pressure gradient p_y(t) by equation (6); both appear only as images in the original.
Step 4: use the Fourier transform to convert the array center sound pressure, the horizontal sound-pressure gradient, and the vertical sound-pressure gradient from the time domain to the frequency domain.
Step 5: from the array sound pressure, horizontal sound-pressure gradient, and vertical sound-pressure gradient in the frequency domain, obtain the intensity vector of the sound-pressure signal in the frequency domain, and from it derive the intensity-vector direction.

The intensity vector of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) [Re{p_w*(ω, t) p_x(ω, t)} u_x + Re{p_w*(ω, t) p_y(ω, t)} u_y]   (7)

where I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re{·} denotes taking the real part;
p_w*(ω, t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω, t) denotes the horizontal sound-pressure gradient in the frequency domain;
p_y(ω, t) denotes the vertical sound-pressure gradient in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.

The intensity-vector direction is:

γ(ω, t) = tan^{-1}[Re{p_w*(ω, t) p_y(ω, t)} / Re{p_w*(ω, t) p_x(ω, t)}]   (8)

where γ(ω, t) denotes the intensity-vector direction of the mixed-sound pressure signal received by the microphone array.
Step 6: accumulate statistics of the intensity-vector direction to obtain its probability density distribution, fit it with a mixture of von Mises distributions, obtain the model parameters of the von Mises mixture obeyed by the voice intensity-vector direction, and from them obtain the intensity-vector directivity function of each sound-pressure signal.
The detailed process is as follows:
Step 6-1: accumulate the intensity-vector directions into a probability density distribution and fit it with a mixture of von Mises distributions, obtaining the model parameter set of the von Mises mixture obeyed by the intensity-vector direction of the voices.

The von Mises mixture model formula is as follows:

g(θ) = Σ_{n=1}^{N} α_n f(θ; k_n)   (10)

where g(θ) denotes the von Mises mixture probability density;
θ denotes the mixed-sound direction angle;
α_n denotes the weight of the intensity-vector directivity function of the n-th source's sound-pressure signal;
f(θ; k_n) denotes a single von Mises density, f(θ; k_n) = e^{k_n cos(θ − θ_n)} / (2π I_0(k_n)), where θ_n is the center (mean) direction of the n-th component, I_0(k_n) denotes the modified Bessel function of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the n-th source's sound-pressure signal (the reciprocal of the distribution's variance).

The von Mises mixture parameter set is as follows:

Γ = {α_n, k_n}, n = 1, ..., N   (11)

Step 6-2: initialize the model parameters to obtain the initial function parameter set.
Step 6-3: starting from the initial model parameters, estimate the parameters of the von Mises mixture model with the EM algorithm.
Step 6-4: from the estimated von Mises mixture parameters, obtain the intensity-vector directivity function of each sound-pressure signal:

I_n(θ; ω, t) = α_n f(θ; k_n)   (12)

where I_n(θ; ω, t) denotes the intensity-vector directivity function of the n-th source.
Step 7: from the obtained intensity-vector directivity functions of the sound-pressure signals and the array sound pressure, obtain each source's signal in the frequency domain, and use the inverse Fourier transform to convert each frequency-domain source signal to a time-domain source signal.

Each source's frequency-domain signal is:

s̃_n(ω, t) = p_w(ω, t) I_n(θ; ω, t)   (13)

where s̃_n(ω, t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed voices. Applying the inverse Fourier transform to s̃_n(ω, t) yields the time-domain signal s̃_n(t).
Step 8: compute the matching probability between each separated source signal and the specified speaker in the sample voice library; select the source with the largest probability value as the target source, retain that source signal, and delete the other, non-target sources.

The matching probability between the n-th source signal and the specified speaker in the sample voice library is:

C(X̃_n) = log[P(X̃_n | G_c)]   (14)

where X̃_n denotes the voice feature parameters extracted from the separated voice s̃_n(t), taking the MFCCs of s̃_n(t) as its feature parameters;
C(X̃_n) denotes the matching probability between the n-th source signal and the specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
P(X̃_n | G_c) denotes the probability that the separated voice belongs to the user-specified speaker.
Step 9: amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
2. The intelligent voice processing method according to claim 1, characterized in that the threshold in step 2-4 takes values in the range 10^{-2} to 10^{-16}.
3. The intelligent voice processing method according to claim 1, characterized in that α_n in step 6-1 is taken as a random number in (0, 1) satisfying Σ_{n=1}^{N} α_n = 1, and k_n is taken as a random number in (1, 700).
CN201410081493.6A 2014-03-05 2014-03-05 Intelligent voice processing method Active CN103811020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Publications (2)

Publication Number Publication Date
CN103811020A true CN103811020A (en) 2014-05-21
CN103811020B CN103811020B (en) 2016-06-22

Family

ID=50707692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410081493.6A Active CN103811020B (en) 2014-03-05 2014-03-05 Intelligent voice processing method

Country Status (1)

Country Link
CN (1) CN103811020B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1653519A (en) * 2002-03-20 2005-08-10 高通股份有限公司 Method for robust voice recognition by analyzing redundant features of source signal
US20090150146A1 (en) * 2007-12-11 2009-06-11 Electronics & Telecommunications Research Institute Microphone array based speech recognition system and target speech extracting method of the system
JP2012211768A (en) * 2011-03-30 2012-11-01 Advanced Telecommunication Research Institute International Sound source positioning apparatus
CN103426434A (en) * 2012-05-04 2013-12-04 索尼电脑娱乐公司 Source separation by independent component analysis in conjunction with source direction information

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200813B (en) * 2014-07-01 2017-05-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN104200813A (en) * 2014-07-01 2014-12-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN105609099A (en) * 2015-12-25 2016-05-25 重庆邮电大学 Speech recognition pretreatment method based on human auditory characteristic
CN105933820A (en) * 2016-04-28 2016-09-07 冠捷显示科技(中国)有限公司 Automatic positioning method of external wireless sound boxes
CN106205610A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 A kind of voice information identification method and equipment
CN106205610B (en) * 2016-06-29 2019-11-26 联想(北京)有限公司 A kind of voice information identification method and equipment
CN106128472A (en) * 2016-07-12 2016-11-16 乐视控股(北京)有限公司 The processing method and processing device of singer's sound
CN106448722A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Sound recording method, device and system
CN106448722B (en) * 2016-09-14 2019-01-18 讯飞智元信息科技有限公司 The way of recording, device and system
CN108630193A (en) * 2017-03-21 2018-10-09 北京嘀嘀无限科技发展有限公司 Audio recognition method and device
CN108630193B (en) * 2017-03-21 2020-10-02 北京嘀嘀无限科技发展有限公司 Voice recognition method and device
CN107220021A (en) * 2017-05-16 2017-09-29 北京小鸟看看科技有限公司 Phonetic entry recognition methods, device and headset equipment
CN107274895A (en) * 2017-08-18 2017-10-20 京东方科技集团股份有限公司 A kind of speech recognition apparatus and method
CN107274895B (en) * 2017-08-18 2020-04-17 京东方科技集团股份有限公司 Voice recognition device and method
CN107527626A (en) * 2017-08-30 2017-12-29 北京嘉楠捷思信息技术有限公司 Audio identification system
CN108198569A (en) * 2017-12-28 2018-06-22 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN108198569B (en) * 2017-12-28 2021-07-16 北京搜狗科技发展有限公司 Audio processing method, device and equipment and readable storage medium
CN108520756A (en) * 2018-03-20 2018-09-11 北京时代拓灵科技有限公司 A kind of method and device of speaker's speech Separation
CN108520756B (en) * 2018-03-20 2020-09-01 北京时代拓灵科技有限公司 Method and device for separating speaker voice
CN110310642A (en) * 2018-03-20 2019-10-08 阿里巴巴集团控股有限公司 Method of speech processing, system, client, equipment and storage medium
CN108694950B (en) * 2018-05-16 2021-10-01 清华大学 Speaker confirmation method based on deep hybrid model
CN108694950A (en) * 2018-05-16 2018-10-23 清华大学 A kind of method for identifying speaker based on depth mixed model
CN108766459A (en) * 2018-06-13 2018-11-06 北京联合大学 Target speaker method of estimation and system in a kind of mixing of multi-person speech
CN108766459B (en) * 2018-06-13 2020-07-17 北京联合大学 Target speaker estimation method and system in multi-user voice mixing
CN108735227A (en) * 2018-06-22 2018-11-02 北京三听科技有限公司 A kind of voice signal for being picked up to microphone array carries out the method and system of Sound seperation
CN108735227B (en) * 2018-06-22 2020-05-19 北京三听科技有限公司 Method and system for separating sound source of voice signal picked up by microphone array
CN110867191A (en) * 2018-08-28 2020-03-06 洞见未来科技股份有限公司 Voice processing method, information device and computer program product
CN109505741A (en) * 2018-12-20 2019-03-22 浙江大学 A kind of wind-driven generator blade breakage detection method and device based on rectangular microphone array
CN110335626A (en) * 2019-07-09 2019-10-15 北京字节跳动网络技术有限公司 Age recognition methods and device, storage medium based on audio
CN110288996A (en) * 2019-07-22 2019-09-27 厦门钛尚人工智能科技有限公司 A kind of speech recognition equipment and audio recognition method
CN112289335A (en) * 2019-07-24 2021-01-29 阿里巴巴集团控股有限公司 Voice signal processing method and device and pickup equipment
WO2021012734A1 (en) * 2019-07-25 2021-01-28 深圳壹账通智能科技有限公司 Audio separation method and apparatus, electronic device and computer-readable storage medium
CN110706688A (en) * 2019-11-11 2020-01-17 广州国音智能科技有限公司 Method, system, terminal and readable storage medium for constructing voice recognition model
CN110706688B (en) * 2019-11-11 2022-06-17 广州国音智能科技有限公司 Method, system, terminal and readable storage medium for constructing voice recognition model
CN111028857A (en) * 2019-12-27 2020-04-17 苏州蛙声科技有限公司 Method and system for reducing noise of multi-channel audio and video conference based on deep learning
CN111028857B (en) * 2019-12-27 2024-01-19 宁波蛙声科技有限公司 Method and system for reducing noise of multichannel audio-video conference based on deep learning
CN111816185A (en) * 2020-07-07 2020-10-23 广东工业大学 Method and device for identifying speaker in mixed voice
CN111696570B (en) * 2020-08-17 2020-11-24 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN111696570A (en) * 2020-08-17 2020-09-22 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN111899756A (en) * 2020-09-29 2020-11-06 北京清微智能科技有限公司 Single-channel voice separation method and device
CN114242072A (en) * 2021-12-21 2022-03-25 上海帝图信息科技有限公司 Voice recognition system for intelligent robot
CN114613385A (en) * 2022-05-07 2022-06-10 广州易而达科技股份有限公司 Far-field voice noise reduction method, cloud server and audio acquisition equipment
CN115240689A (en) * 2022-09-15 2022-10-25 深圳市水世界信息有限公司 Target sound determination method, device, computer equipment and medium
CN115240689B (en) * 2022-09-15 2022-12-02 深圳市水世界信息有限公司 Target sound determination method, target sound determination device, computer equipment and medium

Also Published As

Publication number Publication date
CN103811020B (en) 2016-06-22

Similar Documents

Publication Publication Date Title
CN103811020A (en) Smart voice processing method
Zhang et al. Deep learning based binaural speech separation in reverberant environments
CN110797043B (en) Conference voice real-time transcription method and system
CN105405439B (en) Speech playing method and device
CN107919133A (en) For the speech-enhancement system and sound enhancement method of destination object
CN102565759B (en) Binaural sound source localization method based on sub-band signal to noise ratio estimation
CN102438189B (en) Dual-channel acoustic signal-based sound source localization method
CN106373589B (en) A kind of ears mixing voice separation method based on iteration structure
CN111429939B (en) Sound signal separation method of double sound sources and pickup
CN104464750A (en) Voice separation method based on binaural sound source localization
CN105869651A (en) Two-channel beam forming speech enhancement method based on noise mixed coherence
CN106653048B (en) Single channel sound separation method based on voice model
CN109859749A (en) A kind of voice signal recognition methods and device
CN109935226A (en) A kind of far field speech recognition enhancing system and method based on deep neural network
CN107507625A (en) Sound source distance determines method and device
CN106031196A (en) Signal-processing device, method, and program
Dadvar et al. Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
CN113593601A (en) Audio-visual multi-modal voice separation method based on deep learning
CN105609099A (en) Speech recognition pretreatment method based on human auditory characteristic
Shamsoddini et al. A sound segregation algorithm for reverberant conditions
Talagala et al. Binaural localization of speech sources in the median plane using cepstral HRTF extraction
Krijnders et al. Tone-fit and MFCC scene classification compared to human recognition
Shabtai et al. The effect of reverberation on the performance of cepstral mean subtraction in speaker verification
CN107578784B (en) Method and device for extracting target source from audio
Venkatesan et al. Full sound source localization of binaural signals

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant