CN103811020A - Smart voice processing method - Google Patents
Abstract
The invention relates to a smart voice processing method and belongs to the technical field of information processing. The method builds a speaker voice model base in order to intelligently identify the identities of multiple speakers in a multi-speaker environment and to separate the mixed speech, obtaining an independent voice for each speaker; according to the user's demands, the voices the user wants to hear are amplified and the voices of speakers the user does not want are eliminated. Unlike a traditional hearing aid, the method automatically provides the user with the sounds the user needs, reduces the interference of non-target voices beyond mere noise, and is personalized, interactive, and intelligent.
Description
Technical field
The invention belongs to the technical field of information processing and specifically relates to a smart voice processing method.
Background technology
According to the latest assessment data released by the World Health Organization (WHO) in 2013, 360 million people worldwide currently have a hearing impairment of some degree, accounting for 5% of the global population. Hearing aid products can effectively compensate the hearing loss of the hearing-impaired and improve their quality of life and work. However, current research on hearing assistance systems still concentrates on two aspects, noise suppression and sound amplification, and seldom addresses modeling based on voice characteristics or automatic separation of multiple sound sources. When the practical scene is complex, for example at a party where several speakers talk at once, possibly over background sounds such as music, the hearing assistance system cannot isolate the target voice of interest from the mixed speech input; a simple sound amplification function then only increases the user's hearing burden, or even causes injury, and cannot deliver effective speech input and understanding. Therefore, in view of these technical deficiencies of current hearing assistance systems, it is of great significance to design a novel hearing assistance system that recognizes a specific sound target and is more intelligent and personalized.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a smart voice processing method that ensures that the user receives and amplifies a clean sound according to the user's own demands, achieving an intelligent, interactive, and personalized hearing assistance system.
A smart voice processing method comprises the following steps.
Step 1: build the speaker voice model base; the detailed process is as follows:
Step 1-1: collect sample voice segments, discretize the collected segments, extract the Mel-frequency cepstral coefficients of the voice signal as the speech feature parameters, and build a Gaussian mixture model;
The model formula is as follows:

p(X|G) = Π_{t=1}^{T} Σ_{i=1}^{I} p_i · b_i(x_t)

Wherein, p(X|G) denotes the probability of the sample speech feature parameters X under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, …, I;
I denotes the number of single Gaussian models in the Gaussian mixture model;
p_i denotes the weight coefficient of the i-th single Gaussian model, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian model;
Σ_i denotes the covariance matrix of the i-th single Gaussian model;
X denotes the sample speech feature parameters, X = {x_1, x_2, …, x_T}, where T is the number of feature vectors;
b_i(·) denotes the density function of the i-th single Gaussian model, b_i(x) = N(x; μ_i, Σ_i), where N(·) denotes the density function of the Gaussian distribution;
Step 1-2: train the Gaussian mixture model with the speech feature parameters;
Apply the k-means clustering algorithm to the speech feature parameters to obtain the initial Gaussian mixture model parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, …, I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the Gaussian mixture model parameters, completing the training of the feature parameters;
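The modeling of steps 1-1 and 1-2 can be illustrated with a minimal pure-Python sketch (illustrative only, not the patented implementation): the mixture density p(X|G) from the model formula above, and the k-means assignment step used for initialization. The diagonal-covariance form and the toy data are assumptions made for brevity.

```python
import math

def gaussian_density(x, mu, var):
    # Density of a diagonal-covariance Gaussian N(mu, diag(var)) at point x.
    norm = math.prod(2.0 * math.pi * v for v in var) ** -0.5
    expo = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
    return norm * math.exp(expo)

def gmm_density(x, weights, mus, vars_):
    # Mixture density: sum_i p_i * b_i(x), as in the model formula.
    return sum(p * gaussian_density(x, mu, v)
               for p, mu, v in zip(weights, mus, vars_))

def kmeans_assign(points, centers):
    # One assignment step of k-means: each point goes to the nearest centre
    # by the minimum (squared Euclidean) distance criterion.
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            for p in points]
```

In a full pipeline the cluster means become μ_i^0, the per-cluster covariances become Σ_i^0, and EM then refines all parameters.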
Step 2: determine the number and arrival directions of the sound sources; the detailed process is as follows:
Step 2-1: use a microphone array composed of M microphones to record the mixed audio signal of the test environment, discretize the recorded signal, and obtain the amplitude of each sampling point;
Step 2-2: arrange the amplitudes of the sampling points into a matrix to obtain the mixed audio matrix collected by each microphone; this matrix has one column, its number of rows equals the number of sampling points, and its elements are the amplitudes of the sampling points;
Step 2-3: from the mixed audio matrices collected by the microphones and the number of microphones, obtain the estimate of the covariance matrix of the mixed audio signal of the test environment;
The estimate of the covariance matrix is obtained element by element as:

R_xx(m, m′) = (1/T) · X^H(m′) · X(m),  m, m′ = 1, …, M

Wherein, R_xx denotes the M×M estimate of the covariance matrix of the mixed audio signal of the test environment, and T the number of sampling points;
X(m) denotes the mixed audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
Step 2-4: perform an eigenvalue decomposition of the covariance estimate to obtain its eigenvalues, sort the eigenvalues in descending order, and take the number of eigenvalues greater than the threshold as the number of sound sources;
Step 2-5: subtract the number of sound sources from the number of microphones to obtain the number of noise dimensions, and from the corresponding eigenvectors obtain the noise matrix;
Step 2-6: from the distance between each microphone and the array centre, the wavelength of the mixed audio signal, the orientation angle of each microphone with respect to the array centre, and the beam arrival direction of the sound source, obtain the steering vector of the microphone array; then, from the noise matrix and the steering vector of the microphone array, obtain the angular spectrum function of the mixed audio signal;
The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) · V_u · V_u^H · α(θ))

Wherein, P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), …, α_m(θ), …, α_M(θ)), with α_m(θ) = e^{j·k·d_m·cos(φ_m − θ)}, where j denotes the imaginary unit, k = 2π/λ, λ denotes the wavelength of the mixed audio signal, d_m denotes the distance between the m-th microphone and the array centre, and φ_m denotes the orientation angle of the m-th microphone with respect to the array centre;
θ denotes the beam arrival direction of a sound source;
α^H(θ) denotes the conjugate transpose of the steering vector of the microphone array;
V_u denotes the noise matrix;
V_u^H denotes the conjugate transpose of the noise matrix;
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, pick its peaks in descending order, choosing as many peaks as there are sound sources;
Step 2-8: determine the angle corresponding to each chosen peak to obtain the beam arrival direction of each sound source;
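Steps 2-6 through 2-8 amount to evaluating a MUSIC-style angular spectrum over candidate angles and picking its peaks. A minimal pure-Python sketch, assuming equal microphone-to-centre distances d and known orientation angles φ_m; the noise vectors passed in stand for the eigenvectors of the noise subspace, and the test geometry below is illustrative, not the embodiment's.

```python
import cmath
import math

def steering_vector(theta_deg, k, d, phis_deg):
    # α_m(θ) = exp(j k d cos(φ_m − θ)) for each microphone m.
    th = math.radians(theta_deg)
    return [cmath.exp(1j * k * d * math.cos(math.radians(p) - th))
            for p in phis_deg]

def music_spectrum(theta_deg, noise_vecs, k, d, phis_deg):
    # P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ)); large values (peaks) indicate
    # directions where the steering vector is orthogonal to the noise subspace.
    a = steering_vector(theta_deg, k, d, phis_deg)
    denom = 0.0
    for v in noise_vecs:
        proj = sum(ai.conjugate() * vi for ai, vi in zip(a, v))
        denom += abs(proj) ** 2
    return 1.0 / max(denom, 1e-12)  # guard against exact orthogonality
```

Scanning `music_spectrum` over θ = 0°…360° and taking the largest peaks gives the beam arrival directions of step 2-8.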
The microphone array sound pressure signal is:

p_w(t) = (1/M) Σ_{m=1}^{M} Σ_{n=1}^{N} h_mn(t) · s_n(t)

Wherein, p_w(t) denotes the microphone array sound pressure at time t;
N denotes the number of sound sources;
t denotes the time;
s_n(t) denotes the sound signal of the n-th sound source;
h_mn(t) denotes the transfer coefficient between the n-th sound source and the m-th microphone, h_mn(t) = p_0(t) · α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array centre caused by the sound wave at time t, and α_m(θ_n(t)) denotes the steering vector of the m-th microphone with respect to the n-th sound source at time t, with θ_n(t) the beam arrival direction of the n-th sound source at time t;
The horizontal sound pressure gradient of the microphone array is:

p_x(t) = Σ_{n=1}^{N} [h_1n(t) − h_2n(t)] · s_n(t)

Wherein, p_x(t) denotes the horizontal sound pressure gradient of the microphone array, microphones 1 and 2 being the horizontally opposed pair;
The vertical sound pressure gradient of the microphone array is:

p_y(t) = Σ_{n=1}^{N} [h_3n(t) − h_4n(t)] · s_n(t)

Wherein, p_y(t) denotes the vertical sound pressure gradient of the microphone array, microphones 3 and 4 being the vertically opposed pair;
Step 5: from the microphone array sound pressure, horizontal sound pressure gradient, and vertical sound pressure gradient in the frequency domain, obtain the intensity vector of the sound pressure signal in the frequency domain, and from it derive the intensity vector direction;
The intensity vector of the sound pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) · Re[ P_w*(ω, t) · (P_x(ω, t) u_x + P_y(ω, t) u_y) ]

Wherein, I(ω, t) denotes the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re[·] denotes taking the real part;
P_w*(ω, t) denotes the complex conjugate of the microphone array sound pressure in the frequency domain;
P_x(ω, t) denotes the horizontal sound pressure gradient of the microphone array in the frequency domain;
P_y(ω, t) denotes the vertical sound pressure gradient of the microphone array in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis;
The intensity vector direction is:

γ(ω, t) = arctan( Re[P_w*(ω, t) P_y(ω, t)] / Re[P_w*(ω, t) P_x(ω, t)] )

Wherein, γ(ω, t) denotes the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array;
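The direction γ(ω, t) for a single time-frequency bin can be sketched directly from the intensity vector formula. The default air density and speed of sound (1.2 kg/m³, 343 m/s) are illustrative assumptions; note that the constant factor cancels out of the direction, which is computed with the two-argument arctangent.

```python
import math

def intensity_direction(pw, px, py, rho0=1.2, c=343.0):
    # I(ω,t) = (1/(ρ0 c)) Re[ conj(Pw) (Px ux + Py uy) ]; γ(ω,t) is the
    # angle of this two-component vector. pw, px, py are complex spectra
    # of the array pressure and the two pressure gradients at one bin.
    ix = (pw.conjugate() * px).real / (rho0 * c)
    iy = (pw.conjugate() * py).real / (rho0 * c)
    return math.atan2(iy, ix)  # scale factor 1/(ρ0 c) does not affect the angle
```

Accumulating γ over all bins yields the histogram whose probability density is fitted in step 6.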
Step 6: accumulate the intensity vector directions to obtain their probability density distribution, fit it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity vector directions, and from them obtain the intensity vector direction function of each sound pressure signal;
The detailed process is as follows:
Step 6-1: accumulate the intensity vector directions to obtain their probability density distribution, and fit it with a mixed von Mises distribution to obtain the parameter set of the mixed von Mises distribution obeyed by the voice intensity vector directions;
The mixed von Mises distribution model is:

f(γ) = Σ_{n=1}^{N} α_n · exp(k_n · cos(γ − θ_n)) / (2π · I_0(k_n))

Wherein, f(γ) denotes the mixed von Mises probability density, and θ_n the centre angle of the n-th single von Mises distribution;
α_n denotes the weight of the intensity vector direction function of the sound pressure signal of the n-th sound source;
I_0(k_n) denotes the zeroth-order modified Bessel function of the first kind corresponding to the n-th sound source; k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity vector directions of the n-th sound source's pressure signal, the reciprocal of the variance of the von Mises distribution;
The parameter set of the mixed von Mises distribution function is:

Γ = {α_n, k_n}, n = 1, …, N (11)

Step 6-2: initialize the model parameters to obtain the initial parameter set;
Step 6-3: starting from the initial model parameters, estimate the parameters of the mixed von Mises distribution model with the expectation-maximization algorithm;
Step 6-4: from the estimated mixed von Mises distribution model parameters, obtain the intensity vector direction function of each sound pressure signal;
The intensity vector direction function of the sound pressure signal is:

f_n(γ) = α_n · exp(k_n · cos(γ − θ_n)) / (2π · I_0(k_n))
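The mixed von Mises density above can be evaluated with a short pure-Python sketch; since the standard library has no modified Bessel function, I_0 is computed from its power series. This is an illustrative evaluation of the model formula, not the fitting procedure.

```python
import math

def bessel_i0(x, terms=30):
    # Power series I0(x) = sum_m ((x/2)^(2m) / (m!)^2), truncated at `terms`.
    return sum((x / 2.0) ** (2 * m) / math.factorial(m) ** 2
               for m in range(terms))

def von_mises_pdf(gamma, theta, kappa):
    # Single von Mises density with mean direction theta, concentration kappa.
    return math.exp(kappa * math.cos(gamma - theta)) / (2.0 * math.pi * bessel_i0(kappa))

def von_mises_mixture(gamma, weights, thetas, kappas):
    # Mixture density f(γ) = sum_n α_n · vM(γ; θ_n, k_n).
    return sum(a * von_mises_pdf(gamma, t, k)
               for a, t, k in zip(weights, thetas, kappas))
```

With kappa = 0 each component degenerates to the uniform circular density 1/(2π), a convenient sanity check.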
Step 7: from the obtained intensity vector direction function of each sound pressure signal and the microphone array sound pressure, obtain the frequency domain signal of each sound source, and apply the inverse Fourier transform to convert each frequency domain source signal into a time domain source signal;
The frequency domain signal of each sound source is:

S_n(ω, t) = f_n(γ(ω, t)) · P_w(ω, t)

Wherein, S_n(ω, t) denotes the frequency domain signal of the n-th sound source obtained after the mixed voice is separated;
Step 8: compute the matching probability between each separated source signal and the designated source in the sample voice base, take the source with the largest probability value as the target sound source, retain that source signal, and delete the other, non-target sources;
The matching probability between the n-th source signal and the designated source in the sample voice base is:

p(X_n | G_c)

In the formula: X_n denotes the speech feature parameters extracted from the n-th separated voice, the Mel-frequency cepstral coefficients of that voice being taken as its feature parameters;
p(X_n | G_c) denotes the matching probability between the n-th source signal and the designated source in the sample voice base, i.e. the probability that the separated voice belongs to the voice of the speaker designated by the user;
G_c denotes the acoustic model parameters of the speaker designated by the user;
Step 9: amplify the retained source signal, completing the amplification of the designated voice source in the test environment.
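Steps 8 and 9 reduce to an argmax over the per-source matching scores followed by a gain on the retained signal. A minimal sketch; the scores are assumed to be precomputed log matching probabilities p(X_n | G_c), and the fixed gain is an illustrative stand-in for the hearing-aid amplification stage.

```python
def select_target(scores):
    # Step 8: scores[n] is the (log) matching probability of separated
    # source n against the designated speaker model G_c; keep the largest.
    return max(range(len(scores)), key=lambda n: scores[n])

def amplify(signal, gain):
    # Step 9: scale the retained target signal samples by a fixed gain.
    return [gain * s for s in signal]
```

With the embodiment's score values, the first separated voice is selected as the target.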
The threshold described in step 2-4 has a value in the range 10^−2 to 10^−16.
Advantages of the present invention:
The smart voice processing method of the present invention builds a speaker voice model base to intelligently identify the identities of multiple speakers in a multi-speaker environment and to separate the mixed speech, obtaining each speaker's independent voice; according to the user's demands, the voices of the speakers the user wants to hear are amplified while the voices of speakers the user does not want are eliminated. Unlike a traditional hearing aid, the method can automatically provide the user with the sounds the user needs, reducing the interference of non-target voices beyond mere noise, which embodies the personalization, interactivity, and intelligence of the method.
Description of the drawings
Fig. 1 is the flow chart of the smart voice processing method of an embodiment of the present invention;
Fig. 2 shows the modeling sound source data of an embodiment of the present invention, wherein (a) shows the voice data of the first person, (b) the voice data of the second person, and (c) the voice data of the third person;
Fig. 3 shows the sound source data used for sound mixing in an embodiment of the present invention, wherein (a), (b), and (c) show the data of the first, second, and third sound sources respectively;
Fig. 4 is the microphone array schematic of an embodiment of the present invention;
Fig. 5 shows the data received by the four microphones of an embodiment of the present invention, wherein (a) to (d) show the mixed sound signals received by microphones 1 to 4 respectively;
Fig. 6 shows the data received by the four microphones of an embodiment of the present invention after sampling, wherein (a) to (d) show the sampled mixed sound signals of microphones 1 to 4 respectively;
Fig. 7 is the spatial spectrum estimate of the mixed signal of an embodiment of the present invention;
Fig. 8 is the probability density map of the mixed sound intensity vector directions of an embodiment of the present invention;
Fig. 9 shows the mixed von Mises model obtained by maximum likelihood estimation in an embodiment of the present invention;
Fig. 10 compares the ideal voices with those obtained after separation in an embodiment of the present invention, wherein (a), (c), and (e) are the original sound signals of the first, second, and third sound sources, and (b), (d), and (f) are the corresponding signals of the first, second, and third sound sources after separation.
Embodiment
An embodiment of the present invention is described further below in conjunction with the accompanying drawings.
In the embodiment of the present invention, the model system is divided into two modules: a voice modeling module and a dynamic real-time voice processing module. The voice modeling module builds the speakers' voice models; the dynamic real-time processing module performs, in a complex voice environment, the direction finding and localization of the mixed voices, their separation, and the recognition and extraction of the mixed voices (i.e. the extraction and amplification of the target sound and the shielding of the remaining sounds).
A smart voice processing method, whose flow chart is shown in Fig. 1, comprises the following steps:
Step 1-1: record sample voice segments in a quiet indoor environment, discretize the collected segments, extract the Mel-frequency cepstral coefficients (MFCC) of the voice signal as the speech feature parameters, and build a Gaussian mixture model;
In the embodiment of the present invention, the sound recorder built into Windows is used to record the voices of 3 people, 2 segments per person, of which one segment is used for sound separation and recognition and the other for speaker voice modeling; the designated target source is the first sound source. As shown in Fig. 2(a) to (c), one voice segment of each of the three people is taken, a Gaussian mixture model is built for each, and the resulting model parameters are stored in the model base.
The model formula is as follows:

p(X|G) = Π_{t=1}^{T} Σ_{i=1}^{I} p_i · b_i(x_t)

Wherein, p(X|G) denotes the probability of the sample speech feature parameters X under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, …, I;
I denotes the number of single Gaussian models in the Gaussian mixture model;
p_i denotes the weight coefficient of the i-th single Gaussian model, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th single Gaussian model;
Σ_i denotes the covariance matrix of the i-th single Gaussian model;
X denotes the sample speech feature parameters, X = {x_1, x_2, …, x_T}, where T is the number of feature vectors;
b_i(·) denotes the density function of the i-th single Gaussian model, b_i(x) = N(x; μ_i, Σ_i), where N(·) denotes the density function of the Gaussian distribution;
Step 1-2: train the Gaussian mixture model with the speech feature parameters;
Apply the k-means clustering algorithm to the speech feature parameters to obtain the initial Gaussian mixture model parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, …, I;
In this example, the Gaussian mixture model is composed of 16 single Gaussian models. Sixteen vectors are randomly generated as cluster centres, each of length equal to the number of speech frames; the feature parameters of each frame are assigned to one of the 16 cluster centres by the minimum distance criterion, then the centre of each cluster is recomputed and taken as the new cluster centre, repeating until the algorithm converges. The cluster centres so obtained are the initial mean parameters μ_i^0 of the Gaussian mixture model; the covariances of the feature parameters give the initial Σ_i^0, and the initial weights p_i^0 are all 1/16.
The model is then estimated with the expectation-maximization algorithm, whose principle is to maximize the probability of the observations: setting the derivatives of the model function with respect to the parameters p_i^0, μ_i^0, Σ_i^0 to zero yields re-estimates of p_i, μ_i, Σ_i, iterated until the algorithm converges, at which point the training of the feature parameters is complete.
Step 2: determine the number and arrival directions of the sound sources; the detailed process is as follows:
Step 2-1: use a microphone array composed of 4 microphones to record the test environment sound signal, discretize the recorded mixed audio signal, and obtain the amplitude of each sampling point;
In the embodiment of the present invention, as shown in Fig. 3(a) to (c), another voice segment of each of the three people is taken as the source data of the mixed audio, and 4 microphones are used. The array formed by the 4 microphones is shown in Fig. 4: microphones No. 1 and No. 2 are symmetrically placed on either side of the array centre in the horizontal direction, and microphones No. 3 and No. 4 symmetrically on either side of the array centre in the vertical direction. The mixed data received by the 4 microphones are shown in Fig. 5(a) to (d); the voices received by the 4 microphones are discretized at a frequency of 12500 Hz and the amplitude of each sampling point is determined, as shown in Fig. 6(a) to (d).
Step 2-2: arrange the amplitudes of the sampling points into a matrix to obtain the mixed audio matrix collected by each microphone; this matrix has one column, its number of rows equals the number of sampling points, and its elements are the amplitudes of the sampling points;
Step 2-3: from the mixed audio matrices collected by the microphones and the number of microphones, obtain the estimate of the covariance matrix of the mixed audio signal of the test environment;
The estimate of the covariance matrix is obtained element by element as:

R_xx(m, m′) = (1/T) · X^H(m′) · X(m),  m, m′ = 1, …, M

Wherein, R_xx denotes the M×M estimate of the covariance matrix of the mixed audio signal of the test environment, and T the number of sampling points;
X(m) denotes the mixed audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
Step 2-4: in this example, the eigenvalue decomposition of the covariance estimate gives the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]; sorting them in descending order and comparing them with the threshold 10^−7 leaves 3 eigenvalues, so the number of sound sources is 3;
Step 2-5: subtract the number of sound sources from the number of microphones to obtain the number of noise dimensions, and from the corresponding eigenvectors obtain the noise matrix;
In the embodiment of the present invention, the 3 eigenvalues matching the number of sound sources and their corresponding eigenvectors are regarded as the signal subspace, and the remaining 4 − 3 = 1 eigenvalue and its eigenvector as the noise subspace, i.e. the number of noise dimensions is 1; from the eigenvector corresponding to the noise eigenvalue the noise matrix is obtained:
V_u = [−0.1218−0.4761i, −0.1564+0.4659i, −0.5070−0.0374i, −0.5084];
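The source-counting rule of steps 2-4 and 2-5 can be sketched directly; the threshold default below is the embodiment's 10^−7, and the test uses the embodiment's eigenvalues.

```python
def count_sources(eigenvalues, threshold=1e-7):
    # Step 2-4: eigenvalues of R_xx above the threshold are attributed to
    # sound sources; the rest are treated as noise.
    return sum(1 for ev in eigenvalues if ev > threshold)

def noise_subspace_dim(num_mics, eigenvalues, threshold=1e-7):
    # Step 2-5: microphones minus sources gives the noise subspace dimension,
    # whose eigenvectors form the noise matrix V_u.
    return num_mics - count_sources(eigenvalues, threshold)
```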
Step 2-6: from the distance between each microphone and the array centre, the wavelength of the mixed audio signal, the orientation angles of the microphones with respect to the array centre, and the beam arrival direction of the sound source, obtain the steering vector of the microphone array; then, from the noise matrix and the steering vector of the microphone array, obtain the angular spectrum function of the mixed audio signal;
As shown in Fig. 4, the distance between each microphone and the array centre is 0.02 m. In the embodiment of the present invention, the wavelength of the mixed audio signal is 30000; the orientation angles of microphones No. 1, No. 2, No. 3, and No. 4 with respect to the array centre are 0°, 180°, 90°, and 270° respectively;
The angular spectrum function of the mixed audio signal is:

P(θ) = 1 / (α^H(θ) · V_u · V_u^H · α(θ))

Wherein, P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), where α_1(θ) = e^{j·k·0.02·cos(0° − θ)}, α_2(θ) = e^{j·k·0.02·cos(180° − θ)}, α_3(θ) = e^{j·k·0.02·cos(90° − θ)}, α_4(θ) = e^{j·k·0.02·cos(270° − θ)}, j denotes the imaginary unit, k = 2π/λ, and λ denotes the wavelength of the mixed audio signal;
θ denotes the beam arrival direction of a sound source;
α^H(θ) denotes the conjugate transpose of the steering vector of the microphone array;
V_u denotes the noise matrix;
V_u^H denotes the conjugate transpose of the noise matrix;
Step 2-7: from the waveform of the angular spectrum function of the mixed audio signal, pick its peaks in descending order, choosing as many peaks as there are sound sources;
Step 2-8: determine the angle corresponding to each chosen peak to obtain the beam arrival direction of each sound source;
The waveform of the angular spectrum function P(θ) of the mixed audio signal is shown in Fig. 7; the beam arrival directions of the 3 sound sources present in the mixed sound are [50°, 200°, 300°] respectively.
The microphone array sound pressure is:

p_w(t) = (1/M) Σ_{m=1}^{M} Σ_{n=1}^{N} h_mn(t) · s_n(t)

Wherein, p_w(t) denotes the microphone array sound pressure at time t;
N denotes the number of sound sources;
t denotes the time;
s_n(t) denotes the sound signal of the n-th sound source;
h_mn(t) denotes the transfer coefficient between the n-th sound source and the m-th microphone, h_mn(t) = p_0(t) · α_m(θ_n(t)), where p_0(t) denotes the sound pressure at the array centre caused by the sound wave at time t, and α_m(θ_n(t)) denotes the steering vector of the m-th microphone with respect to the n-th sound source at time t, with θ_n(t) the beam arrival direction of the n-th sound source at time t;
The horizontal sound pressure gradient of the microphone array is:

p_x(t) = Σ_{n=1}^{N} [h_1n(t) − h_2n(t)] · s_n(t)

Wherein, p_x(t) denotes the horizontal sound pressure gradient of the microphone array, microphones 1 and 2 being the horizontally opposed pair;
The vertical sound pressure gradient of the microphone array is:

p_y(t) = Σ_{n=1}^{N} [h_3n(t) − h_4n(t)] · s_n(t)

Wherein, p_y(t) denotes the vertical sound pressure gradient of the microphone array, microphones 3 and 4 being the vertically opposed pair;
Step 5: from the microphone array sound pressure, horizontal sound pressure gradient, and vertical sound pressure gradient in the frequency domain, obtain the intensity vector of the sound pressure signal in the frequency domain, and from it derive the intensity vector direction;
The intensity vector of the sound pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) · Re[ P_w*(ω, t) · (P_x(ω, t) u_x + P_y(ω, t) u_y) ]

Wherein, I(ω, t) denotes the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re[·] denotes taking the real part;
P_w*(ω, t) denotes the complex conjugate of the microphone array sound pressure in the frequency domain;
P_x(ω, t) denotes the horizontal sound pressure gradient of the microphone array in the frequency domain;
P_y(ω, t) denotes the vertical sound pressure gradient of the microphone array in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis;
The intensity vector direction is:

γ(ω, t) = arctan( Re[P_w*(ω, t) P_y(ω, t)] / Re[P_w*(ω, t) P_x(ω, t)] )

Wherein, γ(ω, t) denotes the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array;
Step 6: accumulate the intensity vector directions to obtain their probability density distribution, fit it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity vector directions, and from them obtain the intensity vector direction function of each sound pressure signal;
The detailed process is as follows:
Step 6-1: accumulate the intensity vector directions to obtain their probability density distribution, and fit it with a mixed von Mises distribution to obtain the parameter set of the mixed von Mises distribution obeyed by the voice intensity vector directions;
In the embodiment of the present invention, the probability density map of γ(ω, t) is shown in Fig. 8. From the number of sources and the angles found above, the mixed von Mises distribution matching this probability density is composed of 3 single von Mises distributions, whose centre angles are [50°, 200°, 300°] respectively.
The mixed von Mises distribution model is:

f(γ) = Σ_{n=1}^{N} α_n · exp(k_n · cos(γ − θ_n)) / (2π · I_0(k_n))

Wherein, f(γ) denotes the mixed von Mises probability density, and θ_n the centre angle of the n-th single von Mises distribution;
α_n denotes the weight of the intensity vector direction function of the sound pressure signal of the n-th sound source;
I_0(k_n) denotes the zeroth-order modified Bessel function of the first kind corresponding to the n-th sound source; k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity vector directions of the n-th sound source's pressure signal, the reciprocal of the variance of the von Mises distribution;
The parameter set of the mixed von Mises distribution function is:

Γ = {α_n, k_n}, n = 1, 2, 3 (11)
Step 6-2: initialize the model parameters to obtain the initial parameter set;
In the embodiment of the present invention, the initial α values are [1/3, 1/3, 1/3] and the initial k values are [8, 6, 3];
Step 6-3: from the initial model parameters, build the initial mixed von Mises distribution function:

f(γ) = Σ_{n=1}^{3} α_n · exp(k_n · cos(γ − θ_n)) / (2π · I_0(k_n))

and estimate the parameters of the mixed von Mises distribution model with the expectation-maximization algorithm, whose principle is to maximize the probability of the observations: setting the derivatives of the model function with respect to the parameters α and k to zero yields their re-estimates.
Substituting γ(ω, t) into the model and taking the logarithm gives an initial log-likelihood value of −3.0249×10^4. Computing the proportion of the mixture accounted for by each current single von Mises distribution gives the re-estimated α parameters [0.2267, 0.2817, 0.4516], and the derivative-based update of k gives the re-estimated values [5.1498, 4.0061, 3.1277]; the new log-likelihood value is −2.9887×10^4. The difference between the new and old likelihood values, 362.3362, is much larger than the threshold (taken as 0.1), so the new likelihood value replaces the old one and the step is repeated with the two newly re-estimated parameters, until the difference between new and old likelihood values falls below the threshold, at which point the algorithm is considered converged. In this example the final α parameters are [0.2689, 0.2811, 0.4500] and the final k values are [4.3508, 3.3601, 2.8332]; this yields the mixed von Mises distribution function fitting the intensity vector direction distribution, shown in Fig. 9.
Step 6-4, from the estimated parameters of the mixed von Mises distribution model, derive the intensity-vector directivity function of each sound-pressure signal;
The intensity-vector directivity function formula of the sound-pressure signal is as follows:
Step 7, from the obtained intensity-vector directivity function of each sound-pressure signal and the microphone-array pressure, obtain the frequency-domain signal of each sound source, and apply the inverse Fourier transform to convert each frequency-domain source signal into a time-domain source signal;
The formula for the frequency-domain signal of each sound source is as follows:
Wherein, S_n(ω, t) denotes the frequency-domain signal of the n-th source signal obtained after separating the mixed voice;
Step 8, compute the matching probability between each source signal and the user-specified speaker in the sample voice library; the source with the largest probability value is taken as the target source, its signal is retained, and the other, non-target sources are deleted;
In the embodiment of the present invention, the first speaker is assumed to be the target source. The log matching probabilities of the three separated voices against this target model are [2.0850, -2.8807, -3.5084] × 10^4; the best-matching sound is separated voice No. 1, so the target source is found.
The matching probability formula between each source signal and the user-specified speaker in the sample voice library is as follows:
In the formula:
X̂_n denotes the speech characteristic parameters extracted from the n-th separated voice; the Mel-frequency cepstral coefficients of the voice are extracted as its characteristic parameters;
P(X̂_n|G_c) denotes the matching probability between the n-th source signal and the user-specified speaker in the sample voice library;
G_c denotes the acoustic model parameters of the user-specified speaker;
Step 9, amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
In the embodiment of the present invention, the directivity function of each sound source is finally obtained from the estimated mixed von Mises distribution model parameters, and further separation recovers the original sounds. Figures 10(a) to 10(f) compare the separated data with the ideal separation; the similarity is high.
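The target-selection of steps 8 and 9 can be sketched as follows: each separated voice is scored by its average log-likelihood under the user-specified speaker model G_c, and the highest-scoring source is kept. This is a minimal, hypothetical sketch, not the patent's trained models: the "speaker model" is a toy two-component diagonal GMM and the "features" are synthetic vectors standing in for MFCCs.

```python
import numpy as np

rng = np.random.default_rng(1)

def gmm_avg_loglik(X, weights, means, variances):
    # average log-likelihood of feature vectors X (T, D) under a diagonal GMM
    diff = X[:, None, :] - means[None, :, :]                      # (T, I, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)   # (I,)
    expo = -0.5 * (diff**2 / variances[None]).sum(axis=2)         # (T, I)
    log_comp = np.log(weights) + log_norm + expo                  # (T, I)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).mean())

# toy speaker model G_c (hypothetical parameters)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))

# stand-ins for the MFCC features of the three separated voices
feats = [rng.normal(0.0, 1.0, (200, 2)),     # resembles the target model
         rng.normal(8.0, 1.0, (200, 2)),     # does not
         rng.normal(-6.0, 1.0, (200, 2))]    # does not

scores = [gmm_avg_loglik(F, weights, means, variances) for F in feats]
target = int(np.argmax(scores))              # retained source; the others are deleted
```

Only the source whose features fit the speaker model scores highly; amplification would then be applied to that retained signal alone.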
Claims (3)
1. An intelligent voice processing method, characterized in that it comprises the following steps:
Step 1, collect sample voice segments to build a sample voice library, perform feature extraction on the sample voices to obtain characteristic parameters, and train the characteristic parameters;
The detailed process is as follows:
Step 1-1, collect sample voice segments, discretize the collected voice segments, extract the Mel-frequency cepstral coefficients of the voice signals as their characteristic parameters, and build a Gaussian mixture model;
The model formula is as follows:

p(X|G) = ∏_{t=1}^{T} ∑_{i=1}^{I} p_i b_i(x_t)

Wherein, p(X|G) denotes the probability of the sample voice characteristic parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of single Gaussian models in the Gaussian mixture model;
p_i denotes the mixture weight of the i-th single Gaussian model;
μ_i denotes the mean vector of the i-th single Gaussian model;
Σ_i denotes the covariance matrix of the i-th single Gaussian model;
X denotes the sample voice characteristic parameters, X = {x_1, x_2, ..., x_T}, where T denotes the number of feature vectors;
b_i(·) denotes the density function of the i-th single Gaussian model, b_i(·) = N(μ_i, Σ_i), where N(·) denotes the Gaussian density function;
Step 1-2, train the Gaussian mixture model with the voice-signal characteristic parameters;
Apply the k-means clustering algorithm to cluster the characteristic parameters and obtain the initial Gaussian mixture model parameter set G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; then, starting from this initial parameter set, estimate the model with the expectation-maximization (EM) algorithm to obtain the Gaussian mixture model parameters, completing the training of the characteristic parameters;
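Step 1-2 (k-means initialisation of G_0 followed by EM refinement) can be sketched in numpy. This is a minimal sketch under assumptions not stated in the claim: two components, diagonal covariances, deterministic initial centres, and synthetic two-dimensional "features" standing in for MFCCs.

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic "characteristic parameters": two well-separated clusters
X = np.concatenate([rng.normal(-4.0, 1.0, (300, 2)),
                    rng.normal(4.0, 1.0, (300, 2))])
I = 2                                                  # number of single Gaussians

# --- k-means initialisation of G0 = {p_i^0, mu_i^0, Sigma_i^0} ---
centers = np.array([X.min(axis=0), X.max(axis=0)])     # deterministic starting centres
for _ in range(20):
    label = np.argmin(((X[:, None] - centers)**2).sum(axis=2), axis=1)
    centers = np.array([X[label == i].mean(axis=0) for i in range(I)])
p = np.array([(label == i).mean() for i in range(I)])  # initial weights
var = np.array([X[label == i].var(axis=0) for i in range(I)])

# --- EM refinement (diagonal covariance matrices) ---
for _ in range(50):
    expo = -0.5 * (((X[:, None] - centers)**2) / var).sum(axis=2)      # (T, I)
    dens = p / np.sqrt((2 * np.pi * var).prod(axis=1)) * np.exp(expo)
    r = dens / dens.sum(axis=1, keepdims=True)         # E step: responsibilities
    Nk = r.sum(axis=0)
    p = Nk / len(X)                                    # M step: weights
    centers = (r[:, :, None] * X[:, None]).sum(axis=0) / Nk[:, None]   # means
    var = (r[:, :, None] * (X[:, None] - centers)**2).sum(axis=0) / Nk[:, None]
```

The k-means pass gives EM a good starting point, and the EM loop then refines the weights, means, and variances toward the maximum-likelihood fit.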
Step 2, collect the test-environment sound signal with a microphone array composed of M microphones, and determine the number of sound sources in this environment, the beam arrival direction of each source, and the incidence angle of each source relative to the array;
The detailed process is as follows:
Step 2-1, collect the mixed audio signal of the test environment with the microphone array of M microphones, discretize the collected mixed audio signal, and obtain the amplitude of each sampling point;
Step 2-2, arrange the amplitudes of the sampling points into a matrix to obtain the mixed audio matrix collected by each microphone; this matrix has one column and as many rows as sampling points, and its elements are the amplitudes of the sampling points;
Step 2-3, from the mixed audio matrices collected by the microphones and the number of microphones, obtain the estimate of the vector covariance matrix of the test-environment mixed audio signal;
The formula for the estimate of the vector covariance matrix is as follows:
Wherein, R_xx denotes the estimate of the vector covariance matrix of the test-environment mixed audio signal;
X(m) denotes the mixed audio matrix collected by the m-th microphone;
X^H(m) denotes the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
Step 2-4, perform an eigenvalue decomposition of the covariance estimate, sort the eigenvalues in descending order, and count the eigenvalues greater than a threshold; this count is the number of sound sources;
Step 2-5, subtract the number of sound sources from the number of microphones to obtain the number of noise sources, and form the corresponding noise matrix;
Step 2-6, obtain the steering vector of the microphone array from the distance between each microphone and the array center, the wavelength of the mixed audio signal, the orientation angle of each microphone relative to the array center, and the beam arrival direction of the source; then obtain the angular spectrum function of the mixed audio signal from the noise matrix and the steering vector of the microphone array;
The angular spectrum function formula of the mixed audio signal is as follows:

P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))

Wherein, P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), where α_m(θ) = exp(jkd_m cos(θ − φ_m));
j denotes the imaginary unit; k = 2π/λ, where λ denotes the wavelength of the mixed audio signal; d_m denotes the distance between the m-th microphone and the array center; φ_m denotes the orientation angle of the m-th microphone relative to the array center;
θ denotes the beam arrival direction of the sound source;
α^H(θ) denotes the conjugate transpose of the steering vector of the microphone array;
V_u denotes the noise matrix;
V_u^H denotes the conjugate transpose of the noise matrix;
Step 2-7, from the waveform of the angular spectrum function of the mixed audio signal, pick its peaks in descending order; the number of peaks selected equals the number of sound sources;
Step 2-8, determine the angle corresponding to each selected peak to obtain the beam arrival direction of each sound source;
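Steps 2-3 through 2-8 amount to MUSIC-style direction finding. The sketch below is minimal and rests on assumptions not stated in the claim: a circular array with the steering form exp(jkd cos(θ − φ_m)), a single narrowband frequency, synthetic source signals, and a relative eigenvalue threshold.

```python
import numpy as np

rng = np.random.default_rng(3)
M, d = 8, 0.05                       # 8 microphones, 5 cm from the array center
c, f = 343.0, 1000.0                 # speed of sound, analysis frequency
kwav = 2 * np.pi * f / c             # k = 2*pi/lambda
phi = 2 * np.pi * np.arange(M) / M   # microphone orientation angles

def steering(theta):
    # assumed circular-array steering vector: a_m = exp(j*k*d*cos(theta - phi_m))
    return np.exp(1j * kwav * d * np.cos(theta - phi))

doas = np.deg2rad([60.0, 200.0])     # true beam arrival directions
T = 400
A = np.stack([steering(th) for th in doas], axis=1)             # (M, 2)
S = rng.normal(size=(2, T))                                     # source amplitudes
noise = 0.05 * (rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T)))
Xs = A @ S + noise

Rxx = Xs @ Xs.conj().T / T                                      # covariance estimate
w, V = np.linalg.eigh(Rxx)                                      # ascending eigenvalues
n_src = int((w > 1e-2 * w.max()).sum())                         # step 2-4: source count
Vu = V[:, :M - n_src]                                           # step 2-5: noise subspace

grid = np.deg2rad(np.arange(360.0))                             # step 2-6: angular spectrum
P = np.array([1.0 / np.linalg.norm(Vu.conj().T @ steering(th))**2 for th in grid])

# steps 2-7 and 2-8: the n_src largest local maxima and their angles
peaks = [i for i in range(360) if P[i] > P[i - 1] and P[i] > P[(i + 1) % 360]]
est = np.sort([np.rad2deg(grid[i]) for i in sorted(peaks, key=lambda i: P[i])[-n_src:]])
```

The eigenvalue threshold yields the source count, the noise subspace yields sharp spectrum peaks at the true directions, and the peak angles are the beam arrival directions.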
Step 3, from the sound signal of each source and the conversion relation between sources and microphones, obtain the pressure received by the microphone array, the horizontal-direction pressure gradient of the array, and the vertical-direction pressure gradient of the array;
The microphone-array pressure signal formula is as follows:
Wherein, p_w(t) denotes the microphone-array pressure at time t;
N denotes the number of sound sources;
t denotes time;
s_n(t) denotes the sound signal of the n-th source;
h_mn(t) denotes the transfer matrix between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) denotes the pressure caused by the sound wave at the array center at time t, and α_m(θ_n(t)) denotes the steering vector of the m-th microphone with respect to the n-th source at time t, with θ_n(t) denoting the beam arrival direction of the n-th source at time t;
The microphone-array horizontal-direction pressure gradient formula is as follows:
Wherein, p_x(t) denotes the horizontal-direction pressure gradient of the microphone array;
The microphone-array vertical-direction pressure gradient formula is as follows:
Wherein, p_y(t) denotes the vertical-direction pressure gradient of the microphone array;
Step 4, use the Fourier transform to convert the microphone-array center pressure, the horizontal-direction pressure gradient, and the vertical-direction pressure gradient from the time domain to the frequency domain;
Step 5, from the microphone-array pressure and the horizontal- and vertical-direction pressure gradients in the frequency domain, obtain the intensity-vector formula of the sound-pressure signal in the frequency domain, and from it derive the intensity-vector direction;
The intensity-vector formula of the sound-pressure signal in the frequency domain is:

I(ω, t) = (1/(ρ_0 c)) Re[P_w*(ω, t)(P_x(ω, t) u_x + P_y(ω, t) u_y)]

Wherein, I(ω, t) denotes the intensity vector of the sound-pressure signal in the frequency domain;
ρ_0 denotes the air density of the test environment;
c denotes the speed of sound;
Re[·] denotes taking the real part;
P_w*(ω, t) denotes the conjugate of the microphone-array pressure in the frequency domain;
P_x(ω, t) denotes the microphone-array horizontal-direction pressure gradient in the frequency domain;
P_y(ω, t) denotes the microphone-array vertical-direction pressure gradient in the frequency domain;
u_x denotes the unit vector along the abscissa axis;
u_y denotes the unit vector along the ordinate axis;
The intensity-vector direction formula is as follows:

γ(ω, t) = tan⁻¹((I(ω, t) · u_y) / (I(ω, t) · u_x))

Wherein, γ(ω, t) denotes the intensity-vector direction of the sound-pressure signal of the mixed sound received by the microphone array;
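The intensity-vector direction of step 5 can be sketched as follows. This is a minimal sketch under an idealized assumption: for a single plane wave from direction θ, the gradient channels are p_x = p_w·cos θ and p_y = p_w·sin θ, so γ(ω, t) = arctan(I_y/I_x) recovers θ in every time-frequency bin (the constant 1/(ρ_0 c) cancels in the direction).

```python
import numpy as np

theta = np.deg2rad(120.0)                 # true arrival direction
fs = 16000
t = np.arange(2048) / fs
p_w = np.sin(2 * np.pi * 440.0 * t)       # omnidirectional pressure channel
p_x = p_w * np.cos(theta)                 # idealised horizontal-gradient channel
p_y = p_w * np.sin(theta)                 # idealised vertical-gradient channel

frame = 256
n_frames = len(t) // frame
def stft(x):                              # step 4: rectangular-window STFT
    return np.fft.rfft(x[:n_frames * frame].reshape(n_frames, frame), axis=1)

Pw, Px, Py = stft(p_w), stft(p_x), stft(p_y)
Ix = np.real(np.conj(Pw) * Px)            # Re[Pw* Px]: x component (up to 1/(rho0*c))
Iy = np.real(np.conj(Pw) * Py)            # Re[Pw* Py]: y component
gamma = np.arctan2(Iy, Ix)                # intensity-vector direction per (w, t) bin

b = np.unravel_index(np.argmax(np.abs(Pw)), Pw.shape)
est_deg = float(np.rad2deg(gamma[b]))     # direction read off at the strongest bin
```

With several simultaneous sources, γ(ω, t) scatters around the individual source directions, which is exactly the distribution the von Mises mixture of step 6 models.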
Step 6, compute the statistics of the intensity-vector directions to obtain their probability density distribution, fit it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity-vector directions, and from them obtain the intensity-vector directivity function of each sound-pressure signal;
The detailed process is as follows:
Step 6-1, compute the statistics of the intensity-vector directions to obtain their probability density distribution, and fit it with a mixed von Mises distribution to obtain the parameter set of the mixed von Mises distribution obeyed by the voice intensity-vector directions;
The mixed von Mises distribution model formula is as follows:

f(γ(ω, t)) = ∑_{n=1}^{N} α_n exp(k_n cos(γ(ω, t) − θ_n)) / (2π I_0(k_n))

Wherein, α_n denotes the weight of the intensity-vector directivity function of the sound-pressure signal of the n-th source;
θ_n denotes the mean direction of the n-th component, i.e., the beam arrival direction of the n-th source;
I_0(k_n) denotes the zeroth-order modified Bessel function corresponding to the n-th source;
k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity-vector direction of the sound-pressure signal of the n-th source, i.e., the reciprocal of the variance of the von Mises distribution;
The parameter set of the mixed von Mises distribution function is as follows:

Γ = {α_n, k_n}, n = 1, ..., N (11)
Step 6-2, initialize the model parameters to obtain the initial function parameter set;
Step 6-3, from the obtained initial model parameters, estimate the parameters of the mixed von Mises distribution model with the expectation-maximization (EM) algorithm;
Step 6-4, from the estimated parameters of the mixed von Mises distribution model, derive the intensity-vector directivity function of each sound-pressure signal;
The intensity-vector directivity function formula of the sound-pressure signal is as follows:
Wherein, f_n(γ(ω, t)) denotes the intensity-vector directivity function of the n-th sound source;
Step 7, from the obtained intensity-vector directivity function of each sound-pressure signal and the microphone-array pressure, obtain the frequency-domain signal of each sound source, and apply the inverse Fourier transform to convert each frequency-domain source signal into a time-domain source signal;
The formula for the frequency-domain signal of each sound source is as follows:
Wherein, S_n(ω, t) denotes the frequency-domain signal of the n-th source signal obtained after separating the mixed voice;
Step 8, compute the matching probability between each source signal and the user-specified speaker in the sample voice library; select the source with the largest probability value as the target source, retain its signal, and delete the other, non-target sources;
The matching probability formula between each source signal and the user-specified speaker in the sample voice library is as follows:
In the formula:
X̂_n denotes the speech characteristic parameters extracted from the n-th separated voice; the Mel-frequency cepstral coefficients of the voice are extracted as its characteristic parameters;
P(X̂_n|G_c) denotes the matching probability between the n-th source signal and the user-specified speaker in the sample voice library, i.e., the probability that the n-th separated voice belongs to the user-specified speaker;
G_c denotes the acoustic model parameters of the user-specified speaker;
Step 9, amplify the retained source signal, completing the amplification of the specified voice source in the test environment.
2. The intelligent voice processing method according to claim 1, characterized in that the threshold described in step 2-4 ranges from 10^-2 to 10^-16.
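The frequency-domain separation of claim 1, step 7, can be sketched under a soft-masking interpretation (an assumption of this sketch, following the spirit of the claim rather than its exact formula): the directivity function of source n is taken as the posterior probability of its von Mises component at each bin's direction γ(ω, t); multiplying it with the array-pressure STFT and applying the inverse FFT yields the separated time-domain signal. The directions, model parameters, and two-tone mixture below are synthetic.

```python
import numpy as np

fs, frame = 16000, 256
t = np.arange(frame * 8) / fs
mix = np.sin(2 * np.pi * 300.0 * t) + np.sin(2 * np.pi * 2000.0 * t)

n_frames = len(t) // frame
Pw = np.fft.rfft(mix.reshape(n_frames, frame), axis=1)         # array-pressure STFT

# synthetic per-bin directions: low bins arrive from 0 rad, high bins from pi
bins = Pw.shape[1]
gamma = np.where(np.arange(bins) < 20, 0.0, np.pi) * np.ones((n_frames, 1))

# von Mises mixture parameters for the two sources (assumed values)
mu = np.array([0.0, np.pi]); k = np.array([4.0, 4.0]); alpha = np.array([0.5, 0.5])
dens = alpha * np.exp(k * np.cos(gamma[..., None] - mu)) / (2 * np.pi * np.i0(k))
mask = dens / dens.sum(axis=-1, keepdims=True)                 # directivity per (w, t) bin

S1 = mask[..., 0] * Pw                                         # source 1 in the frequency domain
s1 = np.fft.irfft(S1, n=frame, axis=1).reshape(-1)             # inverse FFT to the time domain
```

Bins whose direction matches a component's mean get a mask near one for that source and near zero for the other, so each reconstructed waveform carries essentially one source, ready for the matching and amplification of steps 8 and 9.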
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103811020A true CN103811020A (en) | 2014-05-21 |
CN103811020B CN103811020B (en) | 2016-06-22 |
Family
ID=50707692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410081493.6A Active CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103811020B (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200813A (en) * | 2014-07-01 | 2014-12-10 | 东北大学 | Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction |
CN105609099A (en) * | 2015-12-25 | 2016-05-25 | 重庆邮电大学 | Speech recognition pretreatment method based on human auditory characteristic |
CN105933820A (en) * | 2016-04-28 | 2016-09-07 | 冠捷显示科技(中国)有限公司 | Automatic positioning method of external wireless sound boxes |
CN106128472A (en) * | 2016-07-12 | 2016-11-16 | 乐视控股(北京)有限公司 | The processing method and processing device of singer's sound |
CN106205610A (en) * | 2016-06-29 | 2016-12-07 | 联想(北京)有限公司 | A kind of voice information identification method and equipment |
CN106448722A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | Sound recording method, device and system |
CN107220021A (en) * | 2017-05-16 | 2017-09-29 | 北京小鸟看看科技有限公司 | Phonetic entry recognition methods, device and headset equipment |
CN107274895A (en) * | 2017-08-18 | 2017-10-20 | 京东方科技集团股份有限公司 | A kind of speech recognition apparatus and method |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN108198569A (en) * | 2017-12-28 | 2018-06-22 | 北京搜狗科技发展有限公司 | A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing |
CN108520756A (en) * | 2018-03-20 | 2018-09-11 | 北京时代拓灵科技有限公司 | A kind of method and device of speaker's speech Separation |
CN108630193A (en) * | 2017-03-21 | 2018-10-09 | 北京嘀嘀无限科技发展有限公司 | Audio recognition method and device |
CN108694950A (en) * | 2018-05-16 | 2018-10-23 | 清华大学 | A kind of method for identifying speaker based on depth mixed model |
CN108735227A (en) * | 2018-06-22 | 2018-11-02 | 北京三听科技有限公司 | A kind of voice signal for being picked up to microphone array carries out the method and system of Sound seperation |
CN108766459A (en) * | 2018-06-13 | 2018-11-06 | 北京联合大学 | Target speaker method of estimation and system in a kind of mixing of multi-person speech |
CN109505741A (en) * | 2018-12-20 | 2019-03-22 | 浙江大学 | A kind of wind-driven generator blade breakage detection method and device based on rectangular microphone array |
CN110288996A (en) * | 2019-07-22 | 2019-09-27 | 厦门钛尚人工智能科技有限公司 | A kind of speech recognition equipment and audio recognition method |
CN110310642A (en) * | 2018-03-20 | 2019-10-08 | 阿里巴巴集团控股有限公司 | Method of speech processing, system, client, equipment and storage medium |
CN110335626A (en) * | 2019-07-09 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Age recognition methods and device, storage medium based on audio |
CN110706688A (en) * | 2019-11-11 | 2020-01-17 | 广州国音智能科技有限公司 | Method, system, terminal and readable storage medium for constructing voice recognition model |
CN110867191A (en) * | 2018-08-28 | 2020-03-06 | 洞见未来科技股份有限公司 | Voice processing method, information device and computer program product |
CN111028857A (en) * | 2019-12-27 | 2020-04-17 | 苏州蛙声科技有限公司 | Method and system for reducing noise of multi-channel audio and video conference based on deep learning |
CN111696570A (en) * | 2020-08-17 | 2020-09-22 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111816185A (en) * | 2020-07-07 | 2020-10-23 | 广东工业大学 | Method and device for identifying speaker in mixed voice |
CN111899756A (en) * | 2020-09-29 | 2020-11-06 | 北京清微智能科技有限公司 | Single-channel voice separation method and device |
WO2021012734A1 (en) * | 2019-07-25 | 2021-01-28 | 深圳壹账通智能科技有限公司 | Audio separation method and apparatus, electronic device and computer-readable storage medium |
CN112289335A (en) * | 2019-07-24 | 2021-01-29 | 阿里巴巴集团控股有限公司 | Voice signal processing method and device and pickup equipment |
CN114242072A (en) * | 2021-12-21 | 2022-03-25 | 上海帝图信息科技有限公司 | Voice recognition system for intelligent robot |
CN114613385A (en) * | 2022-05-07 | 2022-06-10 | 广州易而达科技股份有限公司 | Far-field voice noise reduction method, cloud server and audio acquisition equipment |
CN115240689A (en) * | 2022-09-15 | 2022-10-25 | 深圳市水世界信息有限公司 | Target sound determination method, device, computer equipment and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1653519A (en) * | 2002-03-20 | 2005-08-10 | 高通股份有限公司 | Method for robust voice recognition by analyzing redundant features of source signal |
US20090150146A1 (en) * | 2007-12-11 | 2009-06-11 | Electronics & Telecommunications Research Institute | Microphone array based speech recognition system and target speech extracting method of the system |
JP2012211768A (en) * | 2011-03-30 | 2012-11-01 | Advanced Telecommunication Research Institute International | Sound source positioning apparatus |
CN103426434A (en) * | 2012-05-04 | 2013-12-04 | 索尼电脑娱乐公司 | Source separation by independent component analysis in conjunction with source direction information |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1653519A (en) * | 2002-03-20 | 2005-08-10 | 高通股份有限公司 | Method for robust voice recognition by analyzing redundant features of source signal |
US20090150146A1 (en) * | 2007-12-11 | 2009-06-11 | Electronics & Telecommunications Research Institute | Microphone array based speech recognition system and target speech extracting method of the system |
JP2012211768A (en) * | 2011-03-30 | 2012-11-01 | Advanced Telecommunication Research Institute International | Sound source positioning apparatus |
CN103426434A (en) * | 2012-05-04 | 2013-12-04 | 索尼电脑娱乐公司 | Source separation by independent component analysis in conjunction with source direction information |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200813B (en) * | 2014-07-01 | 2017-05-10 | 东北大学 | Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction |
CN104200813A (en) * | 2014-07-01 | 2014-12-10 | 东北大学 | Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction |
CN105609099A (en) * | 2015-12-25 | 2016-05-25 | 重庆邮电大学 | Speech recognition pretreatment method based on human auditory characteristic |
CN105933820A (en) * | 2016-04-28 | 2016-09-07 | 冠捷显示科技(中国)有限公司 | Automatic positioning method of external wireless sound boxes |
CN106205610A (en) * | 2016-06-29 | 2016-12-07 | 联想(北京)有限公司 | A kind of voice information identification method and equipment |
CN106205610B (en) * | 2016-06-29 | 2019-11-26 | 联想(北京)有限公司 | A kind of voice information identification method and equipment |
CN106128472A (en) * | 2016-07-12 | 2016-11-16 | 乐视控股(北京)有限公司 | The processing method and processing device of singer's sound |
CN106448722A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | Sound recording method, device and system |
CN106448722B (en) * | 2016-09-14 | 2019-01-18 | 讯飞智元信息科技有限公司 | The way of recording, device and system |
CN108630193A (en) * | 2017-03-21 | 2018-10-09 | 北京嘀嘀无限科技发展有限公司 | Audio recognition method and device |
CN108630193B (en) * | 2017-03-21 | 2020-10-02 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method and device |
CN107220021A (en) * | 2017-05-16 | 2017-09-29 | 北京小鸟看看科技有限公司 | Phonetic entry recognition methods, device and headset equipment |
CN107274895A (en) * | 2017-08-18 | 2017-10-20 | 京东方科技集团股份有限公司 | A kind of speech recognition apparatus and method |
CN107274895B (en) * | 2017-08-18 | 2020-04-17 | 京东方科技集团股份有限公司 | Voice recognition device and method |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN108198569A (en) * | 2017-12-28 | 2018-06-22 | 北京搜狗科技发展有限公司 | A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing |
CN108198569B (en) * | 2017-12-28 | 2021-07-16 | 北京搜狗科技发展有限公司 | Audio processing method, device and equipment and readable storage medium |
CN108520756A (en) * | 2018-03-20 | 2018-09-11 | 北京时代拓灵科技有限公司 | A kind of method and device of speaker's speech Separation |
CN108520756B (en) * | 2018-03-20 | 2020-09-01 | 北京时代拓灵科技有限公司 | Method and device for separating speaker voice |
CN110310642A (en) * | 2018-03-20 | 2019-10-08 | 阿里巴巴集团控股有限公司 | Method of speech processing, system, client, equipment and storage medium |
CN108694950B (en) * | 2018-05-16 | 2021-10-01 | 清华大学 | Speaker confirmation method based on deep hybrid model |
CN108694950A (en) * | 2018-05-16 | 2018-10-23 | 清华大学 | A kind of method for identifying speaker based on depth mixed model |
CN108766459A (en) * | 2018-06-13 | 2018-11-06 | 北京联合大学 | Target speaker method of estimation and system in a kind of mixing of multi-person speech |
CN108766459B (en) * | 2018-06-13 | 2020-07-17 | 北京联合大学 | Target speaker estimation method and system in multi-user voice mixing |
CN108735227A (en) * | 2018-06-22 | 2018-11-02 | 北京三听科技有限公司 | A kind of voice signal for being picked up to microphone array carries out the method and system of Sound seperation |
CN108735227B (en) * | 2018-06-22 | 2020-05-19 | 北京三听科技有限公司 | Method and system for separating sound source of voice signal picked up by microphone array |
CN110867191A (en) * | 2018-08-28 | 2020-03-06 | 洞见未来科技股份有限公司 | Voice processing method, information device and computer program product |
CN109505741A (en) * | 2018-12-20 | 2019-03-22 | 浙江大学 | A kind of wind-driven generator blade breakage detection method and device based on rectangular microphone array |
CN110335626A (en) * | 2019-07-09 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Age recognition methods and device, storage medium based on audio |
CN110288996A (en) * | 2019-07-22 | 2019-09-27 | 厦门钛尚人工智能科技有限公司 | A kind of speech recognition equipment and audio recognition method |
CN112289335A (en) * | 2019-07-24 | 2021-01-29 | 阿里巴巴集团控股有限公司 | Voice signal processing method and device and pickup equipment |
WO2021012734A1 (en) * | 2019-07-25 | 2021-01-28 | 深圳壹账通智能科技有限公司 | Audio separation method and apparatus, electronic device and computer-readable storage medium |
CN110706688A (en) * | 2019-11-11 | 2020-01-17 | 广州国音智能科技有限公司 | Method, system, terminal and readable storage medium for constructing voice recognition model |
CN110706688B (en) * | 2019-11-11 | 2022-06-17 | 广州国音智能科技有限公司 | Method, system, terminal and readable storage medium for constructing voice recognition model |
CN111028857A (en) * | 2019-12-27 | 2020-04-17 | 苏州蛙声科技有限公司 | Method and system for reducing noise of multi-channel audio and video conference based on deep learning |
CN111028857B (en) * | 2019-12-27 | 2024-01-19 | 宁波蛙声科技有限公司 | Method and system for reducing noise of multichannel audio-video conference based on deep learning |
CN111816185A (en) * | 2020-07-07 | 2020-10-23 | 广东工业大学 | Method and device for identifying speaker in mixed voice |
CN111696570B (en) * | 2020-08-17 | 2020-11-24 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111696570A (en) * | 2020-08-17 | 2020-09-22 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111899756A (en) * | 2020-09-29 | 2020-11-06 | 北京清微智能科技有限公司 | Single-channel voice separation method and device |
CN114242072A (en) * | 2021-12-21 | 2022-03-25 | 上海帝图信息科技有限公司 | Voice recognition system for intelligent robot |
CN114613385A (en) * | 2022-05-07 | 2022-06-10 | 广州易而达科技股份有限公司 | Far-field voice noise reduction method, cloud server and audio acquisition equipment |
CN115240689A (en) * | 2022-09-15 | 2022-10-25 | 深圳市水世界信息有限公司 | Target sound determination method, device, computer equipment and medium |
CN115240689B (en) * | 2022-09-15 | 2022-12-02 | 深圳市水世界信息有限公司 | Target sound determination method, target sound determination device, computer equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN103811020B (en) | 2016-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103811020A (en) | Smart voice processing method | |
Zhang et al. | Deep learning based binaural speech separation in reverberant environments | |
CN110797043B (en) | Conference voice real-time transcription method and system | |
CN105405439B (en) | Speech playing method and device | |
CN107919133A (en) | For the speech-enhancement system and sound enhancement method of destination object | |
CN102565759B (en) | Binaural sound source localization method based on sub-band signal to noise ratio estimation | |
CN102438189B (en) | Dual-channel acoustic signal-based sound source localization method | |
CN106373589B (en) | A kind of ears mixing voice separation method based on iteration structure | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
CN104464750A (en) | Voice separation method based on binaural sound source localization | |
CN105869651A (en) | Two-channel beam forming speech enhancement method based on noise mixed coherence | |
CN106653048B (en) | Single channel sound separation method based on voice model | |
CN109859749A (en) | A kind of voice signal recognition methods and device | |
CN109935226A (en) | A kind of far field speech recognition enhancing system and method based on deep neural network | |
CN107507625A (en) | Sound source distance determines method and device | |
CN106031196A (en) | Signal-processing device, method, and program | |
Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target | |
CN113593601A (en) | Audio-visual multi-modal voice separation method based on deep learning | |
CN105609099A (en) | Speech recognition pretreatment method based on human auditory characteristic | |
Shamsoddini et al. | A sound segregation algorithm for reverberant conditions | |
Talagala et al. | Binaural localization of speech sources in the median plane using cepstral HRTF extraction | |
Krijnders et al. | Tone-fit and MFCC scene classification compared to human recognition | |
Shabtai et al. | The effect of reverberation on the performance of cepstral mean subtraction in speaker verification | |
CN107578784B (en) | Method and device for extracting target source from audio | |
Venkatesan et al. | Full sound source localization of binaural signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |