CN103811020B - An intelligent speech processing method - Google Patents
An intelligent speech processing method
- Publication number: CN103811020B (application CN201410081493.6A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Landscapes
- Circuit For Audible Band Transducer (AREA)
Abstract
The present invention provides an intelligent speech processing method in the field of information processing. By building a voice model library of conversation partners, the method intelligently recognizes the identities of multiple speakers in a multi-speaker environment and separates the mixed speech into each speaker's individual voice; according to the user's requirement, it amplifies the voice of the speaker the user wishes to listen to while eliminating the voices of speakers the user does not require. Unlike a traditional hearing aid, the method automatically provides the user with the sound he or she needs according to individual demand, reducing interference from non-target voices as well as noise, and embodying the personalization, interactivity and intelligence of the method.
Description
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to an intelligent voice processing method.
Background
Evaluation data published by the World Health Organization (WHO) in 2013 show that about 360 million people worldwide currently live with some degree of hearing impairment, accounting for roughly 5% of the global population. Hearing aid products can effectively compensate for the hearing loss of hearing-impaired patients and improve their quality of life and work. However, current research on hearing aid systems still focuses on noise suppression and amplitude amplification of the source sound, and rarely involves modeling based on voice features or automatic separation of multiple sound sources. In complex real-world scenarios, for example a gathering where several speakers talk at the same time, possibly over background sounds such as music, a hearing aid cannot separate the sound object of interest from the mixed input; simple intensity amplification then only increases the listening burden, and may even harm the user, without delivering effective sound input and comprehension. Aiming at these technical defects of current hearing aid systems, designing a new hearing aid system that can identify a specific sound object and is more intelligent and personalized is therefore of great significance.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an intelligent speech processing method, so that the user obtains clean sound reception and amplification according to his or her own requirements, and the hearing aid system becomes intelligent, interactive and personalized.
An intelligent speech processing method, comprising the steps of:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters;
the specific process is as follows:
step 1-1, collecting sample voice sections, carrying out discretization processing on the collected voice sections, extracting Mel frequency cepstrum coefficients of voice signals as voice signal characteristic parameters, and establishing a Gaussian mixture model;
the model formula is as follows:
p(X|G) = Σ_{i=1}^{I} π_i · b_i(X)
where p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {π_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
π_i represents the weighting coefficient of the i-th single Gaussian model, with Σ_{i=1}^{I} π_i = 1;
μ_i represents the mean vector of the i-th single Gaussian model;
Σ_i represents the covariance matrix of the i-th single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) represents the density function of the i-th single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) denotes the density function of a Gaussian distribution;
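As an illustration of step 1-1, the following minimal Python sketch extracts Mel frequency cepstrum coefficients from recorded sample segments. It is not part of the patent: the use of the librosa library, the file names and the parameter values (sampling rate, number of coefficients) are illustrative assumptions.

```python
import librosa
import numpy as np

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Load one sample speech segment and return its MFCC feature vectors.

    Each row of the returned array is the feature vector of one frame,
    matching X = {x_1, ..., x_T} in the model description.
    """
    signal, sr = librosa.load(wav_path, sr=sr)            # discretized speech segment
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T                                          # shape (T, n_mfcc)

# hypothetical sample library: one enrollment segment per conversation partner
# sample_library = {name: extract_mfcc(f"{name}_enroll.wav")
#                   for name in ["person1", "person2", "person3"]}
```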
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
namely, a k-means clustering algorithm is adopted to cluster the characteristic parameters of the voice signals to obtain an initial value of the Gaussian mixture model parameter set, G^0 = {π_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; the model is then estimated with the expectation-maximization (EM) algorithm starting from this initial parameter set, yielding the Gaussian mixture model parameters and completing the training of the characteristic parameters;
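A compact sketch of the training in step 1-2, using scikit-learn's KMeans for the initial parameter set and GaussianMixture for the EM refinement; the library choice, the diagonal covariances and the component count are assumptions of this illustration, not requirements of the method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(features, n_components=16):
    """Cluster MFCC features with k-means to get an initial parameter set
    {pi_i^0, mu_i^0, Sigma_i^0}, then refine the Gaussian mixture model by EM."""
    km = KMeans(n_clusters=n_components, n_init=10).fit(features)
    means0 = km.cluster_centers_
    # initial weights: fraction of frames assigned to each cluster
    weights0 = np.bincount(km.labels_, minlength=n_components) / len(features)

    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          weights_init=weights0,
                          means_init=means0,
                          max_iter=200)
    gmm.fit(features)          # EM refinement of pi_i, mu_i, Sigma_i
    return gmm

# usage (synthetic data only for demonstration):
features = np.random.randn(500, 13)
speaker_model = train_speaker_gmm(features)
```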
step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of M microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting mixed audio signals of the tested environment by using a microphone array consisting of M microphones, and carrying out discretization processing on the collected mixed audio signals to obtain the amplitude of each sampling point;
step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of the vector covariance matrix of the mixed audio signal of the tested environment from the mixed audio matrix collected by each microphone and the number of microphones;
the estimated value of the vector covariance matrix is expressed as follows:
where R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the m-th microphone;
X^H(m) represents the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
step 2-4, performing eigenvalue decomposition on the estimated value of the vector covariance matrix to obtain the eigenvalues, sorting the eigenvalues from largest to smallest, and determining the number of eigenvalues larger than a threshold, which is the number of sound sources;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
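Steps 2-3 to 2-5 can be sketched as follows; the exact normalization of the covariance estimate is not reproduced in the text above, so a standard sample covariance over the microphone snapshots is assumed here.

```python
import numpy as np

def estimate_sources(snapshots, threshold=1e-7):
    """snapshots: array of shape (M, T) -- one row of samples per microphone.

    Returns the estimated number of sources and the noise-subspace matrix V_u,
    following steps 2-3 to 2-5 (sample covariance used as an assumption for R_xx).
    """
    M, T = snapshots.shape
    Rxx = snapshots @ snapshots.conj().T / T          # M x M covariance estimate
    eigvals, eigvecs = np.linalg.eigh(Rxx)            # eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1]                 # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_sources = int(np.sum(eigvals > threshold))      # eigenvalues above the threshold
    V_u = eigvecs[:, n_sources:]                      # remaining M - n_sources columns
    return n_sources, V_u
```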
step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
the angle spectral function formula of the mixed audio signal is as follows:
P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))
where P(θ) represents the angle spectral function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), in which α_m(θ) = e^{j·k·d_m·cos(φ_m − θ)}, j denotes the imaginary unit, k = 2π/λ, λ denotes the wavelength of the mixed audio signal, d_m denotes the distance of the m-th microphone from the array center, and φ_m denotes the direction angle of the m-th microphone relative to the array center;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
step 2-7, selecting the largest peaks of the waveform of the angle spectral function of the mixed audio signal, the number of selected peaks being the number of sound sources;
step 2-8, determining the angle values corresponding to the selected peaks, which are the beam arrival directions of the sound sources;
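A sketch of steps 2-6 to 2-8 under the reconstructed angle spectral function: the spectrum is evaluated on a grid of candidate directions and the largest peaks give the beam arrival directions. The array geometry (d, phi), grid resolution and the use of scipy's find_peaks are implementation assumptions, not part of the patent.

```python
import numpy as np
from scipy.signal import find_peaks

def doa_spectrum(V_u, d, phi, wavelength, grid_deg=np.arange(0.0, 360.0, 1.0)):
    """Angle spectral function P(theta) from the noise subspace V_u (M x (M-N)).

    d[m]   : distance of microphone m from the array center
    phi[m] : direction angle of microphone m relative to the array center (radians)
    """
    k = 2 * np.pi / wavelength
    thetas = np.deg2rad(grid_deg)
    # steering vectors alpha_m(theta) = exp(j * k * d_m * cos(phi_m - theta))
    A = np.exp(1j * k * d[:, None] * np.cos(phi[:, None] - thetas[None, :]))
    proj = V_u.conj().T @ A                              # noise-subspace projection
    P = 1.0 / np.sum(np.abs(proj) ** 2, axis=0)          # reconstructed angle spectrum
    return grid_deg, P

def pick_directions(grid_deg, P, n_sources):
    peaks, _ = find_peaks(P)
    best = peaks[np.argsort(P[peaks])[::-1][:n_sources]]  # n largest peaks
    return np.sort(grid_deg[best])                         # beam arrival directions
```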
step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure signal formula is as follows:
where p_w(t) represents the microphone array sound pressure at time t;
N represents the number of sound sources;
t represents time;
s_n(t) represents the audio signal of the n-th sound source;
h_mn(t) denotes the conversion matrix between the n-th sound source and the m-th microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the m-th microphone with respect to the n-th sound source at time t, and θ_n(t) represents the beam arrival direction of the n-th sound source at time t;
the sound pressure gradient formula in the horizontal direction of the microphone array is as follows:
where p_x(t) represents the sound pressure gradient of the microphone array in the horizontal direction;
the sound pressure gradient formula in the vertical direction of the microphone array is as follows:
where p_y(t) represents the sound pressure gradient of the microphone array in the vertical direction;
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further deducing an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
I(ω,t) = (1 / (ρ_0·c)) · Re[ p_w*(ω,t)·(p_x(ω,t)·u_x + p_y(ω,t)·u_y) ]
where I(ω,t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re[·] represents taking the real part of a complex number;
p_w*(ω,t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω,t) represents the microphone array horizontal-direction sound pressure gradient in the frequency domain;
p_y(ω,t) represents the microphone array vertical-direction sound pressure gradient in the frequency domain;
u_x represents the unit vector along the abscissa axis;
u_y represents the unit vector along the ordinate axis;
the intensity vector direction formula is as follows:
γ(ω,t) = arctan( I_y(ω,t) / I_x(ω,t) )
where γ(ω,t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array, and I_x(ω,t), I_y(ω,t) are the components of I(ω,t) along u_x and u_y;
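In implementation terms, steps 3 to 5 reduce to taking the STFT of the array pressure and its two gradient signals and computing the intensity-vector direction per time-frequency bin. The sketch below assumes scipy's STFT and nominal values for the air density and speed of sound; the 1/(ρ_0·c) scaling cancels when only the direction is needed.

```python
import numpy as np
from scipy.signal import stft

def intensity_direction(p_w, p_x, p_y, fs, rho0=1.2, c=343.0, nperseg=1024):
    """Return the array pressure spectrogram and gamma(omega, t), the
    intensity-vector direction per time-frequency bin.

    p_w, p_x, p_y : time-domain array sound pressure and its horizontal /
                    vertical gradient signals (steps 3 and 4).
    """
    _, _, Pw = stft(p_w, fs=fs, nperseg=nperseg)      # frequency-domain signals (step 4)
    _, _, Px = stft(p_x, fs=fs, nperseg=nperseg)
    _, _, Py = stft(p_y, fs=fs, nperseg=nperseg)
    scale = 1.0 / (rho0 * c)
    Ix = scale * np.real(np.conj(Pw) * Px)            # intensity component along u_x
    Iy = scale * np.real(np.conj(Pw) * Py)            # intensity component along u_y
    gamma = np.mod(np.arctan2(Iy, Ix), 2 * np.pi)     # direction in [0, 2*pi)
    return Pw, gamma
```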
step 6, computing statistics of the intensity vector directions to obtain their probability density distribution, fitting a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the speech intensity vector directions, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, computing statistics of the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the speech;
the formula of the mixed von Mises distribution model is as follows:
p(γ) = Σ_{n=1}^{N} α_n · exp(k_n·cos(γ − μ_n)) / (2π·I_0(k_n))
where p(γ) represents the mixed von Mises distribution probability density;
γ represents the mixed sound direction angle;
α_n represents the weight of the intensity vector direction function of the sound pressure signal of the n-th sound source;
μ_n represents the central angle of the single von Mises distribution corresponding to the n-th sound source;
I_0(k_n) represents the modified Bessel function of the first kind of order zero evaluated at k_n, and k_n represents the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the n-th sound source's sound pressure signal, i.e. the reciprocal of the variance of that von Mises distribution;
the mixed von Mises distribution function parameter set is as follows:
{α_n, k_n}, n = 1, ..., N    (11)
6-2, initializing model parameters to obtain an initial function parameter set;
step 6-3, estimating the parameters of the mixed von Mises distribution model with the expectation-maximization (EM) algorithm, starting from the obtained initial model parameters;
step 6-4, solving the intensity vector direction function of each sound pressure signal from the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
where the function above represents the intensity vector direction function of the n-th sound source;
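The patent's exact direction-function formula is not reproduced above; a common choice, used here purely as an assumption, is the posterior weight of each source's von Mises component at a given direction γ:

```python
import numpy as np
from scipy.special import i0   # modified Bessel function of the first kind, order zero

def von_mises_pdf(gamma, mu, kappa):
    return np.exp(kappa * np.cos(gamma - mu)) / (2 * np.pi * i0(kappa))

def direction_functions(gamma, alphas, kappas, mus):
    """Per-source intensity-vector direction functions evaluated at gamma.

    Assumption: the direction function of source n is taken as the posterior
    weight of its von Mises component within the fitted mixture.
    """
    comps = np.stack([a * von_mises_pdf(gamma, m, k)
                      for a, k, m in zip(alphas, kappas, mus)])   # shape (N, ...)
    return comps / np.maximum(comps.sum(axis=0, keepdims=True), 1e-12)
```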
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
where the quantity above represents the frequency-domain signal of the n-th sound source obtained after separation of the mixed speech;
this frequency-domain signal is converted into the time-domain signal of the n-th sound source through the inverse Fourier transform;
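A sketch of step 7: the frequency-domain array pressure is weighted by each source's direction function and transformed back to the time domain. The multiplicative-mask form, and the reuse of the direction_functions helper from the step 6-4 sketch, are assumptions of this illustration.

```python
import numpy as np
from scipy.signal import istft

def separate_sources(Pw, gamma, alphas, kappas, mus, fs, nperseg=1024):
    """Weight the frequency-domain array pressure Pw by each source's
    direction function and transform back to the time domain (step 7)."""
    # direction_functions: see the sketch after step 6-4
    masks = direction_functions(gamma, alphas, kappas, mus)   # (N, F, T) weights
    sources = []
    for n in range(masks.shape[0]):
        Sn = masks[n] * Pw                                     # frequency-domain source n
        _, sn = istft(Sn, fs=fs, nperseg=nperseg)              # time-domain source n
        sources.append(sn)
    return sources
```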
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, selecting the sound source with the maximum probability value as a target sound source, reserving the sound source signal, and deleting other non-target sound sources;
the matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
in the formula, the speech feature parameters are those extracted from the separated speech, i.e. the Mel frequency cepstrum coefficients of the separated speech are extracted as its characteristic parameters;
the matching probability represents the probability of matching the n-th sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
that is, the probability that the separated speech belongs to the voice of the user-specified person;
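Step 8 can be sketched by extracting MFCCs from each separated signal and scoring them against the user-specified speaker's trained Gaussian mixture model; the librosa and scikit-learn calls are implementation assumptions.

```python
import numpy as np
import librosa

def match_target(separated, sr, target_gmm, n_mfcc=13):
    """Score each separated signal against the user-specified speaker model G_c
    and return the index of the best-matching (target) source."""
    scores = []
    for sig in separated:
        mfcc = librosa.feature.mfcc(y=np.asarray(sig, dtype=float), sr=sr,
                                    n_mfcc=n_mfcc).T
        scores.append(target_gmm.score(mfcc))    # average log-likelihood per frame
    return int(np.argmax(scores)), scores
```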
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
The threshold in step 2-4 ranges from 10^-2 to 10^-16.
In step 6-1, α_n takes a random value in the range 0-1 such that Σ_{n=1}^{N} α_n = 1, and k_n takes a random value in the range 1-700.
The invention has the advantages that:
the intelligent speech processing method of the invention builds a voice model library of conversation partners so that, in a multi-speaker environment, the identities of multiple speakers are recognized intelligently and the mixed speech is separated into each speaker's individual voice; the voice of the speaker the user wishes to listen to is amplified according to the user's requirement, while the voices of speakers not required by the user are eliminated. Unlike a traditional hearing aid, the method automatically provides the user with the sound he or she needs according to individual demand, reduces interference from non-target voices as well as noise, and embodies the personalization, interactivity and intelligence of the method.
Drawings
FIG. 1 is a flow chart of an intelligent speech processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the sound source data used for modeling according to an embodiment of the present invention, in which (a) shows the voice data of the first person, (b) shows the voice data of the second person, and (c) shows the voice data of the third person;
fig. 3 is a schematic diagram of sound source data for sound mixing according to an embodiment of the present invention, in which (a) shows a schematic diagram of first sound source data, (b) shows a schematic diagram of second sound source data, and (c) shows a schematic diagram of third sound source data;
FIG. 4 is a schematic diagram of a microphone array in accordance with one embodiment of the invention;
FIG. 5 is a graph of data received by four microphones, where (a) shows a graph of a mixed sound signal received by a first microphone, (b) shows a graph of a mixed sound signal received by a second microphone, (c) shows a graph of a mixed sound signal received by a third microphone, and (d) shows a graph of a mixed sound signal received by a fourth microphone, in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of data samples received by four microphones, where (a) shows a diagram of a mixed sound signal received by a first microphone after sampling, (b) shows a diagram of a mixed sound signal received by a second microphone after sampling, (c) shows a diagram of a mixed sound signal received by a third microphone after sampling, and (d) shows a diagram of a mixed sound signal received by a fourth microphone after sampling, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of spatial spectrum estimation of a hybrid signal according to an embodiment of the present invention;
FIG. 8 is a chart of the directional distribution probability density of the mixed sound vector according to one embodiment of the present invention;
FIG. 9 is a diagram of the mixed von Mises model obtained by maximum likelihood estimation according to an embodiment of the present invention;
fig. 10 is a comparison graph of ideal speech and separated speech according to an embodiment of the present invention, where (a) is the original signal of the first sound source, (b) is the separated signal of the first sound source, (c) is the original signal of the second sound source, (d) is the separated signal of the second sound source, (e) is the original signal of the third sound source, and (f) is the separated signal of the third sound source.
Detailed Description
An embodiment of the present invention will be further described with reference to the accompanying drawings.
In the embodiment of the invention, the model system is mainly divided into a voice modeling module and a dynamic real-time speech processing module: the voice modeling module performs speaker voice modeling, while the dynamic real-time processing module performs direction localization and separation of the mixed speech as well as recognition and extraction of the mixed speech (i.e. extraction and amplification of the target voice and masking of the other voices) in a complex speech environment.
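The two-module split can be pictured with the following skeleton; the class and method names are hypothetical and only indicate how the responsibilities described above might be organized, not code from the patent.

```python
class VoiceModelingModule:
    """Offline speaker voice modeling (sample library and GMM training)."""
    def __init__(self):
        self.models = {}                 # speaker name -> trained GMM

    def enroll(self, name, features):
        # train_speaker_gmm: see the step 1-2 sketch above
        self.models[name] = train_speaker_gmm(features)


class RealtimeProcessingModule:
    """Direction localization, separation, recognition and target amplification."""
    def __init__(self, modeling, target_name, gain=4.0):
        self.modeling = modeling
        self.target_name = target_name
        self.gain = gain

    def process(self, mic_frames, sr):
        # 1. localize sources and separate the mixture (steps 2-7)
        # 2. match each separated signal against the target model (step 8)
        # 3. amplify the target and mute the rest (step 9)
        raise NotImplementedError("pipeline steps are sketched in the sections below")
```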
An intelligent speech processing method, a flow chart of which is shown in fig. 1, comprises the following steps:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters; the specific process is as follows:
step 1-1, recording a sample voice segment in a quiet indoor environment, performing discretization processing on the collected voice segment, extracting a Mel Frequency Cepstrum Coefficient (MFCC) of a voice signal as a characteristic parameter of the voice signal, and establishing a Gaussian mixture model;
in the embodiment of the invention, the built-in Windows sound recorder is used to record the voices of 3 persons, 2 segments per person: 1 segment is used for separation and recognition of the voice, and the other segment is used for modeling the speaker's voice; the target sound source is set as the first sound source. As shown in fig. 2, one speech segment of each of the three persons is taken, a Gaussian mixture model is built for that segment, and the obtained model parameters are stored in the model library.
The model formula is as follows:
p(X|G) = Σ_{i=1}^{I} π_i · b_i(X)
where p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {π_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
π_i represents the weighting coefficient of the i-th single Gaussian model, with Σ_{i=1}^{I} π_i = 1;
μ_i represents the mean vector of the i-th single Gaussian model;
Σ_i represents the covariance matrix of the i-th single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) represents the density function of the i-th single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) denotes the density function of a Gaussian distribution;
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
namely, a k-means clustering algorithm is adopted to cluster the characteristic parameters of the voice signals to obtain an initial value of the Gaussian mixture model parameter set, G^0 = {π_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I;
In this example, 16 single Gaussian models are used to form the Gaussian mixture model. 16 vectors are randomly generated as cluster centers, the length of each vector being the number of speech frames; the feature parameters of each frame are assigned to one of the 16 cluster centers according to the minimum-distance criterion, and the center value of each cluster is then recomputed and used as the new cluster center, until the algorithm converges. The cluster centers obtained at this point are the mean parameters μ_i^0 of the initial Gaussian mixture model; the initial Σ_i^0 are obtained from the covariance of the feature parameters, and the initial values of π_i^0 are all equal.
The model is then estimated with the expectation-maximization (EM) algorithm, whose principle is to maximize the probability of the observed values: setting the derivatives of the model function with respect to the parameters π_i, μ_i, Σ_i equal to zero yields re-estimated values of the parameters, and the re-estimation is repeated until the algorithm converges, completing the training of the characteristic parameters.
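The re-estimation described above corresponds to one EM pass. A condensed Python sketch, assuming diagonal covariances for brevity (an assumption of this description, not a requirement of the embodiment), is:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, sigmas):
    """One EM re-estimation pass for a diagonal-covariance Gaussian mixture.

    X: (T, d) feature vectors; pis: (I,); mus: (I, d); sigmas: (I, d) variances.
    """
    T, d = X.shape
    I = len(pis)
    # E step: responsibility of component i for each frame
    dens = np.stack([pis[i] * multivariate_normal.pdf(X, mus[i], np.diag(sigmas[i]))
                     for i in range(I)], axis=1)            # (T, I)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M step: closed-form updates obtained from the zero-derivative conditions
    Nk = resp.sum(axis=0)                                    # effective frame counts
    pis_new = Nk / T
    mus_new = (resp.T @ X) / Nk[:, None]
    sigmas_new = np.stack([(resp[:, i:i+1] * (X - mus_new[i]) ** 2).sum(axis=0) / Nk[i]
                           for i in range(I)])
    log_lik = np.log(dens.sum(axis=1)).sum()                 # monitor convergence
    return pis_new, mus_new, sigmas_new, log_lik
```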
Step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of 4 microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting the audio signal of the tested environment by adopting a microphone array consisting of 4 microphones, and carrying out discretization processing on the collected mixed audio signal to obtain the amplitude of each sampling point;
in the embodiment of the present invention, as shown in fig. 3 (a) to (c), another speech segment of each of the three persons is taken as the sound data source of the mixed audio. 4 microphones are used; the array formed by the 4 microphones is shown in fig. 4, with the first and second microphones distributed symmetrically on either side of the array center in the horizontal direction, and the third and fourth microphones distributed symmetrically on either side of the array center in the vertical direction. The mixed data received by the 4 microphones are shown in fig. 5 (a) to (d); the speech received by the 4 microphones is discretized at 12500 Hz and the amplitude of each sample point is determined, as shown in fig. 6 (a) to (d).
Step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of the vector covariance matrix of the mixed audio signal of the tested environment from the mixed audio matrix collected by each microphone and the number of microphones;
the estimated value of the vector covariance matrix is expressed as follows:
where R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the m-th microphone;
X^H(m) represents the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
step 2-4, in this example, eigenvalue decomposition is performed on the estimated value of the vector covariance matrix to obtain the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]; the eigenvalues are sorted from largest to smallest and compared with a threshold of 10^-7, giving 3 eigenvalues above the threshold, so the number of sound sources is 3;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
in the embodiment of the invention, the 3 largest eigenvalues (equal in number to the 3 sound sources) and their corresponding eigenvectors are regarded as the signal subspace; the remaining 4 − 3 = 1 eigenvalue and eigenvector are regarded as the noise subspace, i.e. the number of noise sources is 1, and the noise matrix is obtained from the eigenvector corresponding to the noise eigenvalue:
V_u = [−0.1218 − 0.4761i, −0.1564 + 0.4659i, −0.5070 − 0.0374i, −0.5084];
Step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
as shown in fig. 4, each microphone is at a distance of 0.02 m from the array center; in the embodiment of the invention, the wavelength of the mixed audio signal is 30000; the direction angle of the first microphone relative to the array center is 0°, that of the second microphone is 180°, that of the third microphone is 90°, and that of the fourth microphone is 270°;
the angle spectral function formula of the mixed audio signal is as follows:
P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))
where P(θ) represents the angle spectral function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), in which α_1(θ) = e^{j·k·0.02·cos(0° − θ)}, α_2(θ) = e^{j·k·0.02·cos(180° − θ)}, α_3(θ) = e^{j·k·0.02·cos(90° − θ)}, α_4(θ) = e^{j·k·0.02·cos(270° − θ)}, j denotes the imaginary unit, k = 2π/λ, and λ denotes the wavelength of the mixed audio signal;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
step 2-7, selecting the largest peaks of the waveform of the angle spectral function of the mixed audio signal, the number of selected peaks being the number of sound sources;
step 2-8, determining the angle values corresponding to the selected peaks, which are the beam arrival directions of the sound sources;
As shown in fig. 7, from the waveform of the angle spectral function P(θ) of the mixed audio signal, the beam arrival directions of the 3 sound sources present in the mixed sound are found to be [50°, 200°, 300°], respectively.
Step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure formula is as follows:
where p_w(t) represents the microphone array sound pressure at time t;
N represents the number of sound sources;
t represents time;
s_n(t) represents the audio signal of the n-th sound source;
h_mn(t) denotes the conversion matrix between the n-th sound source and the m-th microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the m-th microphone with respect to the n-th sound source at time t, and θ_n(t) represents the beam arrival direction of the n-th sound source at time t;
the sound pressure gradient formula in the horizontal direction of the microphone array is as follows:
where p_x(t) represents the sound pressure gradient of the microphone array in the horizontal direction;
the sound pressure gradient formula in the vertical direction of the microphone array is as follows:
where p_y(t) represents the sound pressure gradient of the microphone array in the vertical direction;
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further obtaining an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
I(ω,t) = (1 / (ρ_0·c)) · Re[ p_w*(ω,t)·(p_x(ω,t)·u_x + p_y(ω,t)·u_y) ]
where I(ω,t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re[·] represents taking the real part of a complex number;
p_w*(ω,t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω,t) represents the microphone array horizontal-direction sound pressure gradient in the frequency domain;
p_y(ω,t) represents the microphone array vertical-direction sound pressure gradient in the frequency domain;
u_x represents the unit vector along the abscissa axis;
u_y represents the unit vector along the ordinate axis;
the intensity vector direction formula is as follows:
γ(ω,t) = arctan( I_y(ω,t) / I_x(ω,t) )
where γ(ω,t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array, and I_x(ω,t), I_y(ω,t) are the components of I(ω,t) along u_x and u_y;
step 6, computing statistics of the intensity vector directions to obtain their probability density distribution, fitting a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the speech intensity vector directions, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, computing statistics of the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the speech;
in the embodiment of the present invention, the distribution probability density of γ(ω,t) is shown in fig. 8; according to the number of sound sources and their angles, the mixed von Mises distribution fitted to this probability density is composed of 3 single von Mises distributions whose central angles are [50°, 200°, 300°], respectively.
The formula of the mixed von Mises distribution model is as follows:
p(γ) = Σ_{n=1}^{N} α_n · exp(k_n·cos(γ − μ_n)) / (2π·I_0(k_n))
where p(γ) represents the mixed von Mises distribution probability density;
γ represents the mixed sound direction angle;
α_n represents the weight of the intensity vector direction function of the sound pressure signal of the n-th sound source;
μ_n represents the central angle of the single von Mises distribution corresponding to the n-th sound source;
I_0(k_n) represents the modified Bessel function of the first kind of order zero evaluated at k_n, and k_n represents the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the n-th sound source's sound pressure signal, i.e. the reciprocal of the variance of that von Mises distribution;
the mixed von Mises distribution function parameter set is as follows:
{α_n, k_n}, n = 1, 2, 3    (11)
6-2, initializing model parameters to obtain an initial function parameter set;
in the embodiment of the invention, the initial value of α is [1/3, 1/3, 1/3] and the initial value of k is [8, 6, 3];
step 6-3, establishing an initial mixed von Mises distribution function from the obtained initial model parameters, with the function formula as follows:
The parameters of the mixed von Mises distribution model are estimated with the expectation-maximization (EM) algorithm, whose principle is to maximize the probability of the observed values; re-estimated values of the parameters α and k are calculated by setting the derivatives of the model function with respect to α and k equal to zero.
Taking γ(ω,t) as the observed data and substituting it into the mixed distribution, the logarithm is taken to obtain an initial log-likelihood value of -3.0249e+004. The ratio of each current single von Mises distribution to the mixed von Mises distribution is computed to obtain the re-estimated α parameters [0.2267, 0.2817, 0.4516]; at the same time, the re-estimated k values [5.1498, 4.0061, 3.1277] are obtained from the derivation of the parameter k in the calculation method. A new log-likelihood value of -2.9887e+004 is then obtained; the difference between the new and old likelihood values is 362.3362, far greater than the threshold of 0.1, so the new likelihood value is assigned to the old one and the previous steps are repeated with the two newly obtained re-estimated parameters, until the difference between the old and new likelihood values is less than the threshold and the algorithm is considered to have converged. In this example the final α parameters are [0.2689, 0.2811, 0.4500] and the final k values are [4.3508, 3.3601, 2.8332]; at this point a mixed von Mises distribution function satisfying the intensity vector direction distribution is obtained, as shown in fig. 9.
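The EM loop of this paragraph can be sketched as follows, with the central angles fixed at the estimated arrival directions, the embodiment's initial values, and the 0.1 log-likelihood threshold. The exact re-estimation formulas of the patent are not reproduced above, so standard mixture-of-von-Mises updates (concentration solved from the Bessel-function ratio) are used here as an assumption.

```python
import numpy as np
from scipy.special import i0, i0e, i1e
from scipy.optimize import brentq

def fit_von_mises_mixture(gamma, mus_deg, alphas, kappas, tol=0.1, max_iter=100):
    """EM for a mixture of von Mises distributions with fixed central angles.

    gamma   : observed intensity-vector directions gamma(omega, t), flattened, radians
    mus_deg : fixed central angles, e.g. [50, 200, 300] from the DOA step
    alphas  : initial weights, e.g. [1/3, 1/3, 1/3]
    kappas  : initial concentration parameters, e.g. [8, 6, 3]
    tol     : stop when the log-likelihood improves by less than this (0.1 here)
    """
    mus = np.deg2rad(np.asarray(mus_deg, float))
    alphas, kappas = np.asarray(alphas, float), np.asarray(kappas, float)
    old_ll = -np.inf
    for _ in range(max_iter):
        dens = np.stack([a * np.exp(k * np.cos(gamma - m)) / (2 * np.pi * i0(k))
                         for a, k, m in zip(alphas, kappas, mus)])     # (N, J)
        mix = dens.sum(axis=0)
        ll = np.log(mix).sum()                      # current log-likelihood
        if ll - old_ll < tol:                       # convergence test (threshold 0.1)
            break
        old_ll = ll
        resp = dens / mix                           # E step: responsibilities
        Nk = resp.sum(axis=1)
        alphas = Nk / gamma.size                    # M step: re-estimated weights
        for n in range(kappas.size):                # re-estimate each concentration
            rbar = (resp[n] * np.cos(gamma - mus[n])).sum() / Nk[n]
            rbar = float(np.clip(rbar, 1e-6, 0.9992))
            kappas[n] = brentq(lambda k: i1e(k) / i0e(k) - rbar, 1e-6, 700.0)
    return alphas, kappas

# embodiment-style usage (gamma would come from the intensity-vector step):
# alphas, kappas = fit_von_mises_mixture(gamma, [50, 200, 300], [1/3] * 3, [8, 6, 3])
```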
step 6-4, solving the intensity vector direction function of each sound pressure signal from the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
where the function above represents the intensity vector direction function of the n-th sound source;
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
where the quantity above represents the frequency-domain signal of the n-th sound source obtained after separation of the mixed speech;
this frequency-domain signal is converted into the time-domain signal of the n-th sound source through the inverse Fourier transform;
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, and considering the sound source with the maximum probability value as the target sound source, reserving the sound source signal and deleting other non-target sound sources;
in the embodiment of the invention, assuming that the first person is the target sound source, the log matching probabilities of the three finally separated voices against the target voice model are [-2.0850, -2.8807, -3.5084]×10^4, respectively; the best-matching sound is separated sound No. 1, i.e. the target sound source is found.
The matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
in the formula, the speech feature parameters are those extracted from the separated speech, i.e. the Mel frequency cepstrum coefficients of the separated speech are extracted as its characteristic parameters;
the matching probability represents the probability of matching the n-th sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
that is, the probability that the separated speech belongs to the voice of the user-specified person;
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
In the embodiment of the invention, the direction function of each sound source is finally obtained from the fitted mixed von Mises distribution model parameters, and the mixed sound is then separated to recover each source's sound; fig. 10 (a) to (f) compare the ideal data with the data obtained after separation, showing an extremely high similarity.
Claims (3)
1. An intelligent speech processing method, comprising the steps of:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters;
the specific process is as follows:
step 1-1, collecting sample voice sections, carrying out discretization processing on the collected voice sections, extracting Mel frequency cepstrum coefficients of voice signals as voice signal characteristic parameters, and establishing a Gaussian mixture model;
the model formula is as follows:
p(X|G) = Σ_{i=1}^{I} π_i · b_i(X)
where p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {π_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
π_i represents the weighting coefficient of the i-th single Gaussian model, with Σ_{i=1}^{I} π_i = 1;
μ_i represents the mean vector of the i-th single Gaussian model;
Σ_i represents the covariance matrix of the i-th single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(X) represents the density function of the i-th single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) denotes the density function of a Gaussian distribution;
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
namely, a k-means clustering algorithm is adopted to cluster the characteristic parameters of the voice signals to obtain an initial value of the Gaussian mixture model parameter set, G^0 = {π_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; the model is then estimated with the expectation-maximization (EM) algorithm starting from this initial parameter set, yielding the Gaussian mixture model parameters and completing the training of the characteristic parameters;
step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of M microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting mixed audio signals of the tested environment by using a microphone array consisting of M microphones, and carrying out discretization processing on the collected mixed audio signals to obtain the amplitude of each sampling point;
step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of the vector covariance matrix of the mixed audio signal of the tested environment from the mixed audio matrix collected by each microphone and the number of microphones;
the estimated value of the vector covariance matrix is expressed as follows:
where R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the m-th microphone;
X^H(m) represents the conjugate transpose of the mixed audio matrix collected by the m-th microphone;
step 2-4, performing eigenvalue decomposition on the estimated value of the vector covariance matrix to obtain the eigenvalues, sorting the eigenvalues from largest to smallest, and determining the number of eigenvalues larger than a threshold, which is the number of sound sources;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
the angle spectral function formula of the mixed audio signal is as follows:
P(θ) = 1 / (α^H(θ) V_u V_u^H α(θ))
where P(θ) represents the angle spectral function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), in which α_m(θ) = e^{j·k·d_m·cos(φ_m − θ)}, j denotes the imaginary unit, k = 2π/λ, λ denotes the wavelength of the mixed audio signal, d_m denotes the distance of the m-th microphone from the array center, and φ_m denotes the direction angle of the m-th microphone relative to the array center;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
step 2-7, selecting the largest peaks of the waveform of the angle spectral function of the mixed audio signal, the number of selected peaks being the number of sound sources;
step 2-8, determining the angle values corresponding to the selected peaks, which are the beam arrival directions of the sound sources;
step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure signal formula is as follows:
where p_w(t) represents the microphone array sound pressure at time t;
N represents the number of sound sources;
t represents time;
s_n(t) represents the audio signal of the n-th sound source;
h_mn(t) denotes the conversion matrix between the n-th sound source and the m-th microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the m-th microphone with respect to the n-th sound source at time t, and θ_n(t) represents the beam arrival direction of the n-th sound source at time t;
the sound pressure gradient formula in the horizontal direction of the microphone array is as follows:
where p_x(t) represents the sound pressure gradient of the microphone array in the horizontal direction;
the sound pressure gradient formula in the vertical direction of the microphone array is as follows:
where p_y(t) represents the sound pressure gradient of the microphone array in the vertical direction;
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further deducing an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
I(ω,t) = (1 / (ρ_0·c)) · Re[ p_w*(ω,t)·(p_x(ω,t)·u_x + p_y(ω,t)·u_y) ]
where I(ω,t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re[·] represents taking the real part of a complex number;
p_w*(ω,t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω,t) represents the microphone array horizontal-direction sound pressure gradient in the frequency domain;
p_y(ω,t) represents the microphone array vertical-direction sound pressure gradient in the frequency domain;
u_x represents the unit vector along the abscissa axis;
u_y represents the unit vector along the ordinate axis;
the intensity vector direction formula is as follows:
γ(ω,t) = arctan( I_y(ω,t) / I_x(ω,t) )
where γ(ω,t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array, and I_x(ω,t), I_y(ω,t) are the components of I(ω,t) along u_x and u_y;
step 6, computing statistics of the intensity vector directions to obtain their probability density distribution, fitting a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the speech intensity vector directions, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, computing statistics of the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the speech;
the formula of the mixed von Mises distribution model is as follows:
p(γ) = Σ_{n=1}^{N} α_n · exp(k_n·cos(γ − μ_n)) / (2π·I_0(k_n))
where p(γ) represents the mixed von Mises distribution probability density;
γ represents the mixed sound direction angle;
α_n represents the weight of the intensity vector direction function of the sound pressure signal of the n-th sound source;
μ_n represents the central angle of the single von Mises distribution corresponding to the n-th sound source;
I_0(k_n) represents the modified Bessel function of the first kind of order zero evaluated at k_n, and k_n represents the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the n-th sound source's sound pressure signal, i.e. the reciprocal of the variance of that von Mises distribution;
the mixed von Mises distribution function parameter set is as follows:
{α_n, k_n}, n = 1, ..., N    (11)
6-2, initializing model parameters to obtain an initial function parameter set;
step 6-3, estimating the parameters of the mixed von Mises distribution model with the expectation-maximization (EM) algorithm, starting from the obtained initial model parameters;
step 6-4, solving the intensity vector direction function of each sound pressure signal from the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
where the function above represents the intensity vector direction function of the n-th sound source;
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
where the quantity above represents the frequency-domain signal of the n-th sound source obtained after separation of the mixed speech;
this frequency-domain signal is converted into the time-domain signal of the n-th sound source through the inverse Fourier transform;
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, selecting the sound source with the maximum probability value as a target sound source, reserving the sound source signal, and deleting other non-target sound sources;
the matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
in the formula, the speech feature parameters are those extracted from the separated speech, i.e. the Mel frequency cepstrum coefficients of the separated speech are extracted as its characteristic parameters;
the matching probability represents the probability of matching the n-th sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
that is, the probability that the separated speech belongs to the voice of the user-specified person;
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
2. The intelligent speech processing method of claim 1, wherein the threshold in step 2-4 ranges from 10^-2 to 10^-16.
3. The intelligent speech processing method according to claim 1, wherein in step 6-1, α_n takes a random value in the range 0-1 such that Σ_{n=1}^{N} α_n = 1, and k_n takes a random value in the range 1-700.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103811020A CN103811020A (en) | 2014-05-21 |
CN103811020B true CN103811020B (en) | 2016-06-22 |
Family
ID=50707692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410081493.6A Active CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103811020B (en) |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200813B (en) * | 2014-07-01 | 2017-05-10 | 东北大学 | Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction |
CN105609099A (en) * | 2015-12-25 | 2016-05-25 | 重庆邮电大学 | Speech recognition pretreatment method based on human auditory characteristic |
CN105933820A (en) * | 2016-04-28 | 2016-09-07 | 冠捷显示科技(中国)有限公司 | Automatic positioning method of external wireless sound boxes |
CN106205610B (en) * | 2016-06-29 | 2019-11-26 | 联想(北京)有限公司 | A kind of voice information identification method and equipment |
CN106128472A (en) * | 2016-07-12 | 2016-11-16 | 乐视控股(北京)有限公司 | The processing method and processing device of singer's sound |
CN106448722B (en) * | 2016-09-14 | 2019-01-18 | 讯飞智元信息科技有限公司 | The way of recording, device and system |
CN108630193B (en) * | 2017-03-21 | 2020-10-02 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method and device |
CN107220021B (en) * | 2017-05-16 | 2021-03-23 | 北京小鸟看看科技有限公司 | Voice input recognition method and device and head-mounted equipment |
CN107274895B (en) * | 2017-08-18 | 2020-04-17 | 京东方科技集团股份有限公司 | Voice recognition device and method |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN108198569B (en) * | 2017-12-28 | 2021-07-16 | 北京搜狗科技发展有限公司 | Audio processing method, device and equipment and readable storage medium |
CN108520756B (en) * | 2018-03-20 | 2020-09-01 | 北京时代拓灵科技有限公司 | Method and device for separating speaker voice |
CN110310642B (en) * | 2018-03-20 | 2023-12-26 | 阿里巴巴集团控股有限公司 | Voice processing method, system, client, equipment and storage medium |
CN108694950B (en) * | 2018-05-16 | 2021-10-01 | 清华大学 | Speaker confirmation method based on deep hybrid model |
CN108766459B (en) * | 2018-06-13 | 2020-07-17 | 北京联合大学 | Target speaker estimation method and system in multi-user voice mixing |
CN108735227B (en) * | 2018-06-22 | 2020-05-19 | 北京三听科技有限公司 | Method and system for separating sound source of voice signal picked up by microphone array |
CN110867191B (en) * | 2018-08-28 | 2024-06-25 | 洞见未来科技股份有限公司 | Speech processing method, information device and computer program product |
CN109505741B (en) * | 2018-12-20 | 2020-07-10 | 浙江大学 | Wind driven generator damaged blade detection method and device based on rectangular microphone array |
CN110335626A (en) * | 2019-07-09 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Age recognition methods and device, storage medium based on audio |
CN110288996A (en) * | 2019-07-22 | 2019-09-27 | 厦门钛尚人工智能科技有限公司 | A kind of speech recognition equipment and audio recognition method |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN110706688B (en) * | 2019-11-11 | 2022-06-17 | 广州国音智能科技有限公司 | Method, system, terminal and readable storage medium for constructing voice recognition model |
CN111028857B (en) * | 2019-12-27 | 2024-01-19 | 宁波蛙声科技有限公司 | Method and system for reducing noise of multichannel audio-video conference based on deep learning |
CN111816185A (en) * | 2020-07-07 | 2020-10-23 | 广东工业大学 | Method and device for identifying speaker in mixed voice |
CN111696570B (en) * | 2020-08-17 | 2020-11-24 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111899756B (en) * | 2020-09-29 | 2021-04-09 | 北京清微智能科技有限公司 | Single-channel voice separation method and device |
CN114093382A (en) * | 2021-11-23 | 2022-02-25 | 广东电网有限责任公司 | Intelligent interaction method suitable for voice information |
CN114242072A (en) * | 2021-12-21 | 2022-03-25 | 上海帝图信息科技有限公司 | Voice recognition system for intelligent robot |
CN114613385A (en) * | 2022-05-07 | 2022-06-10 | 广州易而达科技股份有限公司 | Far-field voice noise reduction method, cloud server and audio acquisition equipment |
CN115240689B (en) * | 2022-09-15 | 2022-12-02 | 深圳市水世界信息有限公司 | Target sound determination method, target sound determination device, computer equipment and medium |
CN118574049B (en) * | 2024-08-01 | 2024-11-08 | 罗普特科技集团股份有限公司 | Microphone calibration method and system of multi-mode intelligent terminal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1653519A (en) * | 2002-03-20 | 2005-08-10 | 高通股份有限公司 | Method for robust voice recognition by analyzing redundant features of source signal |
JP2012211768A (en) * | 2011-03-30 | 2012-11-01 | Advanced Telecommunication Research Institute International | Sound source positioning apparatus |
CN103426434A (en) * | 2012-05-04 | 2013-12-04 | 索尼电脑娱乐公司 | Source separation by independent component analysis in conjunction with source direction information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8249867B2 (en) * | 2007-12-11 | 2012-08-21 | Electronics And Telecommunications Research Institute | Microphone array based speech recognition system and target speech extracting method of the system |
Application timeline: on 2014-03-05 the application CN201410081493.6A was filed in China (CN); it was granted as CN103811020B and its legal status is Active.
Also Published As
Publication number | Publication date |
---|---|
CN103811020A (en) | 2014-05-21 |
Similar Documents
Publication | Title
---|---
CN103811020B (en) | A kind of intelligent sound processing method
EP3707716B1 (en) | Multi-channel speech separation | |
CN108922541B (en) | Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models | |
CN107919133A (en) | For the speech-enhancement system and sound enhancement method of destination object | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
CN101593522A (en) | A kind of full frequency domain digital hearing aid method and apparatus | |
CN102388416A (en) | Signal processing apparatus and signal processing method | |
JP4964204B2 (en) | Multiple signal section estimation device, multiple signal section estimation method, program thereof, and recording medium | |
CN109859749A (en) | A kind of voice signal recognition methods and device | |
CN106847301A (en) | A kind of ears speech separating method based on compressed sensing and attitude information | |
CN106019230B (en) | A kind of sound localization method based on i-vector Speaker Identification | |
CN116092512A (en) | Small sample voice separation method based on data generation | |
Enzinger et al. | Mismatched distances from speakers to telephone in a forensic-voice-comparison case | |
JP2017067948A (en) | Voice processor and voice processing method | |
Talagala et al. | Binaural localization of speech sources in the median plane using cepstral HRTF extraction | |
Ramou et al. | Automatic detection of articulations disorders from children’s speech preliminary study | |
CN118212929A (en) | Personalized Ambiosonic voice enhancement method | |
Xia et al. | Ava: An adaptive audio filtering architecture for enhancing mobile, embedded, and cyber-physical systems | |
CN117711422A (en) | Underdetermined voice separation method and device based on compressed sensing space information estimation | |
Krijnders et al. | Tone-fit and MFCC scene classification compared to human recognition | |
Krause et al. | Binaural signal representations for joint sound event detection and acoustic scene classification | |
Oualil et al. | Joint detection and localization of multiple speakers using a probabilistic interpretation of the steered response power | |
Venkatesan et al. | Analysis of monaural and binaural statistical properties for the estimation of distance of a target speaker | |
Nguyen et al. | Location Estimation of Receivers in an Audio Room using Deep Learning with a Convolution Neural Network. | |
Kolossa et al. | Missing feature speech recognition in a meeting situation with maximum SNR beamforming |
Legal Events
Code | Title
---|---
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant