CN103811020B - A kind of intelligent sound processing method - Google Patents

A kind of intelligent sound processing method

Info

Publication number
CN103811020B
CN103811020B CN201410081493.6A
Authority
CN
China
Prior art keywords
sound
sound source
representing
microphone array
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410081493.6A
Other languages
Chinese (zh)
Other versions
CN103811020A (en)
Inventor
王义
魏阳杰
陈瑶
关楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201410081493.6A priority Critical patent/CN103811020B/en
Publication of CN103811020A publication Critical patent/CN103811020A/en
Application granted granted Critical
Publication of CN103811020B publication Critical patent/CN103811020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention provides an intelligent voice processing method, belonging to the technical field of information processing. By establishing a voice model library of conversation partners, the method intelligently recognizes the identities of multiple speakers in a multi-person speech environment, separates the mixed speech to obtain each speaker's individual voice, amplifies the voice of the speaker the user wants to listen to according to the user's request, and simultaneously eliminates the voices of speakers the user does not require. Unlike a traditional hearing aid, the method automatically provides the user with the required sound according to his or her personal demand and reduces the interference of non-target voices in addition to noise, embodying the personalization, interactivity and intelligence of the method.

Description

Intelligent voice processing method
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to an intelligent voice processing method.
Background
According to evaluation data published by the World Health Organization (WHO) in 2013, about 360 million people worldwide currently live with different degrees of hearing impairment, accounting for roughly 5% of the global population. Hearing aid products can effectively compensate for the hearing loss of hearing-impaired patients and improve their quality of life and work. However, research on hearing aid systems today still focuses on noise suppression and amplification of the source sound amplitude, and rarely involves modeling based on sound features or automatic separation of multiple sources. When the actual application scenario is complex, for example at a gathering where several speakers produce sound simultaneously, possibly together with background sounds such as music, the hearing aid system cannot separate the sound objects of interest from the mixed sound input; a simple sound intensity amplification function then only increases the user's listening burden, or even causes harm, and cannot deliver effective sound input and understanding. Therefore, in view of these technical shortcomings of current hearing aid systems, designing a novel hearing aid system that can identify specific sound objects and is more intelligent and personalized is of great significance.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an intelligent voice processing method, so that a user obtains clean sound reception and amplification according to his or her own requirements, realizing the intelligence, interactivity and personalization of a hearing aid system.
An intelligent speech processing method, comprising the steps of:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters;
the specific process is as follows:
step 1-1, collecting sample voice sections, carrying out discretization processing on the collected voice sections, extracting Mel frequency cepstrum coefficients of voice signals as voice signal characteristic parameters, and establishing a Gaussian mixture model;
the model formula is as follows:
$p(X \mid G) = \sum_{i=1}^{I} p_i\, b_i(X)$    (1)
wherein p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
p_i represents the weighting coefficient of the ith single Gaussian model, with $\sum_{i=1}^{I} p_i = 1$;
μ_i represents the mean vector of the ith single Gaussian model;
Σ_i represents the covariance matrix of the ith single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T represents the number of feature vectors;
b_i(X) represents the density function of the ith single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) represents the density function of a standard Gaussian distribution;
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
namely, a k-means clustering algorithm is adopted to cluster the feature parameters of the voice signals to obtain the initial value G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I, of the Gaussian mixture model parameter set; according to the obtained initial value of the parameter set, the model is estimated with the expectation-maximization algorithm to obtain the Gaussian mixture model parameters, completing the training of the feature parameters;
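As an illustration of steps 1-1 and 1-2, the following sketch shows how a speaker model could be trained; it is not the patent's implementation. The library choices (librosa for MFCC extraction, scikit-learn's GaussianMixture, whose k-means initialization stands in for the explicit clustering of step 1-2) and all parameter values are assumptions.

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(wav_path, n_components=16):
    """Sketch of steps 1-1 and 1-2: MFCC features plus a GMM trained by k-means + EM."""
    # Step 1-1: discretize the recorded speech segment and extract the Mel frequency
    # cepstrum coefficients as feature parameters X = {x_1, ..., x_T}
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T   # shape (T, 13)

    # Step 1-2: Gaussian mixture model p(X|G) = sum_i p_i b_i(X), eq. (1);
    # init_params='kmeans' provides the initial parameter set G0, and fit()
    # runs the expectation-maximization estimation
    gmm = GaussianMixture(n_components=n_components,
                          init_params='kmeans', max_iter=200)
    gmm.fit(mfcc)
    return gmm   # weights_, means_, covariances_ play the role of {p_i, mu_i, Sigma_i}
```

A model trained in this way for each conversation partner would form the sample voice library of step 1.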
step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of M microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting mixed audio signals of the tested environment by using a microphone array consisting of M microphones, and carrying out discretization processing on the collected mixed audio signals to obtain the amplitude of each sampling point;
step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of a vector covariance matrix of the mixed audio signal of the tested environment according to the mixed audio matrix and the number of the microphones acquired by each microphone;
the estimated value of the vector covariance matrix is expressed as follows:
$R_{xx} = \frac{1}{M}\sum_{m=1}^{M} X(m)\,X^{H}(m)$    (2)
wherein R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the mth microphone;
X^H(m) represents the transpose of the mixed audio matrix collected by the mth microphone;
step 2-4, performing eigenvalue decomposition on the estimated value of the vector covariance matrix to obtain eigenvalues, sequencing the eigenvalues from large to small, and determining the number of the eigenvalues larger than a threshold, namely the number of the sound sources;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
the angle spectral function formula of the mixed audio signal is as follows:
$P(\theta) = \dfrac{1}{\alpha^{H}(\theta)\, V_u V_u^{H}\, \alpha(\theta)}$    (3)
wherein P(θ) represents the angular spectrum function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), wherein $\alpha_m(\theta) = e^{jk d_m \cos(\phi_m - \theta)}$, j denotes the imaginary unit, k = 2π/λ, λ denotes the wavelength of the mixed audio signal, d_m denotes the distance of the mth microphone from the center of the array, and φ_m represents the direction angle of the mth microphone to the center of the array;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
2-7, selecting a plurality of peak values of the waveform from largest to smallest according to the waveform of the angular spectrum function of the mixed audio signal, the number of the selected peak values being the number of sound sources;
step 2-8, determining an angle value corresponding to the selected peak value, namely obtaining the arrival direction of the wave beam of each sound source;
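A minimal sketch of steps 2-3 to 2-8 (covariance estimation, noise-subspace extraction, and a MUSIC-style scan of the angular spectrum P(θ)) follows. The covariance normalization, the threshold value, and the peak-picking helper are assumptions for illustration, not the patent's exact procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_doa(X, mic_dist, mic_angles, wavelength, threshold=1e-7):
    """X: (M, L) array, one row of sampled amplitudes per microphone.
    mic_dist, mic_angles: per-microphone distances d_m and direction angles (radians)."""
    M, L = X.shape
    Rxx = X @ X.conj().T / L                          # sample covariance, cf. eq. (2)
    eigvals, eigvecs = np.linalg.eigh(Rxx)
    order = np.argsort(eigvals)[::-1]                 # sort eigenvalues large -> small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_src = int(np.sum(eigvals > threshold))          # step 2-4: number of sound sources
    Vu = eigvecs[:, n_src:]                           # step 2-5: noise subspace matrix

    k = 2 * np.pi / wavelength
    thetas = np.deg2rad(np.arange(360.0))
    P = np.zeros(len(thetas))
    for i, th in enumerate(thetas):
        a = np.exp(1j * k * mic_dist * np.cos(mic_angles - th))   # steering vector alpha(theta)
        P[i] = 1.0 / np.abs(a.conj() @ Vu @ Vu.conj().T @ a)      # angular spectrum, eq. (3)
    peaks, _ = find_peaks(P)                          # steps 2-7 and 2-8: pick n_src peaks
    doas = peaks[np.argsort(P[peaks])[::-1][:n_src]]
    return n_src, np.rad2deg(thetas[doas])
```

For the cross-shaped array of figure 4, mic_dist would be four entries of 0.02 and mic_angles the angles 0°, 180°, 90°, 270° converted to radians.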
step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure signal formula is as follows:
$p_w(t) = \sum_{n=1}^{N} 0.5 \sum_{m=1}^{M} h_{mn}(t)\, s_n(t)$    (4)
wherein p_w(t) represents the microphone array sound pressure at time t;
N represents the number of sound sources;
t represents time;
s_n(t) represents the audio signal of the nth sound source;
h_mn(t) denotes the conversion matrix between the nth sound source and the mth microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the mth microphone with respect to the nth sound source at time t, and θ_n(t) represents the beam arrival direction of the nth sound source at time t;
the sound pressure gradient in the horizontal direction of the microphone array is given by formula (5), wherein p_x(t) represents the sound pressure gradient in the horizontal direction of the microphone array;
the sound pressure gradient in the vertical direction of the microphone array is given by formula (6), wherein p_y(t) represents the sound pressure gradient in the vertical direction of the microphone array;
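Formulas (5) and (6) for the pressure gradients are not reproduced in the text above. The short sketch below therefore uses the common first-order approximation for the cross-shaped array of figure 4 (opposite microphone pairs along the horizontal and vertical axes); this approximation is an assumption, not the patent's exact formula.

```python
import numpy as np

def pressure_and_gradients(x1, x2, x3, x4):
    """x1/x2: horizontally opposed microphone signals, x3/x4: vertically opposed ones."""
    p_w = 0.25 * (x1 + x2 + x3 + x4)   # omnidirectional (array-center) sound pressure
    p_x = x1 - x2                      # horizontal pressure-gradient component
    p_y = x3 - x4                      # vertical pressure-gradient component
    return p_w, p_x, p_y
```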
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further deducing an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
$I(\omega,t) = \frac{1}{\rho_0 c}\left[\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}\,u_x + \operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}\,u_y\right]$    (7)
wherein I(ω, t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re{·} represents taking the real part of a complex number;
p_w^*(ω, t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω, t) represents the sound pressure gradient in the horizontal direction of the microphone array in the frequency domain;
p_y(ω, t) represents the sound pressure gradient in the vertical direction of the microphone array in the frequency domain;
u_x represents the unit vector in the direction of the abscissa axis;
u_y represents the unit vector in the direction of the ordinate axis;
the intensity vector direction formula is as follows:
$\gamma(\omega,t) = \tan^{-1}\!\left[\dfrac{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}}{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}}\right]$    (8)
wherein γ (ω, t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array;
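Steps 4 and 5 amount to a short-time Fourier transform followed by evaluating equation (8) bin by bin. The sketch below assumes scipy's STFT; the window length is an arbitrary choice.

```python
import numpy as np
from scipy.signal import stft

def intensity_vector_direction(p_w, p_x, p_y, fs, nperseg=512):
    """Return the array pressure and gamma(omega, t) of eq. (8) on the STFT grid."""
    _, _, Pw = stft(p_w, fs=fs, nperseg=nperseg)   # step 4: time domain -> frequency domain
    _, _, Px = stft(p_x, fs=fs, nperseg=nperseg)
    _, _, Py = stft(p_y, fs=fs, nperseg=nperseg)
    # step 5: gamma = atan2(Re{p_w* p_y}, Re{p_w* p_x}); the factor 1/(rho_0 c)
    # of eq. (7) cancels when only the direction is needed
    gamma = np.arctan2(np.real(np.conj(Pw) * Py),
                       np.real(np.conj(Pw) * Px))
    return Pw, np.mod(gamma, 2 * np.pi)            # direction mapped to [0, 2*pi)
```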
step 6, counting the intensity vector directions to obtain their probability density distribution, fitting it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity vector direction, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, counting the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting it with a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the voice;
the formula of the mixed von Mises distribution model is as follows:
$g(\theta) = \sum_{n=1}^{N} \alpha_n f(\theta; k_n)$    (10)
wherein g(θ) represents the mixed von Mises distribution probability density;
θ represents the mixed sound direction angle;
α_n represents the weight of the intensity vector direction function of the sound pressure signal of the nth sound source;
f(θ; k_n) represents the single von Mises density of the nth sound source, $f(\theta; k_n) = \frac{e^{k_n \cos(\theta - \theta_n)}}{2\pi I_0(k_n)}$    (9), wherein I_0(k_n) represents the modified Bessel function of the first kind of order zero for the nth sound source, and k_n represents the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the sound pressure signal of the nth sound source, i.e., the reciprocal of the variance of the von Mises distribution;
the mixed von Mises distribution function parameter set is as follows:
{α_n, k_n}, n = 1, ..., N    (11)
6-2, initializing model parameters to obtain an initial function parameter set;
6-3, estimating the parameters of the mixed von Mises distribution model by adopting the expectation-maximization algorithm according to the obtained initial model parameters;
6-4, solving the intensity vector direction function of each sound pressure signal according to the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
$I_n(\theta; \omega, t) = \alpha_n f(\theta; k_n)$    (12)
wherein I_n(θ; ω, t) represents the intensity vector direction function of the nth sound source;
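The following sketch fits the mixed von Mises model of step 6 with a simple EM loop, keeping the center angles fixed at the beam arrival directions found in step 2. The concentration update uses a standard approximate inversion of the Bessel-function ratio; the tolerance and clipping bounds are assumptions.

```python
import numpy as np
from scipy.special import i0

def vonmises_pdf(theta, mu, kappa):
    # single von Mises density f(theta; k_n) with center angle mu, cf. eqs. (9)-(10)
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def fit_mixed_vonmises(gamma, mus, alphas, kappas, tol=0.1, max_iter=100):
    """gamma: observed directions (any shape); mus: fixed center angles in radians."""
    g = np.asarray(gamma).ravel()
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step (step 6-3): responsibility of each source for each observation
        comp = np.stack([a * vonmises_pdf(g, m, k)
                         for a, m, k in zip(alphas, mus, kappas)])
        total = comp.sum(axis=0) + 1e-300
        resp = comp / total
        # M-step: re-estimate the weights alpha_n and concentrations k_n
        alphas = resp.mean(axis=1)
        r = np.array([np.sum(w * np.cos(g - m)) / np.sum(w)
                      for w, m in zip(resp, mus)])
        kappas = np.clip(r * (2 - r**2) / (1 - r**2 + 1e-12), 1e-3, 700.0)
        ll = np.sum(np.log(total))
        if ll - old_ll < tol:        # convergence test on the log-likelihood
            break
        old_ll = ll
    return alphas, kappas
```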
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
$\tilde{s}_n(\omega, t) = p_w(\omega, t)\, I_n(\theta; \omega, t)$    (13)
wherein $\tilde{s}_n(\omega, t)$ represents the frequency-domain signal of the nth sound source obtained after separation of the mixed speech;
$\tilde{s}_n(\omega, t)$ is converted by inverse Fourier transform to obtain the time-domain signal $\tilde{s}_n(t)$;
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, selecting the sound source with the maximum probability value as a target sound source, reserving the sound source signal, and deleting other non-target sound sources;
the matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
$C(\tilde{X}_n) = \log\!\big[P(\tilde{X}_n \mid G_c)\big]$    (14)
in the formula, $\tilde{X}_n$ represents the speech feature parameters extracted from the separated speech $\tilde{s}_n(t)$, i.e., the Mel frequency cepstrum coefficients of $\tilde{s}_n(t)$ are extracted as its feature parameters;
$C(\tilde{X}_n)$ represents the matching probability of the nth sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
$P(\tilde{X}_n \mid G_c)$ represents the probability that the separated speech belongs to the voice of the user-specified person;
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
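Steps 7 to 9 can be sketched as masking the array pressure with each source's direction function, inverting the STFT, and scoring each separated signal against the speaker model of step 1. The gain value, the use of scipy's istft and librosa, and the reuse of the vonmises_pdf helper from the previous sketch are assumptions; GaussianMixture.score returns an average log-likelihood, used here as a stand-in for the matching score of equation (14).

```python
import numpy as np
import librosa
from scipy.signal import istft

def separate_and_amplify(Pw, gamma, mus, alphas, kappas, fs, target_gmm, gain=4.0):
    """Pw, gamma: STFT pressure and direction grid; target_gmm: model G_c from step 1."""
    separated = []
    for a, m, k in zip(alphas, mus, kappas):
        mask = a * vonmises_pdf(gamma, m, k)        # I_n(theta; omega, t), eq. (12)
        _, s_time = istft(Pw * mask, fs=fs)         # eq. (13), then inverse transform
        separated.append(s_time)
    # step 8: match every separated signal against the user-specified speaker model
    scores = []
    for s in separated:
        feats = librosa.feature.mfcc(y=np.asarray(s, dtype=float), sr=fs, n_mfcc=13).T
        scores.append(target_gmm.score(feats))      # proxy for log P(X_n | G_c), eq. (14)
    target = separated[int(np.argmax(scores))]      # keep the best-matching sound source
    return gain * target                            # step 9: amplify the target source
```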
The threshold in step 2-4 has a value in the range of 10^-2 to 10^-16.
In step 6-1, α_n takes a random value within 0-1 and satisfies $\sum_{n=1}^{N} \alpha_n = 1$, and k_n takes a random value within 1-700.
The invention has the advantages that:
The invention relates to an intelligent voice processing method which, by establishing a voice model library of conversation partners, realizes intelligent recognition of the identities of multiple speakers in a multi-person speech environment and separation of the mixed speech to obtain each speaker's individual voice, amplifies the voice of the speaker the user wants to listen to according to the user's requirements, and eliminates the voices of speakers the user does not require; different from a traditional hearing aid, the method can automatically provide the required sound for the user according to the user's personal demand, reduces the interference of non-target voices in addition to noise, and embodies the personalization, interactivity and intelligence of the method.
Drawings
FIG. 1 is a flow chart of an intelligent speech processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the sound source data used for modeling according to an embodiment of the present invention, in which (a) shows the voice data of the first person, (b) shows the voice data of the second person, and (c) shows the voice data of the third person;
fig. 3 is a schematic diagram of sound source data for sound mixing according to an embodiment of the present invention, in which (a) shows a schematic diagram of first sound source data, (b) shows a schematic diagram of second sound source data, and (c) shows a schematic diagram of third sound source data;
FIG. 4 is a schematic diagram of a microphone array in accordance with one embodiment of the invention;
FIG. 5 is a graph of data received by four microphones, where (a) shows a graph of a mixed sound signal received by a first microphone, (b) shows a graph of a mixed sound signal received by a second microphone, (c) shows a graph of a mixed sound signal received by a third microphone, and (d) shows a graph of a mixed sound signal received by a fourth microphone, in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of data samples received by four microphones, where (a) shows a diagram of a mixed sound signal received by a first microphone after sampling, (b) shows a diagram of a mixed sound signal received by a second microphone after sampling, (c) shows a diagram of a mixed sound signal received by a third microphone after sampling, and (d) shows a diagram of a mixed sound signal received by a fourth microphone after sampling, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of spatial spectrum estimation of a hybrid signal according to an embodiment of the present invention;
FIG. 8 is a chart of the directional distribution probability density of the mixed sound vector according to one embodiment of the present invention;
FIG. 9 is a diagram of a maximum likelihood estimation hybrid Von Milius model according to an embodiment of the present invention;
fig. 10 is a comparison graph of ideal voice and separated voice according to an embodiment of the present invention, where (a) is the original sound signal of the first sound source, (b) is the separated sound signal of the first sound source, (c) is the original sound signal of the second sound source, (d) is the separated sound signal of the second sound source, (e) is the original sound signal of the third sound source, and (f) is the separated sound signal of the third sound source.
Detailed Description
An embodiment of the present invention will be further described with reference to the accompanying drawings.
In the embodiment of the invention, the model system is mainly divided into a voice modeling module and a voice dynamic real-time processing module, wherein the voice modeling module realizes speaker voice modeling, and the voice dynamic real-time processing module realizes the direction positioning and separation of mixed voice and the mixed voice recognition and extraction (namely the extraction and amplification of target voice and the shielding of other voices) in a complex voice environment.
An intelligent speech processing method, a flow chart of which is shown in fig. 1, comprises the following steps:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters; the specific process is as follows:
step 1-1, recording a sample voice segment in a quiet indoor environment, performing discretization processing on the collected voice segment, extracting a Mel Frequency Cepstrum Coefficient (MFCC) of a voice signal as a characteristic parameter of the voice signal, and establishing a Gaussian mixture model;
In the embodiment of the invention, the recorder built into Windows is used to record the voices of 3 persons, 2 segments per person, of which 1 segment is used for separating and identifying the voice and the other segment is used for modeling the speaker's voice; the target sound source is set as the first sound source. As shown in fig. 2, one speech segment of each of the three persons is taken, a Gaussian mixture model is built for the segment, and the obtained model parameters are stored in the model library.
The model formula is as follows:
$p(X \mid G) = \sum_{i=1}^{I} p_i\, b_i(X)$    (1)
wherein p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
p_i represents the weighting coefficient of the ith single Gaussian model, with $\sum_{i=1}^{I} p_i = 1$;
μ_i represents the mean vector of the ith single Gaussian model;
Σ_i represents the covariance matrix of the ith single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T represents the number of feature vectors;
b_i(X) represents the density function of the ith single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) represents the density function of a standard Gaussian distribution;
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
namely, a k-means clustering algorithm is adopted to cluster the feature parameters of the voice signals to obtain the initial value G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I, of the Gaussian mixture model parameter set;
In this example, 16 single Gaussian models are used to form the Gaussian mixture model. 16 vectors are randomly generated as cluster centers, each vector having a length equal to the number of voice frames; the feature parameters of each frame are assigned to one of the 16 cluster centers according to the minimum-distance criterion, and the center value of each cluster is then recomputed and used as the new cluster center, until the algorithm converges. The cluster centers obtained at this point are the initial mean parameters μ_i^0 of the Gaussian mixture model; the initial Σ_i^0 are obtained by computing the covariance of the feature parameters, and the initial values of p_i^0 are all 1/16.
The model is then estimated with the expectation-maximization algorithm, whose principle is to maximize the probability of the observed values: the re-estimated values of the parameters p_i, μ_i, Σ_i are calculated by setting the derivatives of the model function with respect to p_i^0, μ_i^0, Σ_i^0 equal to zero, and this is repeated until the algorithm converges, completing the training of the feature parameters.
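The explicit initialization described above can be sketched as follows; KMeans from scikit-learn is an assumed stand-in for the minimum-distance clustering, and each cluster is assumed to contain enough frames for a covariance estimate.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_gmm_params(mfcc, n_components=16):
    """Initial G0 = {p_i^0, mu_i^0, Sigma_i^0} from k-means on the MFCC frames."""
    km = KMeans(n_clusters=n_components, n_init=10).fit(mfcc)
    mu0 = km.cluster_centers_                              # initial mean vectors
    sigma0 = np.stack([np.cov(mfcc[km.labels_ == i].T)     # initial covariance matrices
                       for i in range(n_components)])
    p0 = np.full(n_components, 1.0 / n_components)         # equal initial weights, 1/16
    return p0, mu0, sigma0
```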
Step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of 4 microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting the audio signal of the tested environment by adopting a microphone array consisting of 4 microphones, and carrying out discretization processing on the collected mixed audio signal to obtain the amplitude of each sampling point;
In the embodiment of the present invention, as shown in fig. 3 (a) to (c), another speech segment of each of the three persons is taken as the sound data source of the mixed audio; 4 microphones are used, and the array composed of the 4 microphones is shown in fig. 4: the first and second microphones are symmetrically distributed on both sides in the horizontal direction about the array center, and the third and fourth microphones are symmetrically distributed on both sides in the vertical direction about the array center. The mixed data received by the 4 microphones are shown in fig. 5 (a) to (d); the voice received by the 4 microphones is discretized at 12500 Hz, and the amplitude of each sampling point is determined, as shown in fig. 6 (a) to (d).
Step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of a vector covariance matrix of the mixed audio signal of the tested environment according to the mixed audio matrix and the number of the microphones acquired by each microphone;
the estimated value of the vector covariance matrix is expressed as follows:
$R_{xx} = \frac{1}{4}\sum_{m=1}^{4} X(m)\,X^{H}(m)$    (2)
wherein R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the mth microphone;
X^H(m) represents the transpose of the mixed audio matrix collected by the mth microphone;
step 2-4, in this example, eigenvalue decomposition is performed on the estimated value of the vector covariance matrix to obtain the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]; the eigenvalues are sorted from large to small and compared with a threshold of 10^-7, and 3 eigenvalues exceed the threshold, so the number of sound sources is 3;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
In the embodiment of the invention, the eigenvalues and corresponding eigenvectors whose number equals the number of sound sources, 3, are regarded as the signal subspace, and the remaining 4 - 3 = 1 eigenvalue and eigenvector are regarded as the noise subspace, i.e., the number of noise sources is 1; the noise matrix can be obtained from the elements corresponding to the noise eigenvalue:
V_u = [-0.1218 - 0.4761i, -0.1564 + 0.4659i, -0.5070 - 0.0374i, -0.5084];
Step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
As shown in fig. 4, each microphone is at a distance of 0.02 m from the center of the array; in the embodiment of the invention, the wavelength of the mixed audio signal is 30000; the direction angle of the first microphone to the center of the array is 0 degrees, the direction angle of the second microphone is 180 degrees, the direction angle of the third microphone is 90 degrees, and the direction angle of the fourth microphone is 270 degrees;
the angle spectral function formula of the mixed audio signal is as follows:
$P(\theta) = \dfrac{1}{\alpha^{H}(\theta)\, V_u V_u^{H}\, \alpha(\theta)}$    (3)
wherein P(θ) represents the angular spectrum function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), wherein $\alpha_1(\theta) = e^{jk \cdot 0.02\cos(0^\circ-\theta)}$, $\alpha_2(\theta) = e^{jk \cdot 0.02\cos(180^\circ-\theta)}$, $\alpha_3(\theta) = e^{jk \cdot 0.02\cos(90^\circ-\theta)}$, $\alpha_4(\theta) = e^{jk \cdot 0.02\cos(270^\circ-\theta)}$, j denotes the imaginary unit, k = 2π/λ, and λ denotes the wavelength of the mixed audio signal;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
2-7, selecting a plurality of peak values of the waveform from largest to smallest according to the waveform of the angular spectrum function of the mixed audio signal, the number of the selected peak values being the number of sound sources;
step 2-8, determining an angle value corresponding to the selected peak value, namely obtaining the arrival direction of the wave beam of each sound source;
As shown in fig. 7, from the waveform of the angular spectrum function P(θ) of the mixed audio signal, the beam arrival directions of the 3 sound sources present in the mixed sound are found to be [50°, 200°, 300°], respectively.
Step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure formula is as follows:
$p_w(t) = \sum_{n=1}^{3} 0.5 \sum_{m=1}^{4} h_{mn}(t)\, s_n(t)$    (4)
wherein p_w(t) represents the microphone array sound pressure at time t;
n indexes the sound sources, n = 1, 2, 3;
t represents time;
s_n(t) represents the audio signal of the nth sound source;
h_mn(t) denotes the conversion matrix between the nth sound source and the mth microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the mth microphone with respect to the nth sound source at time t, and θ_n(t) represents the beam arrival direction of the nth sound source at time t;
the sound pressure gradient in the horizontal direction of the microphone array is given by formula (5), wherein p_x(t) represents the sound pressure gradient in the horizontal direction of the microphone array;
the sound pressure gradient in the vertical direction of the microphone array is given by formula (6), wherein p_y(t) represents the sound pressure gradient in the vertical direction of the microphone array;
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further obtaining an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
$I(\omega,t) = \frac{1}{\rho_0 c}\left[\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}\,u_x + \operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}\,u_y\right]$    (7)
wherein I(ω, t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re{·} represents taking the real part of a complex number;
p_w^*(ω, t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω, t) represents the sound pressure gradient in the horizontal direction of the microphone array in the frequency domain;
p_y(ω, t) represents the sound pressure gradient in the vertical direction of the microphone array in the frequency domain;
u_x represents the unit vector in the direction of the abscissa axis;
u_y represents the unit vector in the direction of the ordinate axis;
the intensity vector direction formula is as follows:
$\gamma(\omega,t) = \tan^{-1}\!\left[\dfrac{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}}{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}}\right]$    (8)
wherein γ (ω, t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array;
step 6, counting the intensity vector directions to obtain their probability density distribution, fitting it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity vector direction, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, counting the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting it with a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the voice;
In the embodiment of the present invention, the distribution probability density map of γ(ω, t) is shown in fig. 8; according to the number of sound sources and their angles, the mixed von Mises distribution conforming to this probability density distribution is composed of 3 single von Mises distributions, whose central angles are [50°, 200°, 300°], respectively.
The formula of the mixed von mises distribution model is as follows:
g ( θ ) = Σ n = 1 N α n f ( θ ; k n ) - - - ( 10 )
wherein,representing a mixed von mises distribution probability density;
representing a mixed sound direction angle;
αna weight representing an intensity vector direction function of a sound pressure signal of an nth sound source;
wherein, I0(kn) First order modified Bessel function, k, representing the corresponding nth sound sourcenDenotes the n-thThe concentration parameter corresponding to the single Von Milus distribution obeyed by the intensity vector direction of the sound source sound pressure signal is the reciprocal of the variance of the Von Milus distribution;
the hybrid von mises distribution function parameter set is as follows:
={αn,kn},i=1,2,3(11)
6-2, initializing model parameters to obtain an initial function parameter set;
In the embodiment of the invention, the values of α are [1/3, 1/3, 1/3] and the values of k are [8, 6, 3];
6-3, an initial mixed von Mises distribution function is established from the obtained initial model parameters;
The parameters of the mixed von Mises distribution model are then estimated with the expectation-maximization algorithm, whose principle is to maximize the probability of the observed values: the re-estimated values of the parameters α and k are calculated by setting the derivatives of the model function with respect to α and k equal to zero.
Substituting γ(ω, t) as the observations into the mixture and taking the logarithm gives an initial log-likelihood value of -3.0249e+004. The ratio of each current single von Mises distribution to the mixed von Mises distribution is computed to obtain the re-estimated α parameters [0.2267, 0.2817, 0.4516]; meanwhile, the re-estimated k values [5.1498, 4.0061, 3.1277] are obtained from the derivative-based update of k. This yields a new log-likelihood value of -2.9887e+004; the difference between the new and old likelihood values is 362.3362, far greater than the threshold of 0.1, so the new likelihood value is assigned to the old one, and the previous steps are repeated with the two newly re-estimated parameter sets until the difference between the old and new likelihood values is less than the threshold, at which point the algorithm is considered to have converged. In this example the final α parameters are [0.2689, 0.2811, 0.4500] and the k values are [4.3508, 3.3601, 2.8332]; a mixed von Mises distribution function satisfying the intensity vector direction distribution is thus obtained, as shown in fig. 9.
6-4, solving the intensity vector direction function of each sound pressure signal according to the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
$I_n(\theta; \omega, t) = \alpha_n f(\theta; k_n)$    (12)
wherein I_n(θ; ω, t) represents the intensity vector direction function of the nth sound source;
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
$\tilde{s}_n(\omega, t) = p_w(\omega, t)\, I_n(\theta; \omega, t)$    (13)
wherein $\tilde{s}_n(\omega, t)$ represents the frequency-domain signal of the nth sound source obtained after separation of the mixed speech;
$\tilde{s}_n(\omega, t)$ is converted by inverse Fourier transform to obtain the time-domain signal $\tilde{s}_n(t)$;
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, and considering the sound source with the maximum probability value as the target sound source, reserving the sound source signal and deleting other non-target sound sources;
In the embodiment of the invention, assuming that the first person is the target sound source, the logarithmic matching probabilities of the three separated voices with the target voice model are [-2.0850, -2.8807, -3.5084]×10^4, respectively; the best-matching voice is separated voice No. 1, i.e., the target sound source is found.
The matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
$C(\tilde{X}_n) = \log\!\big[P(\tilde{X}_n \mid G_c)\big]$    (14)
in the formula, $\tilde{X}_n$ represents the speech feature parameters extracted from the separated speech $\tilde{s}_n(t)$, i.e., the Mel frequency cepstrum coefficients of $\tilde{s}_n(t)$ are extracted as its feature parameters;
$C(\tilde{X}_n)$ represents the matching probability of the nth sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
$P(\tilde{X}_n \mid G_c)$ represents the probability that the separated speech belongs to the voice of the user-specified person;
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
In the embodiment of the invention, the direction function of each sound source is finally obtained from the estimated mixed von Mises distribution model parameters, and the mixed sound is further separated to recover the original sounds; as shown in graphs (a) to (f) of fig. 10, which compare the ideal data with the data obtained after separation, the similarity is extremely high.

Claims (3)

1. An intelligent speech processing method, comprising the steps of:
step 1, collecting sample voice sections to construct a sample voice library, performing feature extraction on the sample voice to obtain feature parameters, and training the feature parameters;
the specific process is as follows:
step 1-1, collecting sample voice sections, carrying out discretization processing on the collected voice sections, extracting Mel frequency cepstrum coefficients of voice signals as voice signal characteristic parameters, and establishing a Gaussian mixture model;
the model formula is as follows:
$p(X \mid G) = \sum_{i=1}^{I} p_i\, b_i(X)$    (1)
wherein p(X|G) represents the probability of the sample speech feature parameters under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I represents the number of single Gaussian models in the Gaussian mixture model;
p_i represents the weighting coefficient of the ith single Gaussian model, with $\sum_{i=1}^{I} p_i = 1$;
μ_i represents the mean vector of the ith single Gaussian model;
Σ_i represents the covariance matrix of the ith single Gaussian model;
X represents the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T represents the number of feature vectors;
b_i(X) represents the density function of the ith single Gaussian model, b_i(X) = N(μ_i, Σ_i), where N(·) represents the density function of a standard Gaussian distribution;
step 1-2, training a Gaussian mixture model by using the characteristic parameters of the voice signals;
that is, a k-means clustering algorithm is adopted to cluster the feature parameters of the voice signals to obtain the initial value G_0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I, of the Gaussian mixture model parameter set; according to the obtained initial value of the parameter set, the model is estimated with the expectation-maximization algorithm to obtain the Gaussian mixture model parameters, completing the training of the feature parameters;
step 2, collecting the audio signal of the detected environment by adopting a microphone array consisting of M microphones, and determining the number of the environmental sound sources and the arrival direction of each sound source beam, namely the incident angle from the sound source to the microphone array;
the specific process is as follows:
step 2-1, collecting mixed audio signals of the tested environment by using a microphone array consisting of M microphones, and carrying out discretization processing on the collected mixed audio signals to obtain the amplitude of each sampling point;
step 2-2, performing matrixing on the amplitude of each sampling point to obtain a mixed audio matrix collected by each microphone; the number of columns of the mixed audio matrix is one, the number of rows is the number of sampling points, and elements in the matrix are the amplitude of each sampling point;
step 2-3, obtaining an estimated value of a vector covariance matrix of the mixed audio signal of the tested environment according to the mixed audio matrix and the number of the microphones acquired by each microphone;
the estimated value of the vector covariance matrix is expressed as follows:
$R_{xx} = \frac{1}{M}\sum_{m=1}^{M} X(m)\,X^{H}(m)$    (2)
wherein R_xx represents the estimate of the vector covariance matrix of the mixed audio signal of the tested environment;
X(m) represents the mixed audio matrix collected by the mth microphone;
X^H(m) represents the transpose of the mixed audio matrix collected by the mth microphone;
step 2-4, performing eigenvalue decomposition on the estimated value of the vector covariance matrix to obtain eigenvalues, sequencing the eigenvalues from large to small, and determining the number of the eigenvalues larger than a threshold, namely the number of the sound sources;
2-5, subtracting the number of the sound sources from the number of the microphones to obtain the number of the noise sources, and further correspondingly obtaining a noise matrix;
step 2-6, obtaining a steering vector of the microphone array according to the distance between each microphone and the array center, the wavelength of the mixed audio signal, the direction angle of the microphone to the array center and the arrival direction of the sound source beam, and obtaining an angle spectrum function of the mixed audio signal according to the noise matrix and the steering vector of the microphone array;
the angle spectral function formula of the mixed audio signal is as follows:
$P(\theta) = \dfrac{1}{\alpha^{H}(\theta)\, V_u V_u^{H}\, \alpha(\theta)}$    (3)
wherein P(θ) represents the angular spectrum function of the mixed audio signal;
α(θ) represents the steering vector of the microphone array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), wherein $\alpha_m(\theta) = e^{jk d_m \cos(\phi_m - \theta)}$, j denotes the imaginary unit, k = 2π/λ, λ denotes the wavelength of the mixed audio signal, d_m denotes the distance of the mth microphone from the center of the array, and φ_m represents the direction angle of the mth microphone to the center of the array;
θ represents the beam arrival direction of the sound source;
α^H(θ) represents the conjugate transpose of the steering vector of the microphone array;
V_u represents the noise matrix;
V_u^H represents the conjugate transpose of the noise matrix;
2-7, selecting a plurality of peak values of the waveform from largest to smallest according to the waveform of the angular spectrum function of the mixed audio signal, the number of the selected peak values being the number of sound sources;
step 2-8, determining an angle value corresponding to the selected peak value, namely obtaining the arrival direction of the wave beam of each sound source;
step 3, obtaining microphone array sound pressure received by the microphone, sound pressure gradient in the horizontal direction of the microphone array and sound pressure gradient in the vertical direction of the microphone array according to the audio signal of each sound source and the conversion relation between the sound sources and the microphone;
the microphone array sound pressure signal formula is as follows:
$p_w(t) = \sum_{n=1}^{N} 0.5 \sum_{m=1}^{M} h_{mn}(t)\, s_n(t)$    (4)
wherein p_w(t) represents the microphone array sound pressure at time t;
N represents the number of sound sources;
t represents time;
s_n(t) represents the audio signal of the nth sound source;
h_mn(t) denotes the conversion matrix between the nth sound source and the mth microphone, h_mn(t) = p_0(t)·α_m(θ_n(t)), where p_0(t) represents the sound pressure at the center of the microphone array caused by the sound wave at time t, α_m(θ_n(t)) represents the steering vector of the mth microphone with respect to the nth sound source at time t, and θ_n(t) represents the beam arrival direction of the nth sound source at time t;
the sound pressure gradient in the horizontal direction of the microphone array is given by formula (5), wherein p_x(t) represents the sound pressure gradient in the horizontal direction of the microphone array;
the sound pressure gradient in the vertical direction of the microphone array is given by formula (6), wherein p_y(t) represents the sound pressure gradient in the vertical direction of the microphone array;
step 4, converting the central sound pressure of the microphone array, the sound pressure gradient in the horizontal direction of the microphone array and the sound pressure gradient in the vertical direction of the microphone array from a time domain to a frequency domain by adopting Fourier transform;
step 5, obtaining an intensity vector formula of a sound pressure signal in a frequency domain according to sound pressure of the microphone array in the frequency domain, horizontal direction gradient of the microphone array and vertical direction sound pressure gradient of the microphone array, and further deducing an intensity vector direction;
the intensity vector formula of the sound pressure signal in the frequency domain is:
$I(\omega,t) = \frac{1}{\rho_0 c}\left[\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}\,u_x + \operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}\,u_y\right]$    (7)
wherein I(ω, t) represents the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 represents the air density of the tested environment;
c represents the speed of sound;
Re{·} represents taking the real part of a complex number;
p_w^*(ω, t) represents the complex conjugate of the microphone array sound pressure in the frequency domain;
p_x(ω, t) represents the sound pressure gradient in the horizontal direction of the microphone array in the frequency domain;
p_y(ω, t) represents the sound pressure gradient in the vertical direction of the microphone array in the frequency domain;
u_x represents the unit vector in the direction of the abscissa axis;
u_y represents the unit vector in the direction of the ordinate axis;
the intensity vector direction formula is as follows:
$\gamma(\omega,t) = \tan^{-1}\!\left[\dfrac{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_y(\omega,t)\}}{\operatorname{Re}\{p_w^{*}(\omega,t)\,p_x(\omega,t)\}}\right]$    (8)
wherein γ (ω, t) represents the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array;
step 6, counting the intensity vector directions to obtain their probability density distribution, fitting it with a mixed von Mises distribution to obtain the model parameters of the mixed von Mises distribution obeyed by the voice intensity vector direction, and further obtaining the intensity vector direction function of each sound pressure signal;
the specific process is as follows:
step 6-1, counting the intensity vector directions to obtain the probability density distribution of the intensity vector direction, and fitting it with a mixed von Mises distribution to obtain the model parameter set of the mixed von Mises distribution obeyed by the intensity vector direction of the voice;
the formula of the mixed von Mises distribution model is as follows:
$g(\theta) = \sum_{n=1}^{N} \alpha_n f(\theta; k_n)$    (10)
wherein g(θ) represents the mixed von Mises distribution probability density;
θ represents the mixed sound direction angle;
α_n represents the weight of the intensity vector direction function of the sound pressure signal of the nth sound source;
f(θ; k_n) represents the single von Mises density of the nth sound source, $f(\theta; k_n) = \frac{e^{k_n \cos(\theta - \theta_n)}}{2\pi I_0(k_n)}$    (9), wherein I_0(k_n) represents the modified Bessel function of the first kind of order zero for the nth sound source, and k_n represents the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the sound pressure signal of the nth sound source, i.e., the reciprocal of the variance of the von Mises distribution;
the mixed von Mises distribution function parameter set is as follows:
{α_n, k_n}, n = 1, ..., N    (11)
6-2, initializing model parameters to obtain an initial function parameter set;
6-3, estimating the parameters of the mixed von Mises distribution model by adopting the expectation-maximization algorithm according to the obtained initial model parameters;
6-4, solving the intensity vector direction function of each sound pressure signal according to the estimated mixed von Mises distribution model parameters;
the intensity vector direction function formula of the sound pressure signal is as follows:
$I_n(\theta; \omega, t) = \alpha_n f(\theta; k_n)$    (12)
wherein I_n(θ; ω, t) represents the intensity vector direction function of the nth sound source;
step 7, obtaining a signal of each sound source in a frequency domain according to the obtained intensity vector direction function of each sound pressure signal and the sound pressure of the microphone array, and converting each sound source signal in the frequency domain into a sound source signal in a time domain by adopting Fourier inverse transformation;
the signal formula of each sound source in the frequency domain is as follows:
$\tilde{s}_n(\omega, t) = p_w(\omega, t)\, I_n(\theta; \omega, t)$    (13)
wherein $\tilde{s}_n(\omega, t)$ represents the frequency-domain signal of the nth sound source obtained after separation of the mixed speech;
$\tilde{s}_n(\omega, t)$ is converted by inverse Fourier transform to obtain the time-domain signal $\tilde{s}_n(t)$;
Step 8, calculating the matching probability of each sound source signal and the designated sound source in the sample sound library, selecting the sound source with the maximum probability value as a target sound source, reserving the sound source signal, and deleting other non-target sound sources;
the matching probability formula of each sound source signal and the designated sound source in the sample voice library is as follows:
$C(\tilde{X}_n) = \log\!\big[P(\tilde{X}_n \mid G_c)\big]$    (14)
in the formula, $\tilde{X}_n$ represents the speech feature parameters extracted from the separated speech $\tilde{s}_n(t)$, i.e., the Mel frequency cepstrum coefficients of $\tilde{s}_n(t)$ are extracted as its feature parameters;
$C(\tilde{X}_n)$ represents the matching probability of the nth sound source signal with the specified sound source in the sample voice library;
G_c represents the voice model parameters of the user-specified person;
$P(\tilde{X}_n \mid G_c)$ represents the probability that the separated speech belongs to the voice of the user-specified person;
and 9, amplifying the reserved sound source signals, namely completing the amplification of the specified sound source in the tested environment.
2. The intelligent speech processing method according to claim 1, wherein the threshold in step 2-4 has a value in the range of 10^-2 to 10^-16.
3. The intelligent speech processing method according to claim 1, wherein in step 6-1, α_n takes a random value within 0-1 and satisfies $\sum_{n=1}^{N} \alpha_n = 1$, and k_n takes a random value within 1-700.
CN201410081493.6A 2014-03-05 2014-03-05 A kind of intelligent sound processing method Active CN103811020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 A kind of intelligent sound processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410081493.6A CN103811020B (en) 2014-03-05 2014-03-05 A kind of intelligent sound processing method

Publications (2)

Publication Number Publication Date
CN103811020A CN103811020A (en) 2014-05-21
CN103811020B true CN103811020B (en) 2016-06-22

Family

ID=50707692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410081493.6A Active CN103811020B (en) 2014-03-05 2014-03-05 A kind of intelligent sound processing method

Country Status (1)

Country Link
CN (1) CN103811020B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200813B (en) * 2014-07-01 2017-05-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN105609099A (en) * 2015-12-25 2016-05-25 重庆邮电大学 Speech recognition pretreatment method based on human auditory characteristic
CN105933820A (en) * 2016-04-28 2016-09-07 冠捷显示科技(中国)有限公司 Automatic positioning method of external wireless sound boxes
CN106205610B (en) * 2016-06-29 2019-11-26 联想(北京)有限公司 A kind of voice information identification method and equipment
CN106128472A (en) * 2016-07-12 2016-11-16 乐视控股(北京)有限公司 The processing method and processing device of singer's sound
CN106448722B (en) * 2016-09-14 2019-01-18 讯飞智元信息科技有限公司 The way of recording, device and system
CN108630193B (en) * 2017-03-21 2020-10-02 北京嘀嘀无限科技发展有限公司 Voice recognition method and device
CN107220021B (en) * 2017-05-16 2021-03-23 北京小鸟看看科技有限公司 Voice input recognition method and device and head-mounted equipment
CN107274895B (en) * 2017-08-18 2020-04-17 京东方科技集团股份有限公司 Voice recognition device and method
CN107527626A (en) * 2017-08-30 2017-12-29 北京嘉楠捷思信息技术有限公司 Audio identification system
CN108198569B (en) * 2017-12-28 2021-07-16 北京搜狗科技发展有限公司 Audio processing method, device and equipment and readable storage medium
CN108520756B (en) * 2018-03-20 2020-09-01 北京时代拓灵科技有限公司 Method and device for separating speaker voice
CN110310642B (en) * 2018-03-20 2023-12-26 阿里巴巴集团控股有限公司 Voice processing method, system, client, equipment and storage medium
CN108694950B (en) * 2018-05-16 2021-10-01 清华大学 Speaker confirmation method based on deep hybrid model
CN108766459B (en) * 2018-06-13 2020-07-17 北京联合大学 Target speaker estimation method and system in multi-user voice mixing
CN108735227B (en) * 2018-06-22 2020-05-19 北京三听科技有限公司 Method and system for separating sound source of voice signal picked up by microphone array
CN110867191B (en) * 2018-08-28 2024-06-25 洞见未来科技股份有限公司 Speech processing method, information device and computer program product
CN109505741B (en) * 2018-12-20 2020-07-10 浙江大学 Wind driven generator damaged blade detection method and device based on rectangular microphone array
CN110335626A (en) * 2019-07-09 2019-10-15 北京字节跳动网络技术有限公司 Age recognition methods and device, storage medium based on audio
CN110288996A (en) * 2019-07-22 2019-09-27 厦门钛尚人工智能科技有限公司 A kind of speech recognition equipment and audio recognition method
CN110473566A (en) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 Audio separation method, device, electronic equipment and computer readable storage medium
CN110706688B (en) * 2019-11-11 2022-06-17 广州国音智能科技有限公司 Method, system, terminal and readable storage medium for constructing voice recognition model
CN111028857B (en) * 2019-12-27 2024-01-19 宁波蛙声科技有限公司 Method and system for reducing noise of multichannel audio-video conference based on deep learning
CN111816185A (en) * 2020-07-07 2020-10-23 广东工业大学 Method and device for identifying speaker in mixed voice
CN111696570B (en) * 2020-08-17 2020-11-24 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN111899756B (en) * 2020-09-29 2021-04-09 北京清微智能科技有限公司 Single-channel voice separation method and device
CN114093382A (en) * 2021-11-23 2022-02-25 广东电网有限责任公司 Intelligent interaction method suitable for voice information
CN114242072A (en) * 2021-12-21 2022-03-25 上海帝图信息科技有限公司 Voice recognition system for intelligent robot
CN114613385A (en) * 2022-05-07 2022-06-10 广州易而达科技股份有限公司 Far-field voice noise reduction method, cloud server and audio acquisition equipment
CN115240689B (en) * 2022-09-15 2022-12-02 深圳市水世界信息有限公司 Target sound determination method, target sound determination device, computer equipment and medium
CN118574049B (en) * 2024-08-01 2024-11-08 罗普特科技集团股份有限公司 Microphone calibration method and system of multi-mode intelligent terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1653519A (en) * 2002-03-20 2005-08-10 高通股份有限公司 Method for robust voice recognition by analyzing redundant features of source signal
JP2012211768A (en) * 2011-03-30 2012-11-01 Advanced Telecommunication Research Institute International Sound source positioning apparatus
CN103426434A (en) * 2012-05-04 2013-12-04 索尼电脑娱乐公司 Source separation by independent component analysis in conjunction with source direction information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8249867B2 (en) * 2007-12-11 2012-08-21 Electronics And Telecommunications Research Institute Microphone array based speech recognition system and target speech extracting method of the system

Also Published As

Publication number Publication date
CN103811020A (en) 2014-05-21

Similar Documents

Publication Publication Date Title
CN103811020B (en) A kind of intelligent sound processing method
EP3707716B1 (en) Multi-channel speech separation
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN107919133A (en) For the speech-enhancement system and sound enhancement method of destination object
CN111429939B (en) Sound signal separation method of double sound sources and pickup
CN101593522A (en) A kind of full frequency domain digital hearing aid method and apparatus
CN102388416A (en) Signal processing apparatus and signal processing method
JP4964204B2 (en) Multiple signal section estimation device, multiple signal section estimation method, program thereof, and recording medium
CN109859749A (en) A kind of voice signal recognition methods and device
CN106847301A (en) A kind of ears speech separating method based on compressed sensing and attitude information
CN106019230B (en) A kind of sound localization method based on i-vector Speaker Identification
CN116092512A (en) Small sample voice separation method based on data generation
Enzinger et al. Mismatched distances from speakers to telephone in a forensic-voice-comparison case
JP2017067948A (en) Voice processor and voice processing method
Talagala et al. Binaural localization of speech sources in the median plane using cepstral HRTF extraction
Ramou et al. Automatic detection of articulations disorders from children’s speech preliminary study
CN118212929A (en) Personalized Ambiosonic voice enhancement method
Xia et al. Ava: An adaptive audio filtering architecture for enhancing mobile, embedded, and cyber-physical systems
CN117711422A (en) Underdetermined voice separation method and device based on compressed sensing space information estimation
Krijnders et al. Tone-fit and MFCC scene classification compared to human recognition
Krause et al. Binaural signal representations for joint sound event detection and acoustic scene classification
Oualil et al. Joint detection and localization of multiple speakers using a probabilistic interpretation of the steered response power
Venkatesan et al. Analysis of monaural and binaural statistical properties for the estimation of distance of a target speaker
Nguyen et al. Location Estimation of Receivers in an Audio Room using Deep Learning with a Convolution Neural Network.
Kolossa et al. Missing feature speech recognition in a meeting situation with maximum SNR beamforming

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant