CN106019230B - Sound source localization method based on i-vector speaker recognition - Google Patents
- Publication number: CN106019230B
- Application number: CN201610365659.6A
- Authority: CN (China)
- Prior art keywords: signal, cross-correlation function, vector, sound source
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a sound source localization method based on i-vector speaker recognition. The method introduces cross-correlation function features to obtain a discriminative cross-correlation function, divides these features into a training set and a test set, and uses them to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM) algorithm is used to obtain the maximum likelihood estimate of the probability distribution function of the development-set i-vectors, and a PLDA model constrained by speech duration is established. The method can accurately perform speech recognition and sound source localization, and effectively addresses the problems of noise and reverberation in traditional sound source localization.
Description
Technical field
The present invention relates to a sound source localization method based on i-vector speaker recognition and belongs to the field of Internet information technology.
Background art
Speaker recognition, one of the biometric technologies, automatically identifies a speaker from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. It is an important branch of personal-characteristic recognition. With the continuous development of information technology, speaker recognition offers advantages over other biometric technologies, such as simplicity, low cost, and good scalability, and it can be widely applied to database access, security verification, telephone banking, remote computer login, and other fields. As an important biometric identity recognition technology, speaker recognition has a wide range of applications, and many researchers in China and abroad have joined research in this field. In recent years, speaker modeling based on the identity vector (i-vector) has achieved great success and has greatly improved the performance of speaker recognition systems; i-vector subspace modeling has proved to be the most effective speaker modeling technique at the current forefront.
With the rapid development of computer technology and the information industry, sound source localization has become a hot research topic. Determining the position of a sound source in space has very broad application prospects and can be applied widely in social production and daily life. Sound source localization locates an object from the sound it emits. It differs from localization by sonar, radar, or wireless communication in that its signal, ordinary sound, is broadband, whereas the signals in those methods are narrowband. Various sound source localization algorithms have been proposed based on the characteristics of speech signals, but noise and reverberation keep the localization accuracy of existing algorithms low.
Current sound source localization algorithms can be roughly divided into three categories: algorithms based on high-resolution spectral estimation, algorithms based on time delay estimation (TDE: Time Delay Estimation), and algorithms based on steerable beamforming.
(1) Localization algorithms based on high-resolution spectral estimation
These mainly comprise four methods: ARMA spectral estimation, minimum variance spectral estimation, entropy spectral estimation, and subspace methods. ARMA spectral estimation estimates the power spectral density by modeling a stationary linear signal process. Entropy spectral estimation includes the maximum entropy method (MEM) and the minimum cross-entropy method. Subspace methods include Pisarenko harmonic decomposition, the Prony method, multiple signal classification (MUSIC: Multiple Signal Classification), and estimation of signal parameters via rotational invariance techniques (ESPRIT: Estimation of Signal Parameters via Rotational Invariance Techniques). Localization algorithms based on high-resolution spectral estimation use the covariance matrix of the received signals, which is unknown in practice and must be estimated from observed data. Estimating it requires assuming that the source and noise are statistically stationary and that the parameter to be estimated (the source position) is fixed, so that the covariance can be averaged over a time interval; speech, however, is only short-term stationary and often does not satisfy this condition. Moreover, most current methods are designed for far-field narrowband signals, and indoor reverberation severely degrades the performance of this class of algorithms.
(2) Localization algorithms based on time delay estimation
Algorithms based on time delay estimation work in two steps. The first step is time delay estimation, i.e., computing the time delay of the source signal between each pair of microphones; the second step is location estimation, i.e., estimating the source position from the time delays and the geometry of the microphone array. Time delay estimation (TDE) is the most critical step. The generalized cross-correlation (GCC: Generalized Cross Correlation) time delay estimation method computes the cross-correlation function between the signals received by different microphones and from it estimates the time difference of arrival (TDOA: Time Difference of Arrival). In real environments, however, noise and reverberation weaken the maximum peak of the correlation function, making peak detection difficult. GCC weights the cross-power spectrum of the two microphone signals so that the peak of the correlation function at the true delay becomes more prominent. Knapp lists five common weighting functions, of which maximum likelihood weighting (GCC-ML: GCC using Maximum Likelihood) and phase transform weighting (GCC-PHAT: GCC using Phase Transform, PHAT: Phase Transform) are the most typical. Because of its low computational complexity and ease of implementation, the GCC method is widely used.
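The GCC-PHAT delay estimator described in (2) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single microphone pair, integer-sample delays, and FFT zero-padding to avoid circular wrap-around; the function names `gcc_phat` and `estimate_tdoa` and the `max_lag` parameter are hypothetical, not taken from the source.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC-PHAT cross-correlation of x1 and x2 for lags -max_lag..max_lag (samples)."""
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(x1, n) * np.conj(np.fft.rfft(x2, n))  # cross-power spectrum X1 X2*
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep only the phase
    r = np.fft.irfft(cross, n)
    # irfft stores negative lags at the end of the array; reorder to -max_lag..max_lag
    return np.concatenate((r[-max_lag:], r[: max_lag + 1]))


def estimate_tdoa(x1, x2, max_lag):
    """Delay (samples) of x1 relative to x2, taken at the peak of the GCC-PHAT function."""
    r = gcc_phat(x1, x2, max_lag)
    return int(np.argmax(r)) - max_lag


# Usage: x1 is s delayed by 7 samples
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
x1 = np.concatenate((np.zeros(7), s[:-7]))
print(estimate_tdoa(x1, s, max_lag=32))  # 7
```

A positive result means the first argument lags the second; converting the sample delay to seconds only requires dividing by the sampling rate.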
(3) Localization algorithms based on steerable beamforming
Localization based on steerable beamforming was first used for target localization in radar and sonar systems and was later introduced into microphone array signal processing. Microphone array beamforming has two main applications in speech signal processing: 1) speech enhancement; 2) sound source localization. When the position of the source is known, adjusting the steering delay of each microphone aligns the microphone signals in time, steering the array toward the source; summing the aligned signals then suppresses noise and enhances the signal. This simplest and most practical beamformer is called delay-and-sum beamforming.
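Delay-and-sum beamforming as described above can be sketched as follows. The sketch assumes the steering delays are known integer sample counts and aligns channels with a circular shift (`np.roll`), a simplification that is only reasonable when the delays are small relative to the signal length; `delay_and_sum` is a hypothetical helper name.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Delay-and-sum beamformer with integer steering delays.

    signals: (M, T) array, one row per microphone.
    delays:  M integer delays (samples) of the source at each microphone,
             relative to a common reference.
    """
    signals = np.asarray(signals, dtype=float)
    # Advance each channel by its delay so the source components line up in time
    aligned = [np.roll(x, -d) for x, d in zip(signals, delays)]
    # Coherent average: the source adds in phase, uncorrelated noise averages down
    return np.mean(aligned, axis=0)
```

With M microphones and uncorrelated noise of equal power, averaging the aligned channels reduces the noise power by roughly a factor of M while leaving the aligned source unchanged.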
Traditional algorithms are severely limited in strongly reverberant environments. For example, steered-beam methods based on maximum output power are sensitive to the external environment and to the frequency content of the source, which limits their application; localization methods based on high-resolution spectral estimation are computationally expensive and unsuitable for short-range localization; and the delay accuracy of delay-based localization methods is easily degraded by reverberation and noise.
Summary of the invention
The present invention aims to overcome the above deficiencies of the prior art by proposing a sound source localization algorithm based on i-vector speaker recognition. The method introduces cross-correlation function features to obtain a discriminative cross-correlation function, divides these features into a training set and a test set, and uses them to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM: expectation maximization) algorithm is used to obtain the maximum likelihood estimate of the probability distribution function of the development-set i-vectors, and a PLDA model constrained by speech duration is established. The method can accurately perform speech recognition and sound source localization, and its implementation effectively solves the problems of noise and reverberation in traditional sound source localization.
The technical solution adopted by the invention to solve this technical problem is a sound source localization algorithm based on i-vector speaker recognition, comprising a training stage and a localization stage.
The training stage comprises the following steps:
Step 1: The sound source is placed at each training position r_i, i = 1, 2, ..., K, where K is the number of source training positions, and the microphone array records the signal (reverberant signal) emitted by the source at that position;
Step 2: Compute the cross-correlation function from the recorded reverberant signals;
Step 3: Generate feature vectors y from the cross-correlation function;
Step 4: For each training position r_i, use the feature vectors to compute the parameters of the PLDA model of the cross-correlation function: the mean vector μ, the fixed-dimension speaker subspace, and the residual ε_ij.
The localization stage comprises the following steps:
Step 1: The microphone array records a signal that contains the signal (reverberant signal) emitted by the sound source and noise;
Step 2: Compute the cross-correlation function from the recorded signal;
Step 3: Generate feature vectors y from the cross-correlation function; for N frames of data, this yields a feature vector set y = {y_t, t = 1, ..., N};
Step 4: Test the features against the PLDA model to estimate the position of the sound source.
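Steps 2 and 3 of the localization stage (splitting the recording into N frames and turning each frame's cross-correlation function into a feature vector y_t) can be sketched as below. The source does not state how y is formed from the cross-correlation function; this sketch assumes, for a single microphone pair, that y_t is the PHAT-weighted GCC over lags -max_lag..max_lag of a Hann-windowed frame. `gcc_phat_feature_set`, `frame_len`, `hop`, and `max_lag` are hypothetical names and illustrative values, not the patent's.

```python
import numpy as np


def gcc_phat_feature_set(x1, x2, frame_len=512, hop=256, max_lag=16):
    """Return Y of shape (N, 2*max_lag+1): one GCC-PHAT feature vector y_t per frame."""
    window = np.hanning(frame_len)
    n_fft = 2 * frame_len  # zero-padding keeps the per-frame correlation linear
    features = []
    for start in range(0, len(x1) - frame_len + 1, hop):
        f1 = window * x1[start:start + frame_len]
        f2 = window * x2[start:start + frame_len]
        cross = np.fft.rfft(f1, n_fft) * np.conj(np.fft.rfft(f2, n_fft))
        cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting
        r = np.fft.irfft(cross, n_fft)
        # Keep only lags -max_lag..max_lag as the feature vector y_t
        features.append(np.concatenate((r[-max_lag:], r[:max_lag + 1])))
    return np.array(features)
```

For an array with more than two microphones, one option (again an assumption) is to concatenate such vectors over microphone pairs; `max_lag` would typically be chosen from the largest physically possible delay between the microphones.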
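In addition, for the selection of the cross-correlation function feature, the room impulse response simulation tool Roomsim is used to simulate a real acoustic environment. The generalized cross-correlation function (GCC) between the signals x1(k) and x2(k) can be computed in the frequency domain:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Psi_{1,2}(\omega)\, X_1(\omega)\, X_2^*(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1.1)$$
where the superscript "*" denotes complex conjugation, X1(ω) is the Fourier transform of x1(k), and Ψ1,2(ω) is a weighting function.
To make the cross-correlation function more robust to reverberation, the phase transform (PHAT) weighting function can be used:
$$\Psi_{1,2}(\omega) = \frac{1}{\left|X_1(\omega)\, X_2^*(\omega)\right|} \qquad (1.2)$$
Substituting (1.2) into (1.1) gives:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{X_1(\omega)\, X_2^*(\omega)}{\left|X_1(\omega)\, X_2^*(\omega)\right|}\, e^{j\omega\tau}\, d\omega \qquad (1.3)$$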
In practice, the microphone signals x1(k) and x2(k) are windowed and then Fourier transformed to obtain X1(ω) and X2(ω). If the length L of the room impulse response is much shorter than the window length, the microphone signals can be expressed in the frequency domain as:
$$X_n(\omega) = H_n(\mathbf{r}_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.4)$$
where S(ω) and H_n(r_s, ω) are the Fourier transforms of s(k) and h_n(r_s, k), respectively.
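Substituting (1.4) into (1.3) gives:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{H_1(\mathbf{r}_s, \omega)\, H_2^*(\mathbf{r}_s, \omega)}{\left|H_1(\mathbf{r}_s, \omega)\, H_2^*(\mathbf{r}_s, \omega)\right|}\, e^{j\omega\tau}\, d\omega = R_{h_1h_2}(\tau) \qquad (1.5)$$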
From (1.5), the GCC between the signals x1(k) and x2(k) received by the microphone array equals the GCC between the room impulse responses h1(r_s, k) and h2(r_s, k).
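The step from (1.4) to (1.5) can be checked numerically: with PHAT weighting the source spectrum cancels, so the GCC of the microphone signals depends only on the room impulse responses. The sketch below assumes circular convolution on the DFT grid, which makes (1.4) hold exactly and stands in for the short-impulse-response condition; the random 8-tap filters are synthetic stand-ins for room impulse responses, not Roomsim output, and `phat_correlation` is a hypothetical helper.

```python
import numpy as np


def phat_correlation(a, b):
    """Circular GCC-PHAT: inverse DFT of A(w) B*(w) / |A(w) B*(w)|."""
    cross = np.fft.rfft(a) * np.conj(np.fft.rfft(b))
    return np.fft.irfft(cross / np.abs(cross), len(a))


rng = np.random.default_rng(0)
n = 256
s = rng.standard_normal(n)  # source signal s(k)
h1 = np.zeros(n)
h2 = np.zeros(n)
h1[:8] = rng.standard_normal(8)  # short synthetic "room impulse responses"
h2[:8] = rng.standard_normal(8)

# Circular convolution makes X_n(w) = H_n(w) S(w) exact on the DFT grid, as in (1.4)
x1 = np.fft.irfft(np.fft.rfft(h1) * np.fft.rfft(s), n)
x2 = np.fft.irfft(np.fft.rfft(h2) * np.fft.rfft(s), n)

# (1.5): PHAT-weighted GCC of the microphone signals equals that of the impulse responses
print(np.allclose(phat_correlation(x1, x2), phat_correlation(h1, h2)))  # True
```

With linear convolution and a window shorter than the impulse responses, the same comparison holds only approximately.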
In practice, however, the length L of the room impulse response is much larger than the window length, so the microphone signals can only be approximated in the frequency domain as:
$$X_n(\omega) \approx H_n(\mathbf{r}_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.6)$$
Consequently, the GCC between the received signals x1(k) and x2(k) is only approximately equal to the GCC between the room impulse responses h1(r_s, k) and h2(r_s, k), that is:
$$R_{x_1x_2}(\tau) \approx R_{h_1h_2}(\tau)$$
This yields the cross-correlation function feature.
The present invention can be applied to speaker recognition and to localization of the speaker under reverberation and noise.
Beneficial effects
1. The present invention uses cross-correlation function features combined with PLDA modeling; modeling the probability distribution function of the i-vectors in the PLDA model improves the effectiveness of the PLDA model. Compared with traditional sound source localization algorithms, it reduces the error rate and improves localization accuracy, and its implementation effectively solves the problems of noise and reverberation in traditional sound source localization.
2. The present invention combines the cross-correlation function features of the sound source with the PLDA algorithm and is applicable to environments with strong noise and reverberation.
3. By extracting cross-correlation function features of the sound source, the present invention makes data acquisition convenient and simple, and achieves good localization performance.
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a schematic diagram of the equal error rate (EER) analysis of the present invention for different speakers under the i-vector model.
Fig. 3 is a schematic diagram of the score analysis of the present invention for different test data under the i-vector model at a signal-to-noise ratio of 10 dB.
Fig. 4 is a schematic diagram of the score analysis of the present invention for different test data under the i-vector model at a signal-to-noise ratio of 20 dB.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention is a sound source localization algorithm based on i-vector speaker recognition. PLDA is a channel compensation algorithm that operates on i-vector features: an i-vector contains both speaker information and channel information, and only the speaker information is of interest, so channel compensation is needed. The following sections describe four aspects in detail: sound source feature selection, probabilistic linear discriminant analysis, model training, and scoring.
The specific implementation steps of the present invention are as follows:
Step 1: Using the Roomsim simulation environment, simulate an environment with reverberation and noise, compute the cross-correlation function features of the sound source signals, apply processing such as dimensionality reduction and speech detection, and divide the features into a training set and a test set in preparation for model training in the next step.
Step 2: Extract i-vectors. In the PLDA framework, the generation of an i-vector can be described by a hidden-variable model; different numbers of hidden variables and different prior assumptions give different PLDA models. Let w_ij denote the j-th i-vector of the i-th speaker. The common PLDA model assumes:
$$w_{ij} = \mu + V y_i + z_{ij}$$
where μ is the mean of all training data, the matrix V spans the speaker space (eigenvoice matrix), the corresponding speaker factor y_i follows a standard Gaussian distribution, and z_ij is the residual, modeled with a full covariance matrix D.
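The generative assumption w_ij = μ + V y_i + z_ij can be made concrete by sampling from it. This sketch assumes z_ij ~ N(0, D) with D a full covariance matrix, as the text describes; the function `sample_plda_ivectors` and its arguments are hypothetical, and the dimensions used with it are arbitrary.

```python
import numpy as np


def sample_plda_ivectors(mu, V, D, n_speakers, n_per_speaker, rng):
    """Sample w_ij = mu + V y_i + z_ij with y_i ~ N(0, I) and z_ij ~ N(0, D)."""
    dim, rank = V.shape
    y = rng.standard_normal((n_speakers, rank))  # one speaker factor y_i per speaker
    chol = np.linalg.cholesky(D)  # D: full residual covariance
    # Residuals z_ij with covariance chol @ chol.T = D
    z = rng.standard_normal((n_speakers, n_per_speaker, dim)) @ chol.T
    # All i-vectors of speaker i share the same offset V y_i
    w = mu + (y @ V.T)[:, None, :] + z  # shape (n_speakers, n_per_speaker, dim)
    return w, y
```

Because every i-vector of a speaker shares V y_i, the between-speaker covariance is V V^T and the within-speaker covariance is D; this split is what the scoring in Step 4 relies on.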
Step 3: Apply PLDA: on the labeled dataset, estimate the model parameters λ = (μ, V, D) by the expectation-maximization (EM) method, starting from a randomly initialized model.
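Step 3 can be sketched with the standard EM updates for this factor-analysis model. Several choices here are assumptions, not taken from the source: μ is fixed to the sample mean, V is initialized randomly and D at the total covariance, a fixed iteration count replaces a convergence test, and the speech-duration constraint mentioned in the abstract is not modeled. In the localization setting each "speaker" class would be a training position r_i. `train_plda_em` is a hypothetical name.

```python
import numpy as np


def train_plda_em(groups, rank, n_iter=100, seed=0):
    """Estimate lambda = (mu, V, D) for w_ij = mu + V y_i + z_ij by EM.

    groups: list of (n_i, dim) arrays, the i-vectors of class i.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack(groups)
    n_total, dim = data.shape
    mu = data.mean(axis=0)  # mean of all training data
    sums = [np.sum(g - mu, axis=0) for g in groups]  # per-class sums of centred vectors
    counts = [len(g) for g in groups]
    centred = data - mu
    scatter = centred.T @ centred  # sum_ij x_ij x_ij^T, constant across iterations
    V = rng.standard_normal((dim, rank))  # random initial model
    D = scatter / n_total  # start the residual at the total covariance
    for _ in range(n_iter):
        # E-step: Gaussian posterior of y_i given all vectors of class i
        VtDinv = V.T @ np.linalg.inv(D)
        sum_xy = np.zeros((dim, rank))
        sum_yy = np.zeros((rank, rank))
        for n_i, x_sum in zip(counts, sums):
            cov_y = np.linalg.inv(np.eye(rank) + n_i * VtDinv @ V)
            e_y = cov_y @ VtDinv @ x_sum
            sum_xy += np.outer(x_sum, e_y)
            sum_yy += n_i * (cov_y + np.outer(e_y, e_y))
        # M-step: closed-form updates of V and the full covariance D
        V = sum_xy @ np.linalg.inv(sum_yy)
        D = (scatter - V @ sum_xy.T) / n_total
        D = 0.5 * (D + D.T)  # remove round-off asymmetry
    return mu, V, D
```

V is identifiable only up to a rotation of the latent space, so V V^T (the between-class covariance) rather than V itself is the meaningful quantity to compare across runs.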
Step 4: After the model parameters have been estimated, given two i-vectors w1 and w2, let θ_tar denote the hypothesis that they come from the same speaker and θ_non the hypothesis that they come from different speakers. The score is the log-likelihood ratio:
$$\mathrm{score} = \ln \frac{p(w_1, w_2 \mid \theta_{tar})}{p(w_1, w_2 \mid \theta_{non})}$$
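With λ = (μ, V, D) estimated, this log-likelihood ratio has a closed form because both hypotheses are Gaussian: under θ_tar the two i-vectors share one speaker factor, so their cross-covariance is V V^T; under θ_non they are independent. A minimal sketch, assuming the PLDA model of Step 2 with z_ij ~ N(0, D); the helper names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """Log-density of N(0, cov) at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))


def plda_llr(w1, w2, mu, V, D):
    """score = log p(w1, w2 | theta_tar) - log p(w1, w2 | theta_non)."""
    between = V @ V.T  # covariance of the shared term V y
    total = between + D  # covariance of a single i-vector
    x = np.concatenate((w1 - mu, w2 - mu))
    zeros = np.zeros_like(total)
    # Same speaker: w1 and w2 share y, so their cross-covariance is V V^T
    same = np.block([[total, between], [between, total]])
    # Different speakers: independent factors, block-diagonal joint covariance
    diff = np.block([[total, zeros], [zeros, total]])
    return gaussian_logpdf(x, same) - gaussian_logpdf(x, diff)
```

In the localization stage, one plausible use (an assumption, not stated in the source) is to score the test i-vector against each training position r_i and take the position with the highest score.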
Tests were carried out both without noise and with noise, with the signal-to-noise ratio gradually decreased in the noisy case. The experiments show that the method achieves good localization performance even in the presence of noise and reverberation.
Below, the i-vector-based sound source localization algorithm of the invention is compared and verified under different conditions.
Selection of experimental parameters
(1) The simulated dataset is generated with Roomsim, a simulation code for reverberation in a rectangular (shoebox) room in which the positions of the sound source and the receivers can be set. The room size is 7 m × 6 m × 3 m, and the relationship between the reverberation time (T60) and the reflection coefficient (β) is determined by Eyring's formula. The entire dataset is split into a training set and a test set in the ratio 8:2; the training set is used as the algorithm's input, and the test set is used to evaluate the performance of the improved algorithm.
(2) The sound source localization system uses the PLDA algorithm with parameters μ, V, y_i, and z_ij: μ is the mean of all training data, the matrix V spans the speaker space (eigenvoice matrix), the vector y_i is the corresponding speaker factor and follows a standard Gaussian distribution, and z_ij is the residual, modeled with a full covariance matrix D.
(3) The i-vector parameter matrix T uses a single space in place of two spaces. In traditional speaker recognition methods, the two spaces are the speaker space defined by the eigenvoice matrix and the channel space defined by the eigenchannel matrix. The new single space (the total variability space) contains both the differences between speakers and the differences between channels.
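Eyring's formula itself is not reproduced in item (1). A common form is T60 = 24 ln(10) V / (-c S ln(1 - α)), roughly 0.161 V / (-S ln(1 - α)), with V the room volume, S the total surface area, c the speed of sound, and α the mean absorption coefficient. The sketch below evaluates it for the 7 m × 6 m × 3 m room of item (1), assuming a single pressure reflection coefficient β for all surfaces so that α = 1 - β², and c = 343 m/s; these are assumptions, and Roomsim's own conventions may differ. `eyring_t60` is a hypothetical name.

```python
import math


def eyring_t60(length, width, height, beta, c=343.0):
    """Reverberation time T60 (s) of a shoebox room by Eyring's formula.

    Assumes every surface has the same pressure reflection coefficient beta,
    so the energy absorption coefficient is alpha = 1 - beta**2.
    """
    volume = length * width * height
    surface = 2 * (length * width + length * height + width * height)
    alpha = 1.0 - beta ** 2
    return 24 * math.log(10) * volume / (-c * surface * math.log(1.0 - alpha))


# Room of item (1): 7 m x 6 m x 3 m
for beta in (0.7, 0.8, 0.9):
    print(beta, round(eyring_t60(7, 6, 3, beta), 3))
```

Inverting the same relation gives the β needed to reach a target T60, which is how a simulator is usually configured for a desired reverberation time.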
Experiment 1: Equal error rate of sound source localization with the i-vector model in a noise-free environment
Fig. 2 shows sound source localization for five people in a noise-free environment. "Model" denotes the trained models and "Test" denotes the test models; each row is matched against each column, and a darker color indicates a higher score. A lower equal error rate (EER) indicates better performance. As Fig. 2 shows, the EER of the algorithm in the noise-free environment is 0, so the localization performance of the model is very good.
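The equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of computing it from lists of target and non-target scores follows; the function name and the threshold sweep over the observed scores are illustrative choices, not the evaluation code used in these experiments.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the decision threshold over the observed scores."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate((tar, non))):
        frr = np.mean(tar < t)  # target trials rejected at threshold t
        far = np.mean(non >= t)  # non-target trials accepted at threshold t
        gap = abs(far - frr)
        if gap < best_gap:  # keep the threshold where the two rates are closest
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

An EER of 0, as reported for Fig. 2, means some threshold separates every target score from every non-target score.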
Experiment 2: Equal error rate of sound source localization with the i-vector model at a signal-to-noise ratio of 15 dB
Fig. 3 shows the equal error rate results at a signal-to-noise ratio of 10 dB. As in Experiment 1, the EER at 15 dB is still 0, and the localization performance is good.
Experiment 3: Equal error rate of sound source localization with the i-vector model at a signal-to-noise ratio of 20 dB
Fig. 4 shows the equal error rate results at a signal-to-noise ratio of 20 dB. As in Experiment 1, the EER at 20 dB is still 0. It can therefore be concluded that the sound source localization algorithm based on i-vector speaker recognition achieves good localization performance.
Those skilled in the art can easily conceive of other advantages and variations based on the embodiments described above. Therefore, the present invention is not limited to the above examples, which serve only as a detailed, illustrative description of one form of the invention. Technical solutions obtained by those skilled in the art through various equivalent substitutions of the above specific examples, without departing from the concept of the invention, shall fall within the scope of the claims of the present invention and their equivalents.
Claims (1)
1. A sound source localization method based on i-vector speaker recognition, characterized in that the method comprises the following steps:
Step 1: placing the sound source at each training position r_i, i = 1, 2, ..., K, and recording with the microphone array the signal emitted by the source at that position, K being the number of source training positions;
Step 2: computing the cross-correlation function from the recorded reverberant signals;
Step 3: generating feature vectors y from the cross-correlation function;
Step 4: for each training position r_i, using the feature vectors to compute the mean vector μ of the PLDA model of the cross-correlation function, the fixed-dimension speaker subspace, and the residual ε_ij;
Step 5: recording a signal with the microphone array, the signal containing the signal emitted by the sound source and noise;
Step 6: computing the cross-correlation function from the recorded signal;
Step 7: generating feature vectors y from the cross-correlation function, N frames of data yielding a set of feature vectors;
Step 8: testing the feature vectors against the PLDA model to estimate the position of the sound source;
wherein, for the selection of the cross-correlation function feature, the room impulse response simulation tool Roomsim is used to simulate a real acoustic environment, and the generalized cross-correlation function (GCC) between the microphone signals x1(k) and x2(k) is computed in the frequency domain:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Psi_{1,2}(\omega)\, X_1(\omega)\, X_2^*(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1.1)$$
where the superscript "*" denotes complex conjugation, X1(ω) is the Fourier transform of x1(k), X2(ω) is the Fourier transform of x2(k), and Ψ1,2(ω) is a weighting function;
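to make the cross-correlation function more robust to reverberation, the phase transform (PHAT) weighting function is used:
$$\Psi_{1,2}(\omega) = \frac{1}{\left|X_1(\omega)\, X_2^*(\omega)\right|} \qquad (1.2)$$
substituting (1.2) into (1.1) gives:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{X_1(\omega)\, X_2^*(\omega)}{\left|X_1(\omega)\, X_2^*(\omega)\right|}\, e^{j\omega\tau}\, d\omega \qquad (1.3)$$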
in practice, the microphone signals x1(k) and x2(k) are windowed and then Fourier transformed to obtain X1(ω) and X2(ω); if the length L of the room impulse response is much shorter than the window length, the microphone signals are expressed in the frequency domain as:
$$X_n(\omega) = H_n(\mathbf{r}_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.4)$$
where S(ω) and H_n(r_s, ω) are the Fourier transforms of s(k) and h_n(r_s, k), respectively, and s(k) is the signal emitted by the sound source at position r_s;
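substituting (1.4) into (1.3) gives:
$$R_{x_1x_2}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{H_1(\mathbf{r}_s, \omega)\, H_2^*(\mathbf{r}_s, \omega)}{\left|H_1(\mathbf{r}_s, \omega)\, H_2^*(\mathbf{r}_s, \omega)\right|}\, e^{j\omega\tau}\, d\omega = R_{h_1h_2}(\tau) \qquad (1.5)$$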
it follows from (1.5) that the GCC between the microphone signals x1(k) and x2(k) received by the microphone array equals the GCC between the room impulse responses h1(r_s, k) and h2(r_s, k);
however, in practice the length L of the room impulse response is much larger than the window length, so the microphone signals can only be approximated in the frequency domain as:
$$X_n(\omega) \approx H_n(\mathbf{r}_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.6)$$
moreover, the GCC between the microphone signals x1(k) and x2(k) received by the microphone array is then only approximately equal to the GCC between the room impulse responses h1(r_s, k) and h2(r_s, k), that is:
$$R_{x_1x_2}(\tau) \approx R_{h_1h_2}(\tau)$$
thus obtaining the cross-correlation function feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610365659.6A CN106019230B (en) | 2016-05-27 | 2016-05-27 | A kind of sound localization method based on i-vector Speaker Identification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610365659.6A CN106019230B (en) | 2016-05-27 | 2016-05-27 | A kind of sound localization method based on i-vector Speaker Identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106019230A (en) | 2016-10-12 |
CN106019230B (en) | 2019-01-08 |
Family
ID=57091462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610365659.6A Active CN106019230B (en) | 2016-05-27 | 2016-05-27 | A kind of sound localization method based on i-vector Speaker Identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106019230B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107274906A (en) * | 2017-06-28 | 2017-10-20 | 百度在线网络技术(北京)有限公司 | Voice information processing method, device, terminal and storage medium |
CN107703486B (en) * | 2017-08-23 | 2021-03-23 | 南京邮电大学 | Sound source positioning method based on convolutional neural network CNN |
CN108242234B (en) * | 2018-01-10 | 2020-08-25 | 腾讯科技(深圳)有限公司 | Speech recognition model generation method, speech recognition model generation device, storage medium, and electronic device |
CN110555370B (en) * | 2019-07-16 | 2023-03-31 | 西北工业大学 | Channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN111462759B (en) * | 2020-04-01 | 2024-02-13 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BRPI0913460B1 (en) * | 2008-09-11 | 2024-03-05 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | APPARATUS AND METHOD FOR PROVIDING A SET OF SPATIAL INDICATORS ON THE BASIS OF A MICROPHONE SIGNAL AND APPARATUS FOR PROVIDING A TWO-CHANNEL AUDIO SIGNAL AND A SET OF SPATIAL INDICATORS |
CN102740208B (en) * | 2011-04-14 | 2014-12-10 | 东南大学 | Multivariate statistics-based positioning method of sound source of hearing aid |
CN103439688B (en) * | 2013-08-27 | 2015-04-22 | 大连理工大学 | Sound source positioning system and method used for distributed microphone arrays |
CN104732978B (en) * | 2015-03-12 | 2018-05-08 | 上海交通大学 | The relevant method for distinguishing speek person of text based on combined depth study |
CN105139857B (en) * | 2015-09-02 | 2019-03-22 | 中山大学 | For the countercheck of voice deception in a kind of automatic Speaker Identification |
2016
- 2016-05-27: CN201610365659.6A filed in CN, published as CN106019230B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN106019230A (en) | 2016-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106019230B (en) | A kind of sound localization method based on i-vector Speaker Identification | |
CN109839612B (en) | Sound source direction estimation method and device based on time-frequency masking and deep neural network | |
Nguyen et al. | Robust source counting and DOA estimation using spatial pseudo-spectrum and convolutional neural network | |
CN110970053B (en) | Multichannel speaker-independent voice separation method based on deep clustering | |
CN103236260B (en) | Speech recognition system | |
Li et al. | Multiple-speaker localization based on direct-path features and likelihood maximization with spatial sparsity regularization | |
CN102565759B (en) | Binaural sound source localization method based on sub-band signal to noise ratio estimation | |
CN102968990B (en) | Speaker identifying method and system | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
Wan et al. | Sound source localization based on discrimination of cross-correlation functions | |
CN108647556A (en) | Sound localization method based on frequency dividing and deep neural network | |
CN107219512A (en) | A kind of sound localization method based on acoustic transfer function | |
Al-Karawi et al. | Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions | |
Bai et al. | Time difference of arrival (TDOA)-based acoustic source localization and signal extraction for intelligent audio classification | |
CN111243600A (en) | Voice spoofing attack detection method based on sound field and field pattern | |
Meier et al. | Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection. | |
CN102740208B (en) | Multivariate statistics-based positioning method of sound source of hearing aid | |
CN111179959A (en) | Competitive speaker number estimation method and system based on speaker embedding space | |
Hu et al. | Robust binaural sound localisation with temporal attention | |
Sarabia et al. | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | |
Kindt et al. | Exploiting speaker embeddings for improved microphone clustering and speech separation in ad-hoc microphone arrays | |
Youssef et al. | From monaural to binaural speaker recognition for humanoid robots | |
Patil et al. | Significance of cmvn for replay spoof detection | |
Al-Ali et al. | Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments | |
Youssef et al. | Binaural speaker recognition for humanoid robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||