CN106019230B - A sound source localization method based on i-vector speaker identification - Google Patents

A sound source localization method based on i-vector speaker identification

Info

Publication number
CN106019230B
Authority
CN
China
Prior art keywords
signal
cross
correlation function
vector
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610365659.6A
Other languages
Chinese (zh)
Other versions
CN106019230A (en)
Inventor
万新旺
顾晓瑜
杨悦
廖鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201610365659.6A
Publication of CN106019230A
Application granted
Publication of CN106019230B
Legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a sound source localization method based on i-vector speaker identification. The method introduces features for discriminating cross-correlation functions, splits these features into a training set and a test set, and uses them to train and test the models of an i-vector speaker recognition system. The EM algorithm is used to obtain the maximum likelihood estimate of the probability distribution function of the development-set i-vectors, and a PLDA model constrained by speech duration is established, so that speech recognition and sound source localization can be carried out accurately. The algorithm effectively addresses the noise and reverberation problems of traditional sound source localization.

Description

A sound source localization method based on i-vector speaker identification
Technical field
The present invention relates to a sound source localization method based on i-vector speaker identification and belongs to the field of Internet information technology.
Background art
Speaker identification is a form of biometric recognition: a technology that automatically identifies a speaker from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. It is an important branch of personal characteristic recognition. With the continuing development of information technology, speaker identification offers advantages over other biometric technologies, such as greater convenience, lower cost, and good scalability, and it can be widely applied to database access, security verification, telephone banking, remote computer login, and similar fields. As an important biometric identification technology, speaker identification has broad applications, and many researchers at home and abroad have joined this field. In recent years, speaker modeling based on the identity vector (i-vector) has achieved great success and has substantially improved the performance of speaker recognition systems. Subspace modeling based on the i-vector has proven to be the most effective state-of-the-art speaker modeling technique.
With the rapid development of computer technology and the information industry, sound source localization has become a hot research topic. Determining the position of a sound source in space has broad application prospects in many aspects of production and daily life. Sound source localization locates an object by measuring the sound it emits. It differs from localization based on sonar, radar, or wireless communication: in sound source localization the signal is ordinary sound, which is wideband, whereas in the latter the source signal is narrowband. Various sound source localization algorithms have been proposed according to the characteristics of speech signals, but noise and reverberation keep the accuracy of existing algorithms low.
Current sound source localization algorithms fall broadly into three classes: algorithms based on high-resolution spectral estimation, algorithms based on time delay estimation (TDE), and algorithms based on steered beamforming.
(1) Localization algorithms based on high-resolution spectral estimation
High-resolution spectral estimation methods are mainly of four kinds: ARMA spectral estimation, minimum variance spectral estimation, entropy spectral estimation, and subspace methods. ARMA spectral estimation estimates the power spectral density by fitting a model to a stationary linear signal process. Entropy spectral estimation includes the maximum entropy method and the minimum cross-entropy method. Subspace methods include Pisarenko harmonic decomposition, the Prony method, multiple signal classification (MUSIC), and estimation of signal parameters via rotational invariance techniques (ESPRIT). These localization algorithms use the covariance matrix of the received signals, which is unknown in practice and must be estimated from the observed data. Estimating it requires assuming that the source and the noise are statistically stationary and that the parameter to be estimated (the source position) is fixed, so that the estimate can be obtained by averaging over a time interval. Speech is only short-term stationary and often fails to meet this condition. Most current methods are designed for far-field narrowband signals, and indoor reverberation severely degrades the performance of this class of algorithms.
(2) Localization algorithms based on time delay estimation
Algorithms based on time delay estimation work in two steps. The first step is time delay estimation: computing the difference in the arrival time of the source signal between each pair of microphones. The second step is position estimation: estimating the source position from the time delays and the geometry of the microphone array. Time delay estimation (TDE) is the more critical step. The generalized cross-correlation (GCC) method estimates the time difference of arrival (TDOA) by computing the cross-correlation function between the signals received at different microphones. In real environments, however, noise and reverberation weaken the main peak of the correlation function and make peak detection difficult. GCC weights the cross-power spectrum of the two microphone signals so that the peak of the correlation function at the true time delay becomes more prominent. Knapp lists five common weighting functions, of which maximum likelihood weighting (GCC-ML) and phase transform weighting (GCC-PHAT) are the most typical. Low computational complexity and ease of implementation have led to the wide use of GCC methods.
(3) Localization algorithms based on steered beamforming
Steered beamforming was first used for target localization in radar and sonar systems and was later introduced into microphone array signal processing. Microphone array beamforming has two main applications in speech signal processing: 1) speech enhancement; 2) sound source localization. When the position of the source is known, the steering delay of each microphone can be adjusted so that the microphone signals are aligned in time, which steers the array toward the source position. The aligned signals are then summed to suppress noise and enhance the signal. This simplest and most practical beamformer is called the delay-and-sum beamformer.
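The delay-and-sum idea described above can be sketched in a few lines. This is a minimal illustration, not code from the patent: the function name `delay_and_sum`, the integer sample delays, and the toy pulse are all assumptions made for the example, and circular shifting stands in for proper delay filters.

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """Align each channel by its steering delay and average the channels.

    signals: array of shape (n_mics, n_samples)
    delays_samples: integer delay of each channel relative to the reference;
        a positive value means the channel receives the source later.
    """
    signals = np.asarray(signals, dtype=float)
    out = np.zeros(signals.shape[1])
    for channel, delay in zip(signals, delays_samples):
        # np.roll shifts circularly, which is acceptable for this toy example;
        # a real implementation would pad or use fractional-delay filters.
        out += np.roll(channel, -int(delay))
    return out / signals.shape[0]

# Toy check: the same pulse arrives 0, 3 and 5 samples late at three microphones.
pulse = np.zeros(64)
pulse[10] = 1.0
delays = [0, 3, 5]
mics = np.stack([np.roll(pulse, d) for d in delays])

aligned = delay_and_sum(mics, delays)        # steered at the true delays
misaligned = delay_and_sum(mics, [0, 0, 0])  # no steering
```

With the correct steering delays the three pulses add coherently, while without steering each pulse contributes only a third of the amplitude at its own position.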
Traditional algorithms are severely limited in strongly reverberant environments. For example, steered beamforming based on maximum output power is sensitive to the environment and to the frequency content of the source, which limits its application; localization based on high-resolution spectral estimation is computationally expensive and unsuitable for near-field localization; and the delay accuracy of localization based on time delay estimation is easily degraded by reverberation and noise.
Summary of the invention
The present invention aims to overcome the above shortcomings of the prior art by proposing a sound source localization algorithm based on i-vector speaker identification. The method introduces features for discriminating cross-correlation functions, splits these features into a training set and a test set, and uses them to train and test the models of an i-vector speaker recognition system. The expectation maximization (EM) algorithm is used to obtain the maximum likelihood estimate of the probability distribution function of the development-set i-vectors, and a PLDA model constrained by speech duration is established, so that speech recognition and sound source localization can be carried out accurately. The algorithm effectively addresses the noise and reverberation problems of traditional sound source localization.
The technical solution adopted by the invention is a sound source localization algorithm based on i-vector speaker identification, comprising a training stage and a localization stage.
The steps of the training stage are as follows:
Step 1: The sound source is placed at each training position r_i, i = 1, 2, ..., K, where K is the number of training positions, and the microphone array records the signal emitted by the source at that position (a reverberant signal).
Step 2: The cross-correlation function is computed from the recorded reverberant signal.
Step 3: A feature vector y is generated from the cross-correlation function.
Step 4: For each training position r_i, the feature vectors are used to compute the parameters of the PLDA model of the cross-correlation function: the mean vector μ, the fixed-dimension speaker subspace, and the residual ε_ij.
The steps of the localization stage are as follows:
Step 1: The microphone array records a signal that contains the signal emitted by the source (a reverberant signal) and noise.
Step 2: The cross-correlation function is computed from the recorded signal.
Step 3: A feature vector y is generated from the cross-correlation function. If there are N frames of data, a set of feature vectors y = {y_t, t = 1, ..., N} is generated.
Step 4: The features are tested with the PLDA model to estimate the position of the sound source.
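The final decision of the localization stage can be illustrated as follows. The patent does not state how the scores of the N test frames are combined, so this sketch assumes they are simply summed and that the training position with the highest total is reported. The function `estimate_position`, the score matrix, and the coordinates are hypothetical.

```python
import numpy as np

def estimate_position(frame_scores, positions):
    """Pick the training position whose model best explains the test frames.

    frame_scores: array (K, N); entry [i, t] is a hypothetical score of test
        feature y_t against the model of training position r_i.
    positions: sequence of K position labels or coordinates.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    total = frame_scores.sum(axis=1)   # assumption: frames are pooled by summing
    best = int(np.argmax(total))
    return positions[best], total

# Three hypothetical training positions and made-up scores for three frames.
positions = [(1.0, 2.0), (3.0, 2.0), (5.0, 2.0)]
scores = np.array([[0.1, -0.4, 0.2],
                   [1.2, 0.9, 1.5],
                   [-0.3, 0.0, -0.8]])
best_position, totals = estimate_position(scores, positions)
```

Other pooling rules (averaging, or voting per frame) would fit the same interface; summing is used here only because it is the simplest choice.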
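In addition, for the selection of cross-correlation function features, the room impulse response simulation program Roomsim is used to simulate a real acoustic environment. The generalized cross-correlation (GCC) function between the signals $x_1(k)$ and $x_2(k)$ can be computed in the frequency domain:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \Psi_{1,2}(\omega)\, X_1(\omega)\, X_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1.1)$$
where the superscript "*" denotes the complex conjugate, $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms of $x_1(k)$ and $x_2(k)$, and $\Psi_{1,2}(\omega)$ is the weighting function.
To strengthen the robustness of the cross-correlation function to reverberation, the phase transform (PHAT) weighting function can be used:
$$\Psi_{1,2}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|} \qquad (1.2)$$
Substituting formula (1.2) into formula (1.1) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{X_1(\omega)\, X_2^{*}(\omega)}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.3)$$
In practice, the microphone signals $x_1(k)$ and $x_2(k)$ are windowed first, and $X_1(\omega)$ and $X_2(\omega)$ are then obtained by Fourier transform. If the length $L$ of the room impulse response is much shorter than the length of the window function, the microphone signals can be expressed in the frequency domain as:
$$X_n(\omega) = H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.4)$$
where $S(\omega)$ and $H_n(r_s, \omega)$ are the Fourier transforms of $s(k)$, the signal of the source at position $r_s$, and of the room impulse response $h_n(r_s, k)$.
Substituting formula (1.4) into formula (1.3) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^{*}(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^{*}(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.5)$$
Formula (1.5) shows that the GCC between the signals $x_1(k)$ and $x_2(k)$ received by the microphone array equals the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$.
In practice, however, the length $L$ of the room impulse response is much larger than the length of the window function, so the microphone signals can only be represented approximately in the frequency domain:
$$X_n(\omega) \approx H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.6)$$
Accordingly, the GCC between the received signals $x_1(k)$ and $x_2(k)$ is only approximately equal to the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$, that is:
$$R_{x_1 x_2}(\tau) \approx \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^{*}(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^{*}(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.7)$$
The features of the cross-correlation function are obtained in this way.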
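A discrete-time counterpart of the PHAT-weighted GCC of formula (1.3) can be sketched with the FFT. This is an illustrative implementation under stated assumptions, not the patent's code: the function name `gcc_phat`, the lag window `max_lag`, the small constant that guards against division by zero, and the synthetic test signal are all choices made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, max_lag):
    """GCC-PHAT between two frames, returned for lags -max_lag..max_lag."""
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)              # cross-power spectrum X1(w) X2*(w)
    cross /= np.abs(cross) + 1e-12        # PHAT weighting; epsilon avoids 0/0
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

# Synthetic check: x1 is the same noise signal as x2, delayed by 7 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delay = 7
x1 = np.concatenate((np.zeros(delay), s))
x2 = np.concatenate((s, np.zeros(delay)))
lags, cc = gcc_phat(x1, x2, max_lag=20)
estimated_delay = lags[np.argmax(cc)]
```

The vector `cc` over the chosen lag window is one plausible form for the per-frame feature vector generated from the cross-correlation function; the patent does not specify the lag range or any further processing.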
The invention can be applied to speaker identification and to localization of the speaker under reverberation and noise.
Beneficial effects
1. The invention uses cross-correlation function features combined with PLDA modeling; using the probability distribution function of the i-vectors in the PLDA model improves the effectiveness of the model. Compared with traditional sound source localization algorithms, it reduces the error rate and improves localization accuracy, and it effectively addresses the noise and reverberation problems of traditional sound source localization.
2. The invention combines the cross-correlation function features of the source with the PLDA algorithm and is suitable for all conditions with strong noise and reverberation.
3. Because the invention extracts cross-correlation function features of the source, data acquisition is simple and convenient and the localization performance is good.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a schematic diagram of the equal error rate (EER) analysis of the invention for different speakers under the i-vector model.
Fig. 3 is a schematic diagram of the scoring analysis of the invention for different test data under the i-vector model at a signal-to-noise ratio of 10 dB.
Fig. 4 is a schematic diagram of the scoring analysis of the invention for different test data under the i-vector model at a signal-to-noise ratio of 20 dB.
Detailed description of embodiments
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention is a sound source localization algorithm based on i-vector speaker identification. PLDA is a channel compensation algorithm that operates on i-vector features. Because an i-vector contains both speaker information and channel information, and only the speaker information is of interest, channel compensation is needed. Four aspects are detailed below: selection of source features, probabilistic linear discriminant analysis, model training, and scoring.
The specific implementation steps of the invention are as follows:
Step 1: Using the Roomsim simulation environment, an environment with reverberation and noise is simulated, and the cross-correlation function features of the source signal are computed. The features undergo processing such as dimensionality reduction and speech detection and are split into a training set and a test set in preparation for model training in the next step.
Step 2: Extract the i-vectors. In the PLDA framework, the generation of an i-vector can be described with hidden variables. Different numbers of hidden variables and different prior assumptions give different PLDA models. Let the j-th i-vector of the i-th speaker be w_ij; a common PLDA model is:
w_ij = μ + V y_i + z_ij
where μ is the mean of all training data, the matrix V represents the speaker space (the eigenvoice matrix), the vector y_i is the corresponding speaker factor and follows a standard Gaussian distribution, and z_ij is the residual, characterized by a full matrix D.
Step 3: Apply PLDA. The model parameters λ = (μ, V, D) are estimated on the labeled data set with the expectation maximization (EM) method; the initial model uses random values.
Step 4: Once the model parameters have been estimated, given two i-vectors w_1 and w_2, the log-likelihood ratio is computed, where the hypothesis θ_tar states that they come from the same speaker and θ_non that they come from different speakers. The score is the log-likelihood ratio:
$$\mathrm{score} = \log \frac{p(w_1, w_2 \mid \theta_{tar})}{p(w_1, w_2 \mid \theta_{non})}$$
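For the Gaussian model of Step 2, the log-likelihood ratio of Step 4 has a closed form, sketched below. The sketch assumes that D is the covariance matrix of the residual, so that two i-vectors from the same speaker share the covariance V Vᵀ and each has total covariance V Vᵀ + D. The function names and the small numerical example are hypothetical and do not come from the patent.

```python
import numpy as np

def log_gauss(x, cov):
    """Log density of a zero-mean multivariate Gaussian at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(w1, w2, mu, V, D):
    """Log-likelihood ratio: same speaker (theta_tar) vs. different (theta_non)."""
    between = V @ V.T                     # covariance shared through y_i
    total = between + D                   # covariance of a single i-vector
    joint = np.block([[total, between], [between, total]])
    x1, x2 = w1 - mu, w2 - mu
    log_tar = log_gauss(np.concatenate((x1, x2)), joint)
    log_non = log_gauss(x1, total) + log_gauss(x2, total)
    return log_tar - log_non

# Hypothetical 3-dimensional model with a 2-dimensional speaker subspace.
mu = np.zeros(3)
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
D = 0.1 * np.eye(3)
x = np.array([2.0, 0.0, 0.0])

same_score = plda_llr(mu + x, mu + x, mu, V, D)   # identical vectors
diff_score = plda_llr(mu + x, mu - x, mu, V, D)   # opposite along the subspace
```

In the localization setting described in this document, the training positions take the place that speakers have in speaker recognition, so a test feature would be scored against the model of each training position; how exactly those scores are pooled is not specified in the source.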
Tests were run under noise-free conditions and under noisy conditions with progressively lower signal-to-noise ratios. The tests show that the method localizes well even in the presence of noise and reverberation.
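The EM estimation named in Step 3 can be written out for the model w_ij = μ + V y_i + z_ij. The updates below are a sketch of the standard form for this kind of linear-Gaussian model, not formulas taken from the patent. They assume that μ is fixed to the global mean of the training data and that D is the residual covariance; n_i is the number of i-vectors of speaker i, N is the total number of i-vectors, and f_i is a helper symbol introduced here.

$$f_i = \sum_{j=1}^{n_i} (w_{ij} - \mu)$$

E-step, posterior moments of the speaker factor:

$$L_i = I + n_i V^{T} D^{-1} V, \qquad E[y_i] = L_i^{-1} V^{T} D^{-1} f_i, \qquad E[y_i y_i^{T}] = L_i^{-1} + E[y_i]\, E[y_i]^{T}$$

M-step, with the updated V used in the update of D:

$$V \leftarrow \Big( \sum_i f_i\, E[y_i]^{T} \Big) \Big( \sum_i n_i\, E[y_i y_i^{T}] \Big)^{-1}, \qquad D \leftarrow \frac{1}{N} \Big[ \sum_i \sum_{j=1}^{n_i} (w_{ij} - \mu)(w_{ij} - \mu)^{T} - V \sum_i E[y_i]\, f_i^{T} \Big]$$

The two steps are repeated from the random initial values until the parameters stop changing appreciably.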
The i-vector-based sound source localization algorithm of the invention is verified below under different conditions. The experimental parameters were chosen as follows:
(1) The simulation data set is generated with Roomsim, a simulation program for reverberation in rectangular rooms in which the positions of the sound source and the receivers can be set. The room measures 7 m × 6 m × 3 m, and the relationship between the reverberation time (T60) and the reflection coefficient (β) is determined by the Eyring formula.
The entire data set is split into a training set and a test set at a ratio of 8:2. The training set is used as input to the algorithm, and the test set is used to test the performance of the improved algorithm.
(2) The sound source localization system uses the PLDA algorithm with parameters μ, V, y_i and z_ij: μ is the mean of all training data, the matrix V represents the speaker space (the eigenvoice matrix), the vector y_i is the corresponding speaker factor and follows a standard Gaussian distribution, and z_ij is the residual, characterized by a full matrix D.
(3) The i-vector parameter matrix T uses a single space in place of the two spaces of traditional speech recognition methods, namely the speaker space defined by the eigenvoice matrix and the channel space defined by the eigenchannel matrix. This new space contains both the differences between speakers and the differences between channels.
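The Eyring relation mentioned in item (1) can be made concrete for a rectangular room. The sketch below uses a commonly quoted form that follows from Eyring's reverberation formula when all walls share one reflection coefficient β (with absorption 1 − β²): β = exp(−6 ln 10 / (c (1/Lx + 1/Ly + 1/Lz) T60)). Whether this is exactly the form used in the patent is an assumption, as are the speed of sound of 343 m/s and the example T60 of 0.3 s.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def reflection_coefficient(t60, room_dims, c=SPEED_OF_SOUND):
    """Uniform wall reflection coefficient beta for a given T60 (Eyring)."""
    lx, ly, lz = room_dims
    inv_sum = 1.0 / lx + 1.0 / ly + 1.0 / lz
    return math.exp(-6.0 * math.log(10.0) / (c * inv_sum * t60))

def reverberation_time(beta, room_dims, c=SPEED_OF_SOUND):
    """Inverse relation: T60 for a given uniform reflection coefficient."""
    lx, ly, lz = room_dims
    inv_sum = 1.0 / lx + 1.0 / ly + 1.0 / lz
    return -6.0 * math.log(10.0) / (c * inv_sum * math.log(beta))

room = (7.0, 6.0, 3.0)                     # room size stated in the text, in metres
beta = reflection_coefficient(0.3, room)   # 0.3 s is a hypothetical T60
```

The constant 6 ln 10 is about 13.82, which is how the relation is usually printed.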
Experiment 1: Equal error rate results for sound source localization with the i-vector model in a noise-free environment
Fig. 2 shows the results of the invention localizing five speakers in a noise-free environment. "Model" denotes the trained models and "Test" denotes the models under test. Each row is matched against each column, and a darker color represents a higher score. A lower equal error rate (EER) indicates better performance. As Fig. 2 shows, the EER of the algorithm is 0 in the noise-free environment, so the model localizes very well.
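The equal error rate used in these experiments is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from two lists of scores; the function name, the threshold sweep, and the example scores are illustrative and are not the patent's data or evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate((target_scores, nontarget_scores)))
    # Add one threshold above every score so that "reject everything" is covered.
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)
    best_gap, eer = np.inf, 1.0
    for th in thresholds:
        frr = np.mean(target_scores < th)       # targets wrongly rejected
        far = np.mean(nontarget_scores >= th)   # non-targets wrongly accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), 0.5 * (far + frr)
    return eer

# Perfectly separated scores give an EER of 0; overlapping scores do not.
separated = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
overlapping = equal_error_rate([1.0, 3.0], [0.0, 2.0])
```

An EER of 0, as reported for Fig. 2, therefore means that some threshold separates every matching pair from every non-matching pair.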
Experiment 2: Equal error rate results for sound source localization with the i-vector model at a signal-to-noise ratio of 15 dB
Fig. 3 shows the equal error rate results at a signal-to-noise ratio of 10 dB. As in Experiment 1, the EER remains 0 at 15 dB, and the localization performance is good.
Experiment 3: Equal error rate results for sound source localization with the i-vector model at a signal-to-noise ratio of 20 dB
Fig. 4 shows the equal error rate results at a signal-to-noise ratio of 20 dB. As in Experiment 1, the EER remains 0 at 20 dB. It can therefore be concluded that the sound source localization algorithm based on i-vector speaker identification localizes well.
Those skilled in the art can readily conceive of other advantages and variations on the basis of the above embodiments. The invention is therefore not limited to the above examples, which merely describe one form of the invention in detail by way of example. Technical solutions obtained by those skilled in the art through equivalent substitutions based on the above examples, without departing from the concept of the invention, fall within the scope of the claims of the invention and their equivalents.

Claims (1)

1. A sound source localization method based on i-vector speaker identification, characterized in that the method comprises the following steps:
Step 1: the sound source is placed at each training position r_i, i = 1, 2, ..., K, and the microphone array records the signal emitted by the source at that position, where K is the number of training positions;
Step 2: the cross-correlation function is computed from the recorded reverberant signal;
Step 3: a feature vector y is generated from the cross-correlation function;
Step 4: for each training position r_i, the feature vectors are used to compute the parameters of the PLDA model of the cross-correlation function: the mean vector μ, the fixed-dimension speaker subspace, and the residual ε_ij;
Step 5: the microphone array records a signal that contains the signal emitted by the source and noise;
Step 6: the cross-correlation function is computed from the recorded signal;
Step 7: a feature vector y is generated from the cross-correlation function; if there are N frames of data, a set of feature vectors is generated;
Step 8: the feature vectors are tested with the PLDA model to estimate the position of the sound source;
in addition, for the selection of cross-correlation function features, the room impulse response simulation program Roomsim is used to simulate a real acoustic environment, and the generalized cross-correlation (GCC) function between the microphone signals $x_1(k)$ and $x_2(k)$ is computed in the frequency domain:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \Psi_{1,2}(\omega)\, X_1(\omega)\, X_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1.1)$$
where the superscript "*" denotes the complex conjugate, $X_1(\omega)$ is the Fourier transform of $x_1(k)$, $X_2(\omega)$ is the Fourier transform of $x_2(k)$, and $\Psi_{1,2}(\omega)$ is the weighting function;
to strengthen the robustness of the cross-correlation function to reverberation, the phase transform (PHAT) weighting function is used:
$$\Psi_{1,2}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|} \qquad (1.2)$$
substituting formula (1.2) into formula (1.1) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{X_1(\omega)\, X_2^{*}(\omega)}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.3)$$
in practice, the microphone signals $x_1(k)$ and $x_2(k)$ are windowed first, and $X_1(\omega)$ and $X_2(\omega)$ are then obtained by Fourier transform; if the length $L$ of the room impulse response is much shorter than the length of the window function, the microphone signals are expressed in the frequency domain as:
$$X_n(\omega) = H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.4)$$
where $S(\omega)$ and $H_n(r_s, \omega)$ are the Fourier transforms of $s(k)$ and $h_n(r_s, k)$ respectively, and $s(k)$ is the signal of the source at position $r_s$;
substituting formula (1.4) into formula (1.3) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^{*}(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^{*}(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.5)$$
formula (1.5) shows that the GCC $R_{x_1 x_2}(\tau)$ between the microphone signals $x_1(k)$ and $x_2(k)$ received by the microphone array equals the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$;
in practice, however, the length $L$ of the room impulse response is much larger than the length of the window function, so the microphone signals can only be represented approximately in the frequency domain:
$$X_n(\omega) \approx H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \qquad (1.6)$$
accordingly, the GCC between the received microphone signals $x_1(k)$ and $x_2(k)$ is only approximately equal to the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$, that is:
$$R_{x_1 x_2}(\tau) \approx \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^{*}(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^{*}(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega \qquad (1.7)$$
the features of the cross-correlation function are thereby obtained.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610365659.6A CN106019230B (en) 2016-05-27 2016-05-27 A kind of sound localization method based on i-vector Speaker Identification

Publications (2)

Publication Number Publication Date
CN106019230A 2016-10-12
CN106019230B 2019-01-08

Family

ID=57091462

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant