CN106019230A - Sound source positioning method based on i-vector speaker recognition - Google Patents

Publication number
CN106019230A
Authority
CN
China
Prior art keywords
vector
signal
cross
correlation function
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610365659.6A
Other languages
Chinese (zh)
Other versions
CN106019230B (en)
Inventor
万新旺
顾晓瑜
杨悦
廖鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201610365659.6A
Publication of CN106019230A
Application granted
Publication of CN106019230B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a sound source localization method based on i-vector speaker recognition. The method introduces discriminative cross-correlation function features and divides them into a training set and a test set, which are used to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimate of the probability distribution of the development-set i-vectors, and a PLDA model constrained by speech duration is established so that speech recognition and localization can be performed accurately. The algorithm thereby effectively addresses the noise and reverberation problems of conventional sound source localization.

Description

A sound source localization method based on i-vector speaker recognition
Technical field
The present invention relates to a sound source localization method based on i-vector speaker recognition and belongs to the field of Internet information technology.
Background
Speaker recognition is a biometric technique that automatically determines a speaker's identity from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. It is an important branch of personal biometric identification. Compared with other biometric technologies, speaker recognition is simple, economical, and readily extensible, and it can be widely applied to database access, security verification, telephone banking, remote computer login, and similar fields. Because of this broad applicability, many researchers have joined the field. In recent years, speaker modeling based on the identity vector (i-vector) has achieved great success and has greatly improved the performance of speaker recognition systems. Subspace modeling based on the i-vector has proved to be the most effective state-of-the-art speaker modeling technique.
With the rapid development of computer technology and the information industry, sound source localization has become a research hotspot. Determining the position of a sound source in space has broad application prospects in many areas of production and daily life. Sound source localization locates an object by measuring the sound it emits. It differs from localization by sonar, radar, or wireless communication: the source signal in sound source localization is ordinary sound, which is wideband, whereas the sources in those methods are narrowband. Various sound source localization algorithms have been proposed according to the characteristics of acoustic signals, but noise and reverberation keep the accuracy of existing algorithms low.
Current sound source localization algorithms fall roughly into three classes: algorithms based on high-resolution spectral estimation, algorithms based on time delay estimation (TDE), and algorithms based on steered beamforming.
(1) Localization algorithms based on high-resolution spectral estimation
There are four main high-resolution spectral estimation methods: ARMA spectral estimation, minimum-variance spectral estimation, entropy spectral estimation, and subspace methods. ARMA spectral estimation estimates the power spectral density by fitting a model to a stationary linear signal process. Entropy spectral estimation comprises the maximum entropy method and the minimum cross-entropy method. Subspace methods include Pisarenko harmonic decomposition, the Prony method, multiple signal classification (MUSIC), and estimation of signal parameters via rotational invariance techniques (ESPRIT). These localization algorithms use the covariance matrix of the received signals, which is unknown in practice and must be estimated from the observed data. Estimating it requires assuming that the source and the noise are statistically stationary and that the parameter to be estimated (the source position) is fixed, so that averages can be taken over a time interval; speech is only short-term stationary and often does not satisfy this condition. Moreover, the great majority of current methods are designed for far-field narrowband signals, and reverberation in indoor environments severely degrades their performance.
(2) Localization algorithms based on time delay estimation
Algorithms based on time delay estimation work in two steps. The first step is time delay estimation: computing the time delay of the source between each pair of microphones. The second step is position estimation: estimating the source position from the time delays and the geometry of the microphone array. Time delay estimation (TDE) is the more critical of the two. The generalized cross-correlation (GCC) time delay estimation method estimates the time difference of arrival (TDOA) by computing the cross-correlation function between the signals received by different microphones. In real environments, however, noise and reverberation weaken the main peak of the correlation function and make peak detection difficult. GCC weights the cross-power spectrum of the two microphone signals so that the peak of the correlation function at the true delay becomes more prominent. Knapp lists five commonly used weighting functions, of which maximum-likelihood weighting (GCC-ML) and phase transform weighting (GCC-PHAT) are the most typical. Low computational complexity and ease of implementation have led to the wide use of GCC methods.
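The time delay estimation step described above can be illustrated with a minimal GCC-PHAT sketch in Python. It is not taken from the patent: the function name `gcc_phat`, the zero-padding length, the small constant added to the denominator, and the sign convention (a positive delay means the first signal lags the second) are all assumptions made for illustration.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x1 relative to x2 in seconds (hypothetical helper)."""
    n = len(x1) + len(x2)                 # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs, cc

# Usage: white noise delayed by 5 samples at an assumed 16 kHz sampling rate
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), s[:-5]))
tau, _ = gcc_phat(delayed, s, fs=16000, max_tau=0.001)
```

Restricting the search to `max_tau` reflects the fact that the physical delay cannot exceed the microphone spacing divided by the speed of sound.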
(3) Localization algorithms based on steered beamforming
Localization based on steered beamforming was first used for target localization in radar and sonar systems and was later introduced into microphone array signal processing. Microphone array beamforming has two main applications: 1) speech enhancement; 2) sound source localization. When the source position is known, adjusting the steering delay of each microphone aligns the microphone signals in time, which steers the array toward the source position; summing the aligned microphone signals then suppresses noise and enhances the signal. This simple and practical beamformer is called delay-and-sum beamforming.
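A minimal delay-and-sum sketch in Python follows. It assumes non-negative integer steering delays expressed in samples and zero-fills the samples that are shifted out; the function name `delay_and_sum` and these simplifications are illustrative assumptions, not part of the patent.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel by its steering delay (in samples) and average.

    signals: array of shape (n_mics, n_samples)
    delays:  non-negative integer delay of each channel relative to the earliest one
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for channel, d in zip(signals, delays):
        aligned = np.zeros(n_samples)
        aligned[:n_samples - d] = channel[d:]   # advance the channel by d samples
        out += aligned
    return out / n_mics

# Usage: the second microphone receives the same signal 3 samples later
s = np.sin(2 * np.pi * 0.01 * np.arange(200))
mic2 = np.concatenate((np.zeros(3), s[:-3]))
enhanced = delay_and_sum([s, mic2], delays=[0, 3])
```

With the correct steering delays the source components add coherently, while uncorrelated noise on the channels averages down.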
Conventional algorithms are severely limited in strongly reverberant environments. For example, steered beamforming based on maximum output power is sensitive to the external environment and to the frequency content of the source, which limits where it can be applied; localization methods based on high-resolution spectral estimation are computationally expensive and unsuitable for near-field localization; and the delay accuracy of localization methods based on time delay estimation is easily degraded by reverberation and noise.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by proposing a sound source localization algorithm based on i-vector speaker recognition. The method introduces discriminative cross-correlation function features, divides these features into a training set and a test set, and uses them to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimate of the probability distribution of the development-set i-vectors, and a PLDA model constrained by speech duration is established, so that speech recognition and sound source localization can be performed accurately. The algorithm thereby effectively addresses the noise and reverberation problems of conventional sound source localization.
The technical solution adopted by the present invention is a sound source localization algorithm based on i-vector speaker recognition, comprising a training stage and a localization stage.
The steps of the training stage are as follows:
Step 1: The sound source is placed at each training position $r_i$, $i = 1, 2, \ldots, K$, and the microphone array records the signal emitted by the source at that position (the reverberant signal);
Step 2: The cross-correlation function is computed from the recorded reverberant signal;
Step 3: A feature vector y is generated from the cross-correlation function;
Step 4: For each training position $r_i$, the feature vectors are used to compute the mean vector μ of the cross-correlation-function PLDA model, the speaker subspace of fixed dimension, and the residual $\varepsilon_{ij}$.
The steps of the localization stage are as follows:
Step 1: The microphone array records a signal that contains the signal emitted by the source (the reverberant signal) and noise;
Step 2: The cross-correlation function is computed from the recorded signal;
Step 3: A feature vector y is generated from the cross-correlation function; if there are N frames of data, a feature vector set $y = \{y_t,\ t = 1, \ldots, N\}$ is generated;
Step 4: The features are tested with the PLDA model to estimate the position of the source.
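Steps 2 and 3 of both stages can be sketched as follows. The patent does not specify how the feature vector is formed from the cross-correlation function, so this Python sketch makes several assumptions: Hann-windowed frames, PHAT weighting, and a feature vector consisting of the correlation values at lags within ±`max_lag`. The function name and all parameter values are hypothetical.

```python
import numpy as np

def frame_features(x1, x2, frame_len=512, hop=256, max_lag=16):
    """Return one cross-correlation feature vector y_t per frame (assumed layout)."""
    window = np.hanning(frame_len)
    n_fft = 2 * frame_len                        # zero-pad each frame
    n = min(len(x1), len(x2))
    feats = []
    for start in range(0, n - frame_len + 1, hop):
        f1 = x1[start:start + frame_len] * window
        f2 = x2[start:start + frame_len] * window
        cross = np.fft.rfft(f1, n_fft) * np.conj(np.fft.rfft(f2, n_fft))
        cross /= np.abs(cross) + 1e-12           # PHAT weighting
        cc = np.fft.irfft(cross, n_fft)
        # Keep lags -max_lag .. +max_lag as the feature vector of this frame
        feats.append(np.concatenate((cc[-max_lag:], cc[:max_lag + 1])))
    return np.array(feats)                       # shape (N, 2 * max_lag + 1)

# Usage: two channels of white noise, the first delayed by 4 samples
rng = np.random.default_rng(0)
s = rng.standard_normal(2048)
x1 = np.concatenate((np.zeros(4), s[:-4]))
Y = frame_features(x1, s)
```

Each row of `Y` plays the role of one $y_t$ in the feature vector set of Step 3.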
In addition, in selecting the cross-correlation function features, the room impulse response simulation tool Roomsim is used to simulate a real acoustic environment. The generalized cross-correlation (GCC) function between the signals $x_1(k)$ and $x_2(k)$ can be computed in the frequency domain:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \Psi_{1,2}(\omega)\, X_1(\omega)\, X_2^*(\omega)\, e^{j\omega\tau}\, d\omega \tag{1.1}$$
where the superscript "*" denotes complex conjugation, $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms of $x_1(k)$ and $x_2(k)$, and $\Psi_{1,2}(\omega)$ is the weighting function.
To strengthen the robustness of the cross-correlation function to reverberation, the phase transform (PHAT) weighting function can be used:
$$\Psi_{1,2}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^*(\omega) \right|} \tag{1.2}$$
Substituting formula (1.2) into formula (1.1) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{X_1(\omega)\, X_2^*(\omega)}{\left| X_1(\omega)\, X_2^*(\omega) \right|}\, e^{j\omega\tau}\, d\omega \tag{1.3}$$
In practice, the microphone signals $x_1(k)$ and $x_2(k)$ are windowed and then Fourier transformed to obtain $X_1(\omega)$ and $X_2(\omega)$. If the length L of the room impulse response is much shorter than the length of the window function, the microphone signals can be expressed in the frequency domain as:
$$X_n(\omega) = H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \tag{1.4}$$
where $S(\omega)$ and $H_n(r_s, \omega)$ are the Fourier transforms of the source signal $s(k)$ and of the room impulse response $h_n(r_s, k)$, respectively.
Substituting formula (1.4) into formula (1.3) gives:
$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^*(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^*(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega = R_{h_1 h_2}(r_s, \tau) \tag{1.5}$$
Formula (1.5) shows that the GCC between the signals $x_1(k)$ and $x_2(k)$ received by the microphone array equals the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$.
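The intermediate step between formulas (1.4) and (1.5) is not written out in the text; one way to see it, assuming $S(\omega) \neq 0$ at every frequency of interest, is that the source spectrum appears as $|S(\omega)|^2$ in both the numerator and the denominator of the PHAT-weighted integrand and therefore cancels:

```latex
\begin{aligned}
X_1(\omega)\,X_2^*(\omega)
  &= H_1(r_s,\omega)\,S(\omega)\,H_2^*(r_s,\omega)\,S^*(\omega)
   = H_1(r_s,\omega)\,H_2^*(r_s,\omega)\,|S(\omega)|^2, \\
\frac{X_1(\omega)\,X_2^*(\omega)}{\left|X_1(\omega)\,X_2^*(\omega)\right|}
  &= \frac{H_1(r_s,\omega)\,H_2^*(r_s,\omega)\,|S(\omega)|^2}
          {\left|H_1(r_s,\omega)\,H_2^*(r_s,\omega)\right|\,|S(\omega)|^2}
   = \frac{H_1(r_s,\omega)\,H_2^*(r_s,\omega)}
          {\left|H_1(r_s,\omega)\,H_2^*(r_s,\omega)\right|}.
\end{aligned}
```

The PHAT-weighted cross-correlation therefore depends only on the source position $r_s$ through the room impulse responses, not on what the source emits, which is what makes it usable as a position feature.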
In practice, however, the length L of the room impulse response is much larger than the length of the window function, so the microphone signals can only be expressed approximately in the frequency domain as:
$$X_n(\omega) \approx H_n(r_s, \omega)\, S(\omega), \quad n = 1, 2 \tag{1.6}$$
Accordingly, the GCC between the signals $x_1(k)$ and $x_2(k)$ received by the microphone array is only approximately equal to the GCC between the room impulse responses $h_1(r_s, k)$ and $h_2(r_s, k)$, that is:
$$R_{x_1 x_2}(\tau) \approx \int_{-\infty}^{\infty} \frac{H_1(r_s, \omega)\, H_2^*(r_s, \omega)}{\left| H_1(r_s, \omega)\, H_2^*(r_s, \omega) \right|}\, e^{j\omega\tau}\, d\omega = R_{h_1 h_2}(r_s, \tau) \tag{1.7}$$
The cross-correlation function features are obtained in this way.
The present invention can be applied to speaker recognition and to localization of the speaker under reverberation and noise.
Beneficial effects
1. The invention uses cross-correlation function features in combination with PLDA modeling; using the probability distribution function of the i-vectors in the PLDA model improves the effectiveness of the PLDA model. Compared with conventional sound source localization algorithms, it lowers the error rate and improves localization accuracy, and it effectively addresses the noise and reverberation problems of conventional sound source localization.
2. The invention combines the feature information of the cross-correlation function of the sound source with the PLDA algorithm and is applicable to conditions with strong noise and reverberation.
3. Because the invention extracts cross-correlation function features of the sound source, data acquisition is simple and convenient, and the localization performance is good.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the equal error rate (EER) analysis for different speakers under the i-vector model.
Fig. 3 is a schematic diagram of the score analysis of different test data under the i-vector model at a signal-to-noise ratio of 10 dB.
Fig. 4 is a schematic diagram of the score analysis of different test data under the i-vector model at a signal-to-noise ratio of 20 dB.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention is a sound source localization algorithm based on i-vector speaker recognition. PLDA is a channel compensation algorithm that operates on i-vector features: because an i-vector contains both speaker information and channel information, and only the speaker information is of interest, channel compensation is needed. Four aspects are described in detail below: source feature selection, probabilistic linear discriminant analysis, model training, and scoring.
The specific implementation steps of the present invention are as follows:
Step 1: Using the Roomsim simulation environment, an environment with reverberation and noise is simulated, and the cross-correlation function features of the source signal are computed. Processing such as dimensionality reduction and speech detection is applied to the features, and they are divided into a training set and a test set in preparation for model training in the next step.
Step 2: Extract the i-vectors. Within the PLDA framework, the generation of an i-vector can be described with hidden variables. Different numbers of hidden variables and different prior assumptions give different PLDA models. Let the j-th i-vector of the i-th speaker be denoted $w_{ij}$; the commonly used PLDA model assumes:
$$w_{ij} = \mu + V y_i + z_{ij}$$
where μ is the mean of all training data, the matrix V represents the speaker subspace (the eigenvoice matrix), the vector $y_i$ is the corresponding speaker factor and follows a standard Gaussian distribution, and $z_{ij}$ is the residual, characterized by a full covariance matrix D.
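To make the generative reading of this model concrete, the following Python sketch draws synthetic i-vectors from it. It assumes that the residual $z_{ij}$ is zero-mean Gaussian with covariance D; the function name `sample_plda` and the toy dimensions are illustrative only.

```python
import numpy as np

def sample_plda(mu, V, D, n_speakers, n_per_speaker, seed=0):
    """Draw w_ij = mu + V y_i + z_ij with y_i ~ N(0, I) and z_ij ~ N(0, D)."""
    rng = np.random.default_rng(seed)
    dim, rank = V.shape
    y = rng.standard_normal((n_speakers, rank))      # one speaker factor per speaker
    z = rng.multivariate_normal(np.zeros(dim), D,    # one residual per i-vector
                                size=(n_speakers, n_per_speaker))
    return mu + (y @ V.T)[:, None, :] + z            # shape (speakers, sessions, dim)

# Usage with toy sizes: 4-dimensional i-vectors, 2-dimensional speaker subspace
rng = np.random.default_rng(1)
mu = np.zeros(4)
V = rng.standard_normal((4, 2))
D = 0.1 * np.eye(4)
W = sample_plda(mu, V, D, n_speakers=3, n_per_speaker=5)
```

All i-vectors of one speaker share the same $y_i$, so they cluster around $\mu + V y_i$, and the spread within the cluster is governed by D.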
Step 3: Apply PLDA. The model parameters λ = (μ, V, D) are estimated on the labeled data set with the expectation-maximization (EM) method; the initial model uses random values.
Step 4: After the model parameters have been estimated, given two i-vectors $w_1$ and $w_2$, their log-likelihood ratio is computed with the formula below, where the hypothesis $\theta_{tar}$ states that they come from the same speaker and $\theta_{non}$ states that they come from different speakers. The log-likelihood-ratio score is:
$$\mathrm{score} = \log \frac{p(w_1, w_2 \mid \theta_{tar})}{p(w_1, w_2 \mid \theta_{non})}$$
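Under the model $w_{ij} = \mu + V y_i + z_{ij}$ with Gaussian $y_i$ and $z_{ij}$, both likelihoods in the score are Gaussian, so the ratio has a closed form. The Python sketch below is one way to evaluate it, assuming the residual is zero-mean Gaussian with covariance D; it builds the joint covariance of the stacked pair directly rather than using an optimized formula, and the function names are hypothetical.

```python
import numpy as np

def _gauss_logpdf(x, cov):
    """Log density of a zero-mean Gaussian N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def plda_llr(w1, w2, mu, V, D):
    """Log-likelihood ratio: same speaker (theta_tar) vs different speakers (theta_non)."""
    B = V @ V.T                      # between-speaker covariance
    T = B + D                        # total covariance of a single i-vector
    x = np.concatenate((w1 - mu, w2 - mu))
    zero = np.zeros_like(B)
    cov_tar = np.block([[T, B], [B, T]])        # shared y couples w1 and w2
    cov_non = np.block([[T, zero], [zero, T]])  # independent speakers
    return _gauss_logpdf(x, cov_tar) - _gauss_logpdf(x, cov_non)

# Usage with toy parameters
mu = np.zeros(2)
V = 2.0 * np.eye(2)
D = np.eye(2)
same = plda_llr(np.array([1.0, 1.0]), np.array([1.0, 1.0]), mu, V, D)
diff = plda_llr(np.array([1.0, 1.0]), np.array([-1.0, -1.0]), mu, V, D)
```

A positive score favors the same-source hypothesis and a negative score the different-source hypothesis; in the toy example `same` is positive and `diff` is negative.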
Tests are carried out both without noise and with noise, with the signal-to-noise ratio in the noisy case lowered gradually. The experiments show that the method localizes well even in the presence of noise and reverberation.
The i-vector-based sound source localization algorithm of the present invention is verified below. The experimental parameters are chosen as follows:
(1) The simulated data set is generated with Roomsim, a reverberation simulation program for rectangular rooms in which the positions of the source and the receivers can be set. The room size is 7 m × 6 m × 3 m, and the relationship between the reverberation time ($T_{60}$) and the reflection coefficient (β) is given by the Eyring formula:
$$\beta = \exp\left( -\frac{13.82}{c \left( L_x^{-1} + L_y^{-1} + L_z^{-1} \right) T_{60}} \right)$$
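As a worked example of this formula, the short Python sketch below evaluates β for the 7 m × 6 m × 3 m room. The patent does not state the speed of sound or the reverberation times used, so c = 343 m/s and $T_{60}$ = 0.3 s are assumed values chosen only for illustration.

```python
import math

def reflection_coefficient(t60, room=(7.0, 6.0, 3.0), c=343.0):
    """Reflection coefficient beta from the formula above.

    t60:  reverberation time in seconds
    room: room dimensions (Lx, Ly, Lz) in metres
    c:    speed of sound in m/s (assumed value)
    """
    lx, ly, lz = room
    return math.exp(-13.82 / (c * (1 / lx + 1 / ly + 1 / lz) * t60))

beta = reflection_coefficient(0.3)   # about 0.81 for the assumed values
```

A longer reverberation time gives a reflection coefficient closer to 1, that is, more reflective walls.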
The whole data set is divided into a training set and a test set at a ratio of 8:2. The training set serves as input to the algorithm, and the test set is used to evaluate the performance of the improved algorithm.
(2) The sound source localization system uses the PLDA algorithm, with parameters μ, V, $y_i$ and $z_{ij}$. μ is the mean of all training data, the matrix V represents the speaker subspace (the eigenvoice matrix), the vector $y_i$ is the corresponding speaker factor and follows a standard Gaussian distribution, and $z_{ij}$ is the residual, characterized by a full covariance matrix D.
(3) The i-vector parameter matrix T uses a single space in place of two spaces. In conventional recognition methods, the two spaces are the speaker space defined by the eigenvoice matrix and the channel space defined by the eigenchannel matrix. The new space contains both the differences between speakers and the differences between channels.
Experiment 1: Equal error rate results for sound source localization with the i-vector model in a noise-free environment
Fig. 2 shows the sound source localization results for five people in a noise-free environment. Model denotes the trained model and Test denotes the test model. Each row is matched against each column; the darker the color, the higher the score. The lower the equal error rate (EER), the better the performance. As Fig. 2 shows, the EER of the algorithm in the noise-free environment is 0, so the localization performance of the model is very good.
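For reference, the equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. The patent does not describe how it was computed; the Python sketch below is a simple threshold sweep over the observed scores, with a hypothetical function name and made-up example scores.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the decision threshold over all observed scores."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for thr in np.sort(np.concatenate((tar, non))):
        far = np.mean(non >= thr)    # non-target trials wrongly accepted
        frr = np.mean(tar < thr)     # target trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Usage: perfectly separated scores give an EER of 0
eer = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
```

An EER of 0, as reported for Fig. 2, means that some threshold separates all matching model/test pairs from all non-matching pairs.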
Experiment 2: Equal error rate results for sound source localization with the i-vector model at a signal-to-noise ratio of 15 dB
Fig. 3 shows the equal error rate results at a signal-to-noise ratio of 10 dB. As in Experiment 1, the EER at 15 dB is still 0, and the localization performance is very good.
Experiment 3: Equal error rate results for sound source localization with the i-vector model at a signal-to-noise ratio of 20 dB
Fig. 4 shows the equal error rate results at a signal-to-noise ratio of 20 dB. As in Experiment 1, the EER at 20 dB is still 0. It can therefore be concluded that the sound source localization algorithm based on i-vector speaker recognition localizes well.
Those skilled in the art can readily conceive of other advantages and variations based on the above embodiments. The present invention is therefore not limited to the above examples, which describe one form of the invention in detail by way of example only. Technical solutions obtained by those skilled in the art through equivalent substitutions based on the above examples, without departing from the spirit of the invention, fall within the scope of the claims of the present invention and their equivalents.

Claims (4)

1. A sound source localization method based on i-vector speaker recognition, characterized in that the method comprises the following steps:
Step 1: The sound source is placed at each training position $r_i$, $i = 1, 2, \ldots, K$, and the microphone array records the signal emitted by the source at that position;
Step 2: The cross-correlation function is computed from the recorded reverberant signal;
Step 3: A feature vector y is generated from the cross-correlation function;
Step 4: For each training position $r_i$, the feature vectors are used to compute the mean vector μ of the cross-correlation-function PLDA model, the speaker subspace of fixed dimension, and the residual $\varepsilon_{ij}$;
Step 5: The microphone array records a signal that contains the signal emitted by the source and noise;
Step 6: The cross-correlation function is computed from the recorded signal;
Step 7: A feature vector y is generated from the cross-correlation function; if there are N frames of data, a feature vector set y is generated;
Step 8: The features are tested with the PLDA model to estimate the position of the source.
2. The sound source localization method based on i-vector speaker recognition according to claim 1, characterized in that, in step 2, the feature attributes are assigned different weights.
3. The sound source localization method based on i-vector speaker recognition according to claim 1, characterized in that, in step 3, the sound source position feature value is computed from the item feature attributes, the computation comprising:
Step 3-1: In selecting the cross-correlation function features, the room impulse response simulation tool Roomsim is used to simulate a real acoustic environment, and the generalized cross-correlation function between the signals can be computed in the frequency domain;
Step 3-2: To strengthen the robustness of the cross-correlation function to reverberation, the phase transform weighting function can be used;
Step 3-3: In practice, the time-domain microphone signals are windowed and then Fourier transformed to obtain the frequency-domain signals; if the length of the room impulse response is much shorter than the length of the window function, the GCC between the signals received by the microphone array equals the GCC of the room impulse responses.
4. The sound source localization method based on i-vector speaker recognition according to claim 1, characterized in that the method is applied to all item sound source localization systems having feature attributes.
CN201610365659.6A 2016-05-27 2016-05-27 Sound source localization method based on i-vector speaker recognition Active CN106019230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610365659.6A CN106019230B (en) 2016-05-27 2016-05-27 Sound source localization method based on i-vector speaker recognition

Publications (2)

Publication Number Publication Date
CN106019230A true CN106019230A (en) 2016-10-12
CN106019230B CN106019230B (en) 2019-01-08

Family

ID=57091462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610365659.6A Active CN106019230B (en) 2016-05-27 2016-05-27 Sound source localization method based on i-vector speaker recognition

Country Status (1)

Country Link
CN (1) CN106019230B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010028784A1 (en) * 2008-09-11 2010-03-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal and apparatus for providing a two-channel audio signal and a set of spatial cues
CN102740208A (en) * 2011-04-14 2012-10-17 东南大学 Multivariate statistics-based positioning method of sound source of hearing aid
CN103439688A (en) * 2013-08-27 2013-12-11 大连理工大学 Sound source positioning system and method used for distributed microphone arrays
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 Text-dependent speaker recognition method based on joint deep learning
CN105139857A (en) * 2015-09-02 2015-12-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 Countercheck method for automatically identifying speaker aiming to voice deception

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
万新旺 等: "基于双耳互相关函数的声源定位算法", 《东南大学学报(自然科学版)》 *
杨琳 等: "说话人识别中的总变化因子分析技术", 《网络新媒体技术》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274906A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Voice information processing method, device, terminal and storage medium
CN107703486A (en) * 2017-08-23 2018-02-16 南京邮电大学 A kind of auditory localization algorithm based on convolutional neural networks CNN
CN107703486B (en) * 2017-08-23 2021-03-23 南京邮电大学 Sound source positioning method based on convolutional neural network CNN
CN108242234A (en) * 2018-01-10 2018-07-03 腾讯科技(深圳)有限公司 Speech recognition modeling generation method and its equipment, storage medium, electronic equipment
CN108242234B (en) * 2018-01-10 2020-08-25 腾讯科技(深圳)有限公司 Speech recognition model generation method, speech recognition model generation device, storage medium, and electronic device
CN110555370A (en) * 2019-07-16 2019-12-10 西北工业大学 channel effect inhibition method based on PLDA factor analysis method in underwater target recognition
CN110555370B (en) * 2019-07-16 2023-03-31 西北工业大学 Channel effect inhibition method based on PLDA factor analysis method in underwater target recognition
CN111462759A (en) * 2020-04-01 2020-07-28 科大讯飞股份有限公司 Speaker labeling method, device, equipment and storage medium
CN111462759B (en) * 2020-04-01 2024-02-13 科大讯飞股份有限公司 Speaker labeling method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106019230B (en) 2019-01-08


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant