CN106019230A - Sound source positioning method based on i-vector speaker recognition - Google Patents
- Publication number
- CN106019230A
- Authority
- CN
- China
- Prior art keywords
- vector
- signal
- cross
- correlation function
- sound source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a sound source localization method based on i-vector speaker recognition. Discriminative cross-correlation function features are introduced and divided into a training set and a test set, which are used to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimate of the probability distribution of the development-set i-vectors, and a PLDA model constrained by speech duration is established so that recognition and sound source localization can be performed accurately. The method is intended to overcome the noise and reverberation problems of conventional sound source localization.
Description
Technical field
The present invention relates to a sound source localization method based on i-vector speaker recognition and belongs to the field of Internet information technology.
Background art
Speaker recognition is a biometric technique that automatically determines a speaker's identity from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. It is an important branch of personal biometric identification. Compared with other biometric technologies, speaker recognition is convenient, economical, and scalable, and it can be widely applied to database access, security verification, telephone banking, remote computer login, and similar fields. Because of this wide range of applications, many researchers have joined research in the area. In recent years, speaker modeling based on the identity vector (i-vector) has achieved great success and has greatly improved the performance of speaker recognition systems; i-vector subspace modeling has proved to be one of the most effective speaker modeling techniques currently available.
With the rapid development of computer technology and the information industry, sound source localization has become a research hotspot. Determining the position of a sound source in space has broad application prospects in many aspects of social production and daily life. Sound source localization locates an object from the sound it emits. Unlike localization with sonar, radar, or wireless communication, which use narrowband signals, it works with ordinary sound, which is a broadband signal. Various sound localization algorithms have been proposed based on the characteristics of acoustic signals, but because of noise and reverberation, the positioning accuracy of existing algorithms is relatively low.
Current sound localization algorithms fall broadly into three classes: algorithms based on high-resolution spectral estimation, algorithms based on time delay estimation (TDE), and algorithms based on steered beamforming.
(1) Localization algorithms based on high-resolution spectral estimation
There are four main high-resolution spectral estimation methods: ARMA spectral estimation, minimum-variance spectral estimation, entropy spectral estimation, and subspace methods. ARMA spectral estimation estimates the power spectral density by modeling a stationary linear signal process. Entropy spectral estimation comprises the maximum entropy method and the minimum cross-entropy method. Subspace methods include Pisarenko harmonic decomposition, the Prony method, multiple signal classification (MUSIC), and estimation of signal parameters via rotational invariance techniques (ESPRIT). Localization algorithms based on high-resolution spectral estimation use the covariance matrix of the received signals, which is unknown in practice and must be estimated from observed data. This estimation assumes that the source and the noise are statistically stationary and that the parameter to be estimated (the source position) is fixed, so that the covariance can be obtained by averaging over a time interval; speech is only short-term stationary and often does not satisfy this condition. Moreover, most current methods are designed for far-field narrowband signals, and indoor reverberation seriously degrades their performance.
(2) Localization algorithms based on time delay estimation
Algorithms based on time delay estimation work in two steps. The first step is time delay estimation, that is, computing the delay of the source signal between each pair of microphones; the second step is position estimation, that is, estimating the source position from the time delays and the geometry of the microphone array. Time delay estimation (TDE) is the key step. The generalized cross-correlation (GCC) method computes the cross-correlation function between the signals received by different microphones, from which the time difference of arrival (TDOA) can be estimated. In real environments, however, noise and reverberation weaken the maximum peak of the correlation function and make peak detection difficult. GCC therefore weights the cross-power spectrum of the two microphone signals so that the peak of the correlation function at the true delay stands out more clearly. Knapp lists five commonly used weighting functions, of which the maximum-likelihood weighting (GCC-ML: GCC using Maximum Likelihood) and the phase transform weighting (GCC-PHAT: GCC using Phase Transform) are the most typical. Because of its low computational complexity and ease of implementation, the GCC method is widely used.
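The second step of TDE-based localization (mapping delays to a position) depends on the array geometry, which this text does not specify. As an illustration only, the sketch below assumes a single pair of microphones spaced `mic_distance` metres apart and a far-field source, so that the TDOA τ maps to a direction of arrival θ (measured from broadside) through sin θ = cτ/d. The speed of sound of 343 m/s and the function name `tdoa_to_azimuth` are assumptions, not part of the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def tdoa_to_azimuth(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field direction of arrival (radians from broadside) for a two-microphone pair.

    tau: time difference of arrival in seconds; mic_distance: spacing d in metres.
    The ratio c * tau / d is clipped to [-1, 1] so that noisy delay estimates
    slightly beyond the physical limit still yield a valid angle.
    """
    ratio = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.arcsin(ratio))
```

With more than two microphones, the pairwise delays would instead be combined (for example by least squares) to estimate a 2-D or 3-D position.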
(3) Localization algorithms based on steered beamforming
Steered-beamforming localization was originally used for target localization in radar and sonar systems and was later applied to microphone array signal processing. Microphone array beamforming has two main applications: 1) speech enhancement; 2) sound source localization. When the source position is known, the steering delay of each microphone can be adjusted so that the microphone signals are aligned in time, which points the array toward the source; summing the aligned signals then suppresses noise and enhances the signal. This simplest and most practical beamformer is called delay-and-sum beamforming.
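The delay-and-sum principle described above can be sketched as follows. This is a minimal illustration that assumes the steering delays are already known and are whole numbers of samples (fractional delays would need interpolation or frequency-domain phase shifts); the function name `delay_and_sum` is hypothetical.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Align each microphone channel by an integer steering delay and average.

    signals: array of shape (M, N), one row per microphone.
    delays: delays[m] is the number of samples by which channel m lags the
            reference channel (negative if it leads).
    Samples shifted in from outside the recording are filled with zeros.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, n = signals.shape
    out = np.zeros(n)
    for ch, d in zip(signals, delays):
        d = int(d)
        aligned = np.zeros(n)
        if d >= 0:
            aligned[:n - d] = ch[d:]      # advance a lagging channel
        else:
            aligned[-d:] = ch[:n + d]     # retard a leading channel
        out += aligned
    # Coherent source components add in phase; uncorrelated noise averages down.
    return out / num_mics
```

Averaging M aligned channels leaves the source unchanged while reducing the power of independent noise by roughly a factor of M, which is the noise-suppression effect the text refers to.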
Traditional algorithms are severely limited in strongly reverberant environments. For example, steered-beam methods based on maximum output power are sensitive to the external environment and to the source spectrum, which limits their applications; methods based on high-resolution spectral estimation are computationally expensive and unsuitable for short-range localization; and the delay accuracy of TDE-based methods is easily degraded by reverberation and noise interference.
Summary of the invention
The purpose of the present invention is to overcome the above deficiencies of the prior art by proposing a sound source localization algorithm based on i-vector speaker recognition. The method introduces discriminative cross-correlation function features and divides them into a training set and a test set, which are used to train and test the model of an i-vector speaker recognition system. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimate of the probability distribution function of the development-set i-vectors, and a PLDA model constrained by speech duration is established, so that recognition and sound source localization can be performed accurately. The algorithm is intended to overcome the noise and reverberation problems of traditional sound source localization.
The technical solution adopted by the present invention is a sound source localization algorithm based on i-vector speaker recognition that comprises a training stage and a positioning stage.
The steps of the training stage are as follows:
Step 1: The sound source is placed at each training position r_i, i = 1, 2, ..., K, and the microphone array records the (reverberant) signal emitted by the source at that position;
Step 2: Compute the cross-correlation function from the recorded reverberant signal;
Step 3: Generate the feature vector y from the cross-correlation function;
Step 4: For each training position r_i, use the feature vectors to compute the parameters of the cross-correlation PLDA model: the mean vector μ, the speaker subspace of fixed dimension, and the residual ε_ij.
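The text does not define exactly how the feature vector y is formed from the cross-correlation function. A minimal sketch under stated assumptions: split a two-microphone recording into Hann-windowed frames, compute the PHAT-weighted cross-correlation of each frame, keep only the lags within ±`max_lag` samples, and scale each vector to unit norm. The frame length, hop, lag range, normalization, and the name `gcc_phat_features` are all illustrative choices; the dimensionality reduction and speech detection mentioned later in the description are omitted.

```python
import numpy as np


def gcc_phat_features(x1, x2, frame_len=1024, hop=512, max_lag=16, eps=1e-12):
    """Return one PHAT-weighted cross-correlation feature vector per frame.

    Row t holds the correlation at lags -max_lag..max_lag (index max_lag is lag 0),
    scaled to unit Euclidean norm. Shape: (num_frames, 2 * max_lag + 1).
    """
    window = np.hanning(frame_len)
    n_fft = 2 * frame_len                      # zero-pad so the correlation is not circular
    n_samples = min(len(x1), len(x2))
    feats = []
    for start in range(0, n_samples - frame_len + 1, hop):
        f1 = np.fft.rfft(window * x1[start:start + frame_len], n=n_fft)
        f2 = np.fft.rfft(window * x2[start:start + frame_len], n=n_fft)
        cross = f1 * np.conj(f2)
        r = np.fft.irfft(cross / (np.abs(cross) + eps), n=n_fft)   # PHAT weighting
        y = np.concatenate((r[n_fft - max_lag:], r[:max_lag + 1]))  # lags -max_lag..max_lag
        feats.append(y / (np.linalg.norm(y) + eps))
    return np.array(feats)
```

The same function would serve Step 3 of the positioning stage, where N frames give the set {y_t, t = 1, ..., N}.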
The steps of the positioning stage are as follows:
Step 1: The microphone array records a signal consisting of the (reverberant) signal emitted by the source plus noise;
Step 2: Compute the cross-correlation function from the recorded signal;
Step 3: Generate feature vectors from the cross-correlation function; for N frames of data, this yields the feature vector set y = {y_t, t = 1, ..., N};
Step 4: Score the test features with the PLDA model to estimate the position of the source.
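How the per-frame scores are turned into a position estimate is not spelled out. One plausible reading, sketched below as an assumption, is to score every test frame against a model for each training position r_i, sum the scores over the N frames, and choose the position with the highest total. `locate_source` and the per-position "model vector" are hypothetical; `cosine` is only a stand-in scoring function so the sketch runs on its own, and a PLDA log-likelihood ratio could be passed in its place.

```python
import numpy as np


def locate_source(test_feats, position_models, score_fn):
    """Pick the training position whose model best matches the test frames.

    test_feats: (N, d) feature vectors y_t from one recording.
    position_models: one enrolled representation per training position r_i
        (hypothetical, e.g. the mean training feature at that position).
    score_fn(y, m): per-frame score. Summing per-frame log-likelihood ratios
        corresponds to treating the frames as independent.
    Returns (index of the best position, array of total scores).
    """
    totals = np.array([sum(score_fn(y, m) for y in test_feats)
                       for m in position_models])
    return int(np.argmax(totals)), totals


def cosine(a, b):
    """Cosine similarity, used here only as a placeholder scoring function."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, `locate_source(frames, models, cosine)` returns the index i of the estimated training position r_i together with the per-position totals.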
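In addition, to select the cross-correlation function feature, the room impulse response simulator Roomsim is used to simulate a real acoustic environment. The generalized cross-correlation (GCC) function between signals x_1(k) and x_2(k) can be computed in the frequency domain:

$$R_{12}(\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_{12}(\omega)\,X_1(\omega)\,X_2^{*}(\omega)\,e^{j\omega\tau}\,d\omega \qquad (1.1)$$

where the superscript * denotes the complex conjugate, X_1(ω) is the Fourier transform of x_1(k), and Ψ_12(ω) is the weighting function.
To make the cross-correlation function more robust to reverberation, the phase transform (PHAT) weighting function can be used:

$$\Psi_{12}(\omega)=\frac{1}{\left|X_1(\omega)\,X_2^{*}(\omega)\right|} \qquad (1.2)$$

Substituting Eq. (1.2) into Eq. (1.1) gives:

$$R_{12}(\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{X_1(\omega)\,X_2^{*}(\omega)}{\left|X_1(\omega)\,X_2^{*}(\omega)\right|}\,e^{j\omega\tau}\,d\omega \qquad (1.3)$$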
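A discrete-time sketch of the PHAT-weighted GCC of formulas (1.1) and (1.2), using the FFT in place of the continuous Fourier transform, is shown below. The zero-padding length, the small `eps` that guards against division by zero, the optional `max_tau` search limit, and the function name `gcc_phat` are implementation assumptions. With this convention a positive delay means x1 lags x2.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None, eps=1e-12):
    """Estimate the TDOA between x1 and x2 with PHAT-weighted GCC.

    Returns (tau, lags, r): tau in seconds (positive when x1 lags x2),
    the lag axis in samples, and the correlation values at those lags.
    """
    n = len(x1) + len(x2)                   # zero-pad so the FFT correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)                # cross-power spectrum X1(w) X2*(w)
    cross /= np.abs(cross) + eps            # PHAT weighting: keep only the phase
    r = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                 # optional physical limit, e.g. mic spacing / c
        max_shift = min(int(round(fs * max_tau)), max_shift)
    r = np.concatenate((r[n - max_shift:], r[:max_shift + 1]))  # reorder to lags -max_shift..max_shift
    lags = np.arange(-max_shift, max_shift + 1)
    shift = lags[np.argmax(np.abs(r))]
    return shift / fs, lags, r
```

Because PHAT discards magnitude information, every frequency bin contributes equally to the peak, which is why this weighting tends to sharpen the correlation peak under reverberation.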
In practice, the microphone signals x_1(k) and x_2(k) are windowed and then Fourier transformed to obtain X_1(ω) and X_2(ω). If the length L of the room impulse response is much shorter than the window length, the microphone signals can be expressed in the frequency domain as:

$$X_n(\omega)=H_n(r_s,\omega)\,S(\omega),\quad n=1,2 \qquad (1.4)$$

where S(ω) and H_n(r_s, ω) are the Fourier transforms of s(k) and h_n(r_s, k), respectively.
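Substituting Eq. (1.4) into Eq. (1.3) gives:

$$R_{12}(\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H_1(r_s,\omega)\,H_2^{*}(r_s,\omega)}{\left|H_1(r_s,\omega)\,H_2^{*}(r_s,\omega)\right|}\,e^{j\omega\tau}\,d\omega \qquad (1.5)$$

Eq. (1.5) shows that the GCC between the received signals x_1(k) and x_2(k) equals the GCC between the room impulse responses h_1(r_s, k) and h_2(r_s, k): the factor |S(ω)|² appears in both the numerator and the denominator and cancels wherever S(ω) is nonzero.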
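Formula (1.5) can be checked numerically. The sketch below is an illustration under simplifying assumptions: random, exponentially decaying toy impulse responses of 32 taps stand in for real room responses, and the signals are formed by circular convolution on a 512-point DFT grid, for which X_n = H_n S from formula (1.4) holds exactly. The PHAT-weighted GCC of the microphone signals then matches that of the impulse responses up to round-off. Names such as `phat_gcc` and `circ_conv` are hypothetical.

```python
import numpy as np


def phat_gcc(a, b, n, eps=1e-12):
    """Circular PHAT-weighted cross-correlation of a and b on an n-point DFT grid."""
    cross = np.fft.fft(a, n) * np.conj(np.fft.fft(b, n))
    return np.real(np.fft.ifft(cross / (np.abs(cross) + eps)))


def circ_conv(h, x, n):
    """Circular convolution, for which the product form X_n = H_n * S holds exactly."""
    return np.real(np.fft.ifft(np.fft.fft(h, n) * np.fft.fft(x, n)))


rng = np.random.default_rng(1)
n = 512                                     # DFT (window) length
s = rng.standard_normal(n)                  # source signal s(k)
decay = np.exp(-np.arange(32) / 8.0)        # toy RIR envelope, length L = 32, much shorter than n
h1 = rng.standard_normal(32) * decay        # h_1(r_s, k)
h2 = rng.standard_normal(32) * decay        # h_2(r_s, k)

x1, x2 = circ_conv(h1, s, n), circ_conv(h2, s, n)
gcc_signals = phat_gcc(x1, x2, n)           # GCC of the microphone signals
gcc_rirs = phat_gcc(h1, h2, n)              # GCC of the impulse responses
```

With real windowed recordings and room responses longer than the window, the equality holds only approximately, which is the situation described next.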
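In practice, however, the length L of the room impulse response is much longer than the window length, so the microphone signals can only be approximated in the frequency domain as:

$$X_n(\omega)\approx H_n(r_s,\omega)\,S(\omega),\quad n=1,2 \qquad (1.6)$$

and the GCC between the received signals x_1(k) and x_2(k) is only approximately equal to the GCC between the room impulse responses h_1(r_s, k) and h_2(r_s, k):

$$R_{12}(\tau)\approx\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H_1(r_s,\omega)\,H_2^{*}(r_s,\omega)}{\left|H_1(r_s,\omega)\,H_2^{*}(r_s,\omega)\right|}\,e^{j\omega\tau}\,d\omega \qquad (1.7)$$

This cross-correlation function is used as the feature: because it depends mainly on the room impulse responses, and hence on the source position r_s, it can discriminate between source positions.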
The invention can be applied to speaker recognition and to speaker localization under reverberation and noise.
Beneficial effects
1. The invention uses cross-correlation function features together with PLDA modeling; modeling the probability distribution of the i-vectors in the PLDA model improves the effectiveness of the PLDA model. Compared with traditional sound localization algorithms, the method can reduce the error rate and improve localization accuracy, addressing the noise and reverberation problems of traditional sound source localization.
2. The invention combines the cross-correlation feature information of the sound source with the PLDA algorithm and is suitable for conditions with strong noise and reverberation.
3. Because the invention extracts cross-correlation function features of the sound source, data acquisition is convenient and simple, and the localization performance is good.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Fig. 2 shows the equal error rate (EER) analysis of the invention for different speakers under the i-vector model.
Fig. 3 shows the score analysis of the invention for different test data under the i-vector model at a signal-to-noise ratio of 10 dB.
Fig. 4 shows the score analysis of the invention for different test data under the i-vector model at a signal-to-noise ratio of 20 dB.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings.
As shown in Fig. 1, the invention is a sound source localization algorithm based on i-vector speaker recognition. PLDA is a channel compensation algorithm that operates on i-vector features. An i-vector contains both speaker information and channel information, and only the speaker information is of interest, so channel compensation is required. The method is described below in four parts: sound source feature selection, probabilistic linear discriminant analysis, model training, and scoring.
The specific implementation steps of the invention are as follows:
Step 1: Use the Roomsim simulation environment to simulate a room with reverberation and noise, compute the cross-correlation function features of the source signals, apply dimensionality reduction, speech detection, and similar processing, and divide the data into a training set and a test set in preparation for model training in the next step.
Step 2: Extract i-vectors. In the PLDA framework, the generation of an i-vector can be described by a hidden variable; different numbers of hidden variables and different prior assumptions yield different PLDA models. Let w_ij denote the j-th i-vector of the i-th speaker. The conventional PLDA model assumes:

$$w_{ij}=\mu+Vy_i+z_{ij}$$

where μ is the mean of all training data, the matrix V spans the speaker space (eigenvoice matrix), the vector y_i is the corresponding speaker factor, which follows a standard Gaussian distribution, and z_ij is the residual, described by a full covariance matrix D.
Step 3: Apply PLDA: estimate the model parameters λ = (μ, V, D) on a labeled dataset with the expectation-maximization (EM) algorithm, starting from a randomly initialized model.
Step 4: After the model parameters have been estimated, the log-likelihood ratio of two given i-vectors w_1 and w_2 is computed, where θ_tar denotes the hypothesis that they come from the same speaker and θ_non the hypothesis that they come from different speakers. The log-likelihood ratio score is:

$$\mathrm{score}=\ln\frac{p(w_1,w_2\mid\theta_{tar})}{p(w_1\mid\theta_{non})\,p(w_2\mid\theta_{non})}$$
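Steps 3 and 4 can be sketched together for the model w_ij = μ + V y_i + z_ij. This is a minimal illustration, not the inventors' implementation, and it rests on several assumptions: μ is fixed to the sample mean, V starts from random values while D starts from the sample covariance (the text says only that the initial model is random), the iteration count is fixed, and the EM updates are the standard closed-form ones for this Gaussian model. The score is the closed-form Gaussian log-likelihood ratio: under θ_tar the two vectors share y, so their joint covariance has off-diagonal blocks VVᵀ; under θ_non they are independent, each with covariance VVᵀ + D. The names `train_plda_em` and `plda_llr` are hypothetical.

```python
import numpy as np


def train_plda_em(W, labels, rank, n_iter=20, seed=0):
    """EM estimate of lambda = (mu, V, D) for w_ij = mu + V y_i + z_ij, z_ij ~ N(0, D)."""
    rng = np.random.default_rng(seed)
    n, dim = W.shape
    mu = W.mean(axis=0)                        # mu: mean of all training vectors
    F = W - mu
    V = rng.standard_normal((dim, rank))       # random initial speaker subspace
    D = np.cov(F, rowvar=False)                # initial full residual covariance (assumption)
    speakers = np.unique(labels)
    for _ in range(n_iter):
        # E-step: Gaussian posterior of each speaker factor y_i given that speaker's vectors
        VtDinv = V.T @ np.linalg.inv(D)
        acc_fy = np.zeros((dim, rank))         # sum_ij f_ij E[y_i]^T
        acc_yy = np.zeros((rank, rank))        # sum_ij E[y_i y_i^T]
        Ey_rows = np.zeros((n, rank))          # E[y_i] copied to each of speaker i's rows
        for s in speakers:
            idx = labels == s
            n_s = int(idx.sum())
            f_sum = F[idx].sum(axis=0)
            cov_y = np.linalg.inv(np.eye(rank) + n_s * VtDinv @ V)
            Ey = cov_y @ VtDinv @ f_sum
            acc_fy += np.outer(f_sum, Ey)
            acc_yy += n_s * (cov_y + np.outer(Ey, Ey))
            Ey_rows[idx] = Ey
        # M-step: closed-form updates of V and D
        V = acc_fy @ np.linalg.inv(acc_yy)
        D = (F.T @ F - V @ (Ey_rows.T @ F)) / n
        D = 0.5 * (D + D.T)                    # symmetrize against round-off
    return mu, V, D


def _log_gauss(x, cov):
    """Log density of a zero-mean Gaussian N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def plda_llr(w1, w2, mu, V, D):
    """score = ln p(w1, w2 | theta_tar) - ln p(w1 | theta_non) - ln p(w2 | theta_non)."""
    between = V @ V.T                          # covariance of the shared speaker term V y
    total = between + D                        # covariance of a single vector
    same = np.block([[total, between], [between, total]])
    x1, x2 = np.asarray(w1) - mu, np.asarray(w2) - mu
    return float(_log_gauss(np.concatenate([x1, x2]), same)
                 - _log_gauss(x1, total) - _log_gauss(x2, total))
```

A positive score favors θ_tar (same source) and a negative score favors θ_non. In practice the EM loop would usually be stopped on a likelihood-convergence test rather than after a fixed number of iterations.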
Experiments were run without noise and with noise; in the noisy case, the signal-to-noise ratio was gradually reduced. The experiments show that the method still localizes well in the presence of both noise and reverberation.
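The text does not say how noise was added to reach a given signal-to-noise ratio. A common approach, shown here as an assumption, is to add white Gaussian noise scaled so that 10·log10(P_signal / P_noise) equals the target SNR, with both powers measured as mean squares over the whole signal. The function name `add_noise_at_snr` is hypothetical.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) == snr_db.

    Powers are mean squares over the whole signal. Returns (noisy, noise).
    """
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_p_noise / np.mean(noise ** 2))   # rescale to the exact target power
    return signal + noise, noise
```

Rescaling by the measured noise power, rather than by its expected value, makes the realized SNR match the target exactly for each generated recording.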
The i-vector-based sound source localization algorithm of the invention is verified experimentally below. The experimental parameters are chosen as follows:
(1) The simulation dataset is generated with Roomsim, a simulation program for rectangular (shoebox) room reverberation in which the positions of the sound source and the receivers can be set. The room is 7 m × 6 m × 3 m, and the relationship between the reverberation time (T60) and the reflection coefficient (β) is determined by Eyring's formula. The whole dataset is split into a training set and a test set at a ratio of 8:2; the training set is used as the algorithm input, and the test set is used to evaluate the performance of the algorithm.
(2) The sound source localization system uses the PLDA algorithm with parameters μ, V, y_i, and z_ij: μ is the mean of all training data, the matrix V spans the speaker space (eigenvoice matrix), the vector y_i is the corresponding speaker factor, which follows a standard Gaussian distribution, and z_ij is the residual, described by a full covariance matrix D.
(3) The i-vector parameter matrix T uses a single space in place of two. In traditional speaker recognition methods, the two spaces are the speaker space defined by the eigenvoice matrix and the channel space defined by the eigenchannel matrix. The new single space contains both the differences between speakers and the differences between channels.
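Item (1) relates T60 to the reflection coefficient β through Eyring's formula but does not reproduce the expression. A sketch using the standard form T60 = 0.161·V / (−S·ln(1 − ᾱ)) is given below, assuming that all six surfaces share the same β and that the absorption coefficient is α = 1 − β² (treating β as a pressure reflection coefficient). These conventions are assumptions, and Roomsim's own parameterization may differ. The function names are hypothetical; the default dimensions are the 7 m × 6 m × 3 m room from the text.

```python
import numpy as np


def eyring_t60(beta, dims=(7.0, 6.0, 3.0)):
    """Eyring reverberation time (s) of a shoebox room with uniform reflection coefficient beta.

    Assumes every surface has absorption alpha = 1 - beta**2.
    """
    lx, ly, lz = dims
    volume = lx * ly * lz                          # m^3
    surface = 2 * (lx * ly + lx * lz + ly * lz)    # m^2
    alpha = 1.0 - beta ** 2
    return 0.161 * volume / (-surface * np.log(1.0 - alpha))


def beta_for_t60(t60, dims=(7.0, 6.0, 3.0)):
    """Invert eyring_t60: the reflection coefficient that yields the requested T60."""
    lx, ly, lz = dims
    volume = lx * ly * lz
    surface = 2 * (lx * ly + lx * lz + ly * lz)
    return float(np.sqrt(np.exp(-0.161 * volume / (surface * t60))))
```

For this room (V = 126 m³, S = 162 m²), β = 0.9 gives T60 ≈ 0.59 s under these assumptions, and `beta_for_t60` recovers β from a target reverberation time.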
Experiment 1: Verifying the equal error rate of i-vector-based sound source localization in a noise-free environment
Fig. 2 shows sound source localization for five people in a noise-free environment. "Model" denotes the trained models and "Test" denotes the test data. Each row is matched against each column, and a darker color indicates a higher score. A lower equal error rate (EER) indicates better performance. As Fig. 2 shows, the EER of the algorithm in the noise-free environment is 0, so the model localizes well in this simulated condition.
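The experiments report the equal error rate but do not describe how it is computed. A simple version, sketched here as an assumption, sweeps the decision threshold over all observed scores and returns the point where the false rejection rate (targets scored below the threshold) and the false acceptance rate (non-targets scored at or above it) are closest. Interpolated ROC methods give slightly different values on small score sets. The name `equal_error_rate` is hypothetical.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over every observed score."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.sort(np.concatenate([tar, non])):
        frr = np.mean(tar < t)     # target trials rejected
        far = np.mean(non >= t)    # non-target trials accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer
```

An EER of 0, as reported in Experiment 1, means that some threshold separates every target score from every non-target score.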
Experiment 2: Verifying the equal error rate of i-vector-based sound source localization at an SNR of 10 dB
Fig. 3 shows the EER results at an SNR of 10 dB. As in Experiment 1, the EER at 10 dB is still 0, so the localization performance remains good.
Experiment 3: Verifying the equal error rate of i-vector-based sound source localization at an SNR of 20 dB
Fig. 4 shows the EER results at an SNR of 20 dB. As in Experiment 1, the EER at 20 dB is still 0. It can therefore be concluded that, in these simulated conditions, the sound source localization algorithm based on i-vector speaker recognition achieves good localization performance.
Other advantages and modifications will readily occur to those skilled in the art based on the above embodiments. The invention is therefore not limited to the above examples, which describe one form of the invention in detail by way of illustration only. Technical solutions obtained by those skilled in the art through equivalent substitutions based on the above examples, without departing from the concept of the invention, fall within the scope of the claims of the invention and their equivalents.
Claims (4)
1. A sound source localization method based on i-vector speaker recognition, characterized in that the method comprises the following steps:
Step 1: the sound source is placed at each training position r_i, i = 1, 2, ..., K, and the microphone array records the signal emitted by the source at that position;
Step 2: compute the cross-correlation function from the recorded reverberant signal;
Step 3: generate the feature vector y from the cross-correlation function;
Step 4: for each training position r_i, use the feature vectors to compute the mean vector μ of the cross-correlation PLDA model, the speaker subspace of fixed dimension, and the residual ε_ij;
Step 5: the microphone array records a signal comprising the signal emitted by the source and noise;
Step 6: compute the cross-correlation function from the recorded signal;
Step 7: generate feature vectors from the cross-correlation function; for N frames of data, a feature vector set y is generated;
Step 8: score the features with the PLDA model to estimate the position of the source.
2. The sound source localization algorithm based on i-vector speaker recognition according to claim 1, characterized in that, in Step 2, the feature attributes are assigned different weights.
3. The sound source localization algorithm based on i-vector speaker recognition according to claim 1, characterized in that, in Step 3, the sound source position feature values are computed from the feature attributes, the computation comprising:
Step 3-1: to select the cross-correlation function feature, the room impulse response simulator Roomsim is used to simulate a real acoustic environment, and the generalized cross-correlation function between the signals can be computed in the frequency domain;
Step 3-2: to make the cross-correlation function more robust to reverberation, the phase transform (PHAT) weighting function can be used;
Step 3-3: in practice, the time-domain microphone signals are windowed and then Fourier transformed to obtain their frequency-domain representations; if the length of the room impulse response is much shorter than the window length, the GCC between the signals received by the microphone array equals the GCC of the room impulse responses.
4. The sound source localization algorithm based on i-vector speaker recognition according to claim 1, characterized in that the method is applicable to all sound source localization systems that use feature attributes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610365659.6A CN106019230B (en) | 2016-05-27 | 2016-05-27 | Sound source localization method based on i-vector speaker recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610365659.6A CN106019230B (en) | 2016-05-27 | 2016-05-27 | Sound source localization method based on i-vector speaker recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106019230A true CN106019230A (en) | 2016-10-12 |
CN106019230B CN106019230B (en) | 2019-01-08 |
Family
ID=57091462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610365659.6A Active CN106019230B (en) | Sound source localization method based on i-vector speaker recognition | 2016-05-27 | 2016-05-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106019230B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107274906A (en) * | 2017-06-28 | 2017-10-20 | 百度在线网络技术(北京)有限公司 | Voice information processing method, device, terminal and storage medium |
CN107703486A (en) * | 2017-08-23 | 2018-02-16 | 南京邮电大学 | A kind of auditory localization algorithm based on convolutional neural networks CNN |
CN108242234A (en) * | 2018-01-10 | 2018-07-03 | 腾讯科技(深圳)有限公司 | Speech recognition modeling generation method and its equipment, storage medium, electronic equipment |
CN110555370A (en) * | 2019-07-16 | 2019-12-10 | 西北工业大学 | channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN111462759A (en) * | 2020-04-01 | 2020-07-28 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010028784A1 (en) * | 2008-09-11 | 2010-03-18 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal and apparatus for providing a two-channel audio signal and a set of spatial cues |
CN102740208A (en) * | 2011-04-14 | 2012-10-17 | 东南大学 | Multivariate statistics-based positioning method of sound source of hearing aid |
CN103439688A (en) * | 2013-08-27 | 2013-12-11 | 大连理工大学 | Sound source positioning system and method used for distributed microphone arrays |
CN104732978A (en) * | 2015-03-12 | 2015-06-24 | 上海交通大学 | Text-dependent speaker recognition method based on joint deep learning |
CN105139857A (en) * | 2015-09-02 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Countercheck method for automatically identifying speaker aiming to voice deception |
Non-Patent Citations (2)
Title |
---|
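Wan Xinwang (万新旺) et al.: "Sound source localization algorithm based on binaural cross-correlation functions", Journal of Southeast University (Natural Science Edition) * |
Yang Lin (杨琳) et al.: "Total variability factor analysis in speaker recognition", Network New Media Technology * |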
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107274906A (en) * | 2017-06-28 | 2017-10-20 | 百度在线网络技术(北京)有限公司 | Voice information processing method, device, terminal and storage medium |
CN107703486A (en) * | 2017-08-23 | 2018-02-16 | 南京邮电大学 | A kind of auditory localization algorithm based on convolutional neural networks CNN |
CN107703486B (en) * | 2017-08-23 | 2021-03-23 | 南京邮电大学 | Sound source positioning method based on convolutional neural network CNN |
CN108242234A (en) * | 2018-01-10 | 2018-07-03 | 腾讯科技(深圳)有限公司 | Speech recognition modeling generation method and its equipment, storage medium, electronic equipment |
CN108242234B (en) * | 2018-01-10 | 2020-08-25 | 腾讯科技(深圳)有限公司 | Speech recognition model generation method, speech recognition model generation device, storage medium, and electronic device |
CN110555370A (en) * | 2019-07-16 | 2019-12-10 | 西北工业大学 | channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN110555370B (en) * | 2019-07-16 | 2023-03-31 | 西北工业大学 | Channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN111462759A (en) * | 2020-04-01 | 2020-07-28 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
CN111462759B (en) * | 2020-04-01 | 2024-02-13 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106019230B (en) | 2019-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109712611B (en) | Joint model training method and system | |
CN109839612A (en) | Sounnd source direction estimation method based on time-frequency masking and deep neural network | |
CN106019230A (en) | Sound source positioning method based on i-vector speaker recognition | |
Li et al. | Multiple-speaker localization based on direct-path features and likelihood maximization with spatial sparsity regularization | |
CN109490822B (en) | Voice DOA estimation method based on ResNet | |
Pfeifenberger et al. | DNN-based speech mask estimation for eigenvector beamforming | |
CN108231067A (en) | Sound scenery recognition methods based on convolutional neural networks and random forest classification | |
Hu et al. | Multiple source direction of arrival estimations using relative sound pressure based MUSIC | |
CN102968990B (en) | Speaker identifying method and system | |
CN106373589B (en) | A kind of ears mixing voice separation method based on iteration structure | |
Wan et al. | Sound source localization based on discrimination of cross-correlation functions | |
CN108417224A (en) | The training and recognition methods of two way blocks model and system | |
CN108766459A (en) | Target speaker method of estimation and system in a kind of mixing of multi-person speech | |
CN105204001A (en) | Sound source positioning method and system | |
CN102664010B (en) | Robust speaker distinguishing method based on multifactor frequency displacement invariant feature | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN107219512A (en) | A kind of sound localization method based on acoustic transfer function | |
Janský et al. | Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors | |
Pfeifenberger et al. | Eigenvector-based speech mask estimation for multi-channel speech enhancement | |
CN113472390B (en) | Frequency hopping signal parameter estimation method based on deep learning | |
Tong et al. | Classification and recognition of underwater target based on MFCC feature extraction | |
Lv et al. | A permutation algorithm based on dynamic time warping in speech frequency-domain blind source separation | |
Zhao et al. | Sound source localization based on srp-phat spatial spectrum and deep neural network | |
Vargas et al. | On improved training of CNN for acoustic source localisation | |
CN111243600A (en) | Voice spoofing attack detection method based on sound field and field pattern |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||