CN101640043A - Speaker recognition method based on multi-coordinate sequence kernel and system thereof - Google Patents

Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Info

Publication number
CN101640043A
CN101640043A (application number CN200910092138A)
Authority
CN
China
Prior art keywords
sequence
coordinate
training
speaker
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910092138A
Other languages
Chinese (zh)
Inventor
何亮 (He Liang)
邓妍 (Deng Yan)
刘加 (Liu Jia)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN200910092138A priority Critical patent/CN101640043A/en
Publication of CN101640043A publication Critical patent/CN101640043A/en
Pending legal-status Critical Current

Abstract

The invention provides a speaker recognition method based on a multi-coordinate sequence kernel, comprising a training stage and a recognition stage. In the training stage, the method preprocesses the training speech; extracts a feature vector sequence from the preprocessed training speech; selects origins of a multi-coordinate system in the feature vector space and maps the feature vector sequence in each coordinate system; selects an algorithm according to the coordinate systems and splices the vector sequences of the coordinate systems into a supervector; and determines the supervector space and the kernel function of a support vector machine (SVM), training with the SVM algorithm to obtain a trained speaker model. In the recognition stage, the trained model tests the supervector and outputs a decision score. By effectively modeling the speech-signal feature sequence, the speaker recognition method utilizes the information contained in high-dimensional statistics, reduces the computational complexity on an integrated circuit, and improves speaker recognition accuracy and speed.

Description

Speaker recognition method and system based on multi-coordinate sequence kernel
Technical field
The present invention relates to speech recognition and pattern recognition technology, and in particular to a speaker recognition method and system based on a support vector machine model with a multi-coordinate sequence kernel.
Background technology
Speaker recognition is the technology of using a machine to determine the identity of the speaker of a given speech signal. According to the recognition task, speaker recognition divides into two kinds: speaker verification and speaker identification. Speaker verification judges whether a given utterance comes from a given speaker; speaker identification uses a given utterance to find the corresponding speaker in a test library. Speaker recognition technology is mainly used in systems such as security and personalized services.
As shown in Figure 1, a prior-art spectrum-based speaker recognition system follows a basic flow comprising the following steps:
Step S101: convert the speech into features that are easy to discriminate. Commonly used features include Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction (PLP), and their derived features.
Step S102: select a suitable modeling technique to discriminate the features. Common modeling techniques include the Gaussian mixture model (GMM) and the support vector machine (SVM).
Step S103: process the model output to obtain the decision result.
The GMM has simple model parameters and a clear physical meaning, and performs well when training and recognition data are sufficient. In practical applications, however, the speaker's speech is often short, which limits the performance of the GMM. The SVM seeks the optimal classification hyperplane in a high-dimensional space under the guidance of structural risk minimization, and has good recognition capability on small-sample training data. Recently, SVM theory has matured and its applications have made significant progress.
The GMM-SVM recognition system, which combines GMM and SVM, can integrate the advantages of both modeling techniques: for example, a speaker-adapted GMM maps the feature sequence into a vector space, and the SVM then classifies the resulting GMM models. However, the GMM-SVM system leaves two problems unsolved: 1) it does not use the information implied by the high-order statistics of the feature sequence; 2) it does not address the "non-uniformity" among the dimensions of the SVM input space.
Summary of the invention
The purpose of the present invention is to solve at least one of the above technical deficiencies, and in particular the deficiencies of existing speaker recognition technology.
To overcome the above technical deficiencies, one aspect of the present invention proposes a speaker recognition method based on a multi-coordinate sequence kernel, comprising the following steps:
Training stage:
preprocessing the training speech;
extracting a feature vector sequence from the preprocessed training speech;
selecting multiple coordinate-system origins in the feature vector space, and mapping the feature vector sequence in each coordinate system according to the metric relation between the feature vector sequence and each coordinate origin;
selecting an algorithm according to the coordinate systems, and splicing the vector sequences of the coordinate systems into a supervector;
determining the supervector space and the kernel function of the support vector machine (SVM), and training with the SVM algorithm to obtain a trained speaker model;
Recognition stage:
preprocessing the speech to be recognized;
extracting a feature vector sequence from the preprocessed speech;
mapping the feature vector sequence in each coordinate system according to the metric relation between the feature vector sequence and the coordinate origins selected in the training stage;
selecting an algorithm according to the coordinate systems, and splicing the vector sequences of the coordinate systems into a supervector;
testing the supervector with the trained model, outputting a decision score, and recognizing the speaker according to the decision score.
In one embodiment of the present invention, training adopts a one-versus-rest training mode.
In one embodiment of the present invention, selecting the multi-coordinate origins in the feature vector space comprises:
training a Gaussian mixture model with the EM algorithm, and using the Gaussian mixture model means as the coordinate origins.
In one embodiment of the present invention, selecting the multi-coordinate origins in the feature vector space comprises:
adopting the vector quantization (VQ) algorithm, and using the VQ codebook as the origins of the coordinate systems.
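As an illustration of the VQ-codebook embodiment, a k-means-style codebook training can be sketched as follows. The patent names the VQ algorithm but does not fix a variant; the function name, the number of code words `C`, and the iteration count are illustrative choices, and the resulting code words would serve as the coordinate-system origins.

```python
import numpy as np

def train_vq_codebook(features, C=8, iters=20, seed=0):
    """Illustrative k-means-style VQ codebook training.

    features: (T, d) array of feature vectors; returns (C, d) code words
    that can serve as the multi-coordinate origins.
    """
    rng = np.random.default_rng(seed)
    # initialize code words with C distinct feature vectors
    codebook = features[rng.choice(len(features), C, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest code word
        d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # move each code word to the mean of its assigned vectors
        for c in range(C):
            if np.any(labels == c):
                codebook[c] = features[labels == c].mean(axis=0)
    return codebook

# Demo on two well-separated synthetic clusters (illustrative data).
rng = np.random.default_rng(5)
feats = np.vstack([rng.standard_normal((100, 2)) + 5.0,
                   rng.standard_normal((100, 2)) - 5.0])
cb = train_vq_codebook(feats, C=2)
```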
Another aspect of the present invention proposes a speaker recognition system based on the multi-coordinate sequence kernel, comprising a speech preprocessing module, a feature extraction module, a feature sequence mapping module, a training module and a recognition module.
The speech preprocessing module preprocesses the training speech or the speech to be recognized, performing noise reduction, music removal and removal of other speaker-independent parts, and outputs a clean speech signal to the feature extraction module.
The feature extraction module reads in the preprocessed training speech or recognition speech provided by the speech preprocessing module, extracts features, and outputs a feature sequence.
The feature sequence mapping module maps the feature sequence output by the feature extraction module into a supervector according to the selected sub-coordinate systems.
The training module uses the supervectors output by the feature sequence mapping module, selects a suitable kernel function, trains speaker models with the SVM training algorithm, and builds the speaker model library.
The recognition module outputs a decision score according to the supervector mapped from the speech to be recognized and the speaker model library, and recognizes the speaker according to the decision score.
In one embodiment of the present invention, training adopts a one-versus-rest training mode.
In one embodiment of the present invention, the feature sequence mapping module trains a Gaussian mixture model with the EM algorithm and uses the Gaussian mixture model means as the coordinate origins.
In one embodiment of the present invention, the feature sequence mapping module adopts the VQ algorithm and uses the VQ codebook as the origins of the coordinate systems.
In one embodiment of the present invention, the system further comprises a model storage module for saving the speaker model library built by the training module and providing it to the recognition module.
Through effective modeling of the speech-signal feature sequence, the present invention utilizes the information contained in high-dimensional statistics, reduces the computational complexity on an integrated circuit, and improves the accuracy and speed of speaker recognition.
Additional aspects and advantages of the present invention are given in part in the following description; they will partly become obvious from the description, or may be learned through practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the present invention will become obvious and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, wherein:
Fig. 1 is a basic flowchart of a prior-art spectrum-based speaker recognition system;
Fig. 2 is a flowchart of the speaker recognition method based on a support vector machine model with a multi-coordinate sequence kernel according to an embodiment of the invention;
Fig. 3 is a structural diagram of the speaker recognition system based on the multi-coordinate sequence kernel according to an embodiment of the invention.
Embodiment
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the drawings. It should be understood that the embodiments described with reference to the drawings are exemplary, intended only to explain the present invention, and should not be construed as limiting it.
As shown in Figure 2, which is the flowchart of the speaker recognition method based on a support vector machine model with a multi-coordinate sequence kernel according to an embodiment of the invention, the modeling method of the present invention can be implemented in a digital integrated circuit chip according to the following steps. The recognition method of the embodiment comprises two stages: a training stage and a recognition stage.
Training stage:
Step S201: preprocess the training speech data.
Preprocessing the training speech data comprises the following steps: apply zero-meaning and pre-emphasis to the training speech signal, where zero-meaning means subtracting the mean of the whole utterance from the speech, and pre-emphasis is high-pass filtering of the speech with filter transfer function $H(z) = 1 - \alpha z^{-1}$, where $0.95 \le \alpha \le 1$. Then divide the speech signal into frames, with frame length 20 ms and frame shift 10 ms.
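The preprocessing of Step S201 can be sketched in NumPy as follows. The 8 kHz sample rate and $\alpha = 0.97$ are illustrative choices (the patent only constrains $0.95 \le \alpha \le 1$); the function name is hypothetical.

```python
import numpy as np

def preprocess(signal, sample_rate=8000, alpha=0.97,
               frame_ms=20, shift_ms=10):
    """Zero-mean, pre-emphasis H(z) = 1 - alpha*z^-1, then framing."""
    x = signal - np.mean(signal)                 # zero-mean the utterance
    x = np.append(x[0], x[1:] - alpha * x[:-1])  # pre-emphasis filter
    frame_len = int(sample_rate * frame_ms / 1000)  # 20 ms frame
    shift = int(sample_rate * shift_ms / 1000)      # 10 ms frame shift
    n_frames = 1 + max(0, (len(x) - frame_len) // shift)
    frames = np.stack([x[i * shift : i * shift + frame_len]
                       for i in range(n_frames)])
    return frames

# Demo: 2 s of a synthetic tone at 8 kHz -> 160-sample frames, 80-sample shift.
frames = preprocess(np.sin(0.01 * np.arange(16000)), sample_rate=8000)
```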
Step S202: extract features from the preprocessed training speech data.
Extracting features from the preprocessed training speech data comprises the following steps:
Step S301: apply a Hamming window to the speech signal, where the Hamming window function is:

$$\omega_H(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1, \\ 0, & \text{otherwise.} \end{cases}$$
Step S302: apply the discrete Fourier transform (DFT) to the windowed data:

$$X(\omega_k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N} nk}$$

where $\omega_k$ denotes frequency, $k$ is the frequency index, and $N$ is the number of DFT points.
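Steps S301 and S302 together amount to windowing a frame and taking its DFT, which can be sketched as follows (the function name is an assumption for illustration):

```python
import numpy as np

def windowed_spectrum(frame):
    """Apply the Hamming window of Step S301, then the DFT of Step S302."""
    N = len(frame)
    n = np.arange(N)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))  # Hamming window
    return np.fft.fft(frame * w)  # X(omega_k) for k = 0..N-1

# Demo on one random 256-sample frame.
frame = np.random.default_rng(0).standard_normal(256)
X_k = windowed_spectrum(frame)
```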
Step S303: select a filter bank with $M$ filters ($m = 1, 2, \ldots, M$), where the $m$-th triangular filter is defined as:

$$H_m[k] = \begin{cases} 0, & k < f[m-1], \\ \dfrac{k - f[m-1]}{f[m] - f[m-1]}, & f[m-1] \le k \le f[m], \\ \dfrac{f[m+1] - k}{f[m+1] - f[m]}, & f[m] \le k \le f[m+1], \\ 0, & k > f[m+1], \end{cases}$$

normalized so that $\sum_{m=1}^{M} H_m[k] = 1$. The boundary points $f[m]$ of the triangular windows are determined by:

$$f[m] = \frac{N}{F_s}\, B^{-1}\!\left( B(f_l) + m\, \frac{B(f_h) - B(f_l)}{M+1} \right)$$

where $f_l$ and $f_h$ are the lowest and highest frequencies of the filter bank, $B$ is the mapping from frequency to the Mel scale, $B(f) = 1125 \ln(1 + f/700)$, and $B^{-1}$ is the inverse mapping from the Mel scale back to frequency, $B^{-1}(b) = 700\,(\exp(b/1125) - 1)$.
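A minimal sketch of the filter-bank construction of Step S303 follows. The sample rate and band edges in the demo are illustrative, and the sketch builds unnormalized triangles (the patent additionally normalizes so that the filters sum to 1 at each bin).

```python
import numpy as np

def mel_filterbank(M, N, fs, f_low, f_high):
    """Triangular mel filterbank per Step S303, using
    B(f) = 1125*ln(1 + f/700) and its inverse."""
    B = lambda f: 1125.0 * np.log(1.0 + f / 700.0)
    Binv = lambda b: 700.0 * (np.exp(b / 1125.0) - 1.0)
    # boundary bins f[0..M+1] from the boundary-point formula
    bounds = (N / fs) * Binv(B(f_low) + np.arange(M + 2) *
                             (B(f_high) - B(f_low)) / (M + 1))
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        lo, ce, hi = bounds[m - 1], bounds[m], bounds[m + 1]
        for k in range(N // 2 + 1):
            if lo <= k <= ce:
                H[m - 1, k] = (k - lo) / (ce - lo)   # rising edge
            elif ce < k <= hi:
                H[m - 1, k] = (hi - k) / (hi - ce)   # falling edge
    return H

# Demo: 24 filters over 0-4000 Hz with a 512-point DFT at 8 kHz.
H = mel_filterbank(M=24, N=512, fs=8000, f_low=0.0, f_high=4000.0)
```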
Step S304: compute the logarithmic energy of each filter output:

$$S[m] = \ln\!\left[ \sum_{k=0}^{N-1} |X(\omega_k)|^2\, H_m[k] \right], \quad 0 < m \le M.$$
Step S305: apply the discrete cosine transform and compute the MFCC coefficients:

$$c[n] = \sum_{m=1}^{M} S[m] \cos\!\left( \pi n (m - 1/2)/M \right), \quad 0 \le n \le Q,$$

keeping the first $Q+1$ dimensions and splicing them into the basic MFCC feature $c = [c_0, c_1, \ldots, c_Q]$.
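Steps S304 and S305 can be sketched together: log filter-bank energies followed by the DCT above. The small floor added inside the logarithm and the function name are assumptions for numerical safety and illustration.

```python
import numpy as np

def mfcc_from_power(power_spectrum, H, Q=12):
    """Steps S304-S305: log filterbank energies, then DCT to MFCCs.

    power_spectrum: |X[k]|^2 for k = 0..N/2; H: (M, N/2+1) filterbank.
    Returns [c_0, ..., c_Q].
    """
    M = H.shape[0]
    S = np.log(H @ power_spectrum + 1e-12)  # log energy per filter (S304)
    m = np.arange(1, M + 1)
    # c[n] = sum_{m=1..M} S[m] * cos(pi*n*(m - 1/2)/M)   (S305)
    c = np.array([np.sum(S * np.cos(np.pi * n * (m - 0.5) / M))
                  for n in range(Q + 1)])
    return c

# Demo on random nonnegative spectra and filters (illustrative data).
rng = np.random.default_rng(1)
Hfb = np.abs(rng.standard_normal((24, 129)))
power = np.abs(rng.standard_normal(129))
c = mfcc_from_power(power, Hfb, Q=12)
```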
Step S306: compute the first-order difference feature $\delta'$ and the second-order difference feature $\delta''$. For the $j$-th feature dimension, the first-order difference is:

$$\delta'_j(n) = \frac{\sum_{d=1}^{D} d\,\big(c_j(n+d) - c_j(n-d)\big)}{\sum_{d=1}^{D} d^2}, \quad j = 1, 2, \ldots, Q,$$

where $D$ is the size of the difference window, typically $D = 2$. The second-order difference feature $\delta''$ is then computed from the first-order difference feature $\delta'$:

$$\delta''_j(n) = \frac{\sum_{d=1}^{D} d\,\big(\delta'_j(n+d) - \delta'_j(n-d)\big)}{\sum_{d=1}^{D} d^2}, \quad j = 1, 2, \ldots, Q.$$

Splicing the original features, the first-order difference features, and the second-order difference features yields the speaker recognition feature vector $y(n)$:

$$y(n) = [c_1(n), c_2(n), \ldots, c_Q(n),\ \delta'_1(n), \ldots, \delta'_Q(n),\ \delta''_1(n), \ldots, \delta''_Q(n)].$$
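The delta computation of Step S306 can be sketched as follows; the edge-replication padding at the sequence boundaries is an assumption, since the patent does not say how the ends are handled.

```python
import numpy as np

def deltas(c, D=2):
    """First-order difference of Step S306 over a (T, Q) feature matrix,
    with difference window D and denominator sum_{d=1..D} d^2."""
    T = c.shape[0]
    denom = sum(d * d for d in range(1, D + 1))
    padded = np.pad(c, ((D, D), (0, 0)), mode="edge")  # replicate ends
    out = np.zeros_like(c, dtype=float)
    for d in range(1, D + 1):
        out += d * (padded[D + d : D + d + T] - padded[D - d : D - d + T])
    return out / denom

def stack_features(c, D=2):
    """Splice static, delta and delta-delta features into y(n)."""
    d1 = deltas(c, D)
    d2 = deltas(d1, D)
    return np.hstack([c, d1, d2])

# Demo: a linear ramp in time; its delta is constant.
ramp = np.tile(np.arange(5.0)[:, None], (1, 3))
y = stack_features(ramp)
d1 = deltas(ramp)
```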
Step S203: select the multi-coordinate origins and extract the speaker supervector.
Selecting the multi-coordinate origins and extracting the speaker supervector comprises the following steps:
Step S401: choose the multi-coordinate origin sequence $o = \{o_1, o_2, \ldots, o_C\}$, where $C$ is the number of coordinate systems. The origins may be chosen as the means of a GMM trained with the EM algorithm, or as the codebook obtained by the VQ algorithm.
Step S402: select a metric $f[y(n), o_c]$, $1 \le c \le C$, between the feature vector $y(n)$ and the origin $o_c$, and compute the occupancy of the feature vector $y(n)$ in each sub-coordinate system:

$$\gamma[y(n) \mid o_j] = \frac{f[y(n), o_j]}{\sum_{c=1}^{C} f[y(n), o_c]}.$$
Step S403: select the feature expansion function $g[y(n), o_c]$ of each coordinate system and, combining the occupancies computed in Step S402, map the feature vector $y(n)$ to the supervector:

$$\upsilon(n) = \big[\gamma[y(n) \mid o_1]\, g[y(n), o_1],\ \gamma[y(n) \mid o_2]\, g[y(n), o_2],\ \ldots,\ \gamma[y(n) \mid o_C]\, g[y(n), o_C]\big].$$
Step S404: average the supervector sequence $\upsilon(n)$ formed by the feature-sequence mapping over time, obtaining the supervector of the utterance: $\upsilon = \frac{1}{T} \sum_{n=1}^{T} \upsilon(n)$.
Step S405: compute the weight vector $\omega$ or the projection space $V$ of the speaker supervectors. One way to compute the weight vector $\omega$ is $\omega_i = \sum_{id} \|\upsilon_i^{id}\|^2$, where the subscript $id$ indexes the speakers, $i$ indexes the dimensions of the weight vector, and $\upsilon_i^{id}$ is the value of the $i$-th dimension of the supervector corresponding to speaker $id$.
Step S204: build the speaker model with the support vector machine algorithm.
Building the speaker model with the support vector machine algorithm comprises the following steps:
Step S501: the support vector machine training algorithm.
Let the input sample set be $(\upsilon_p, \theta_p)$, $p = 1, 2, \ldots, P$, $\theta_p \in \{+1, -1\}$. The samples with $\theta_p = +1$ are usually called positive samples, and those with $\theta_p = -1$ negative samples. The SVM algorithm seeks the optimal classification hyperplane $\omega$ that maximizes the distance between the positive and negative sample sets. The optimal hyperplane $\omega$ is obtained by solving the following optimization problem:
$$\min\ L = \frac{1}{2}\|\omega\|^2 + C \left( \sum_{p=1}^{P} \xi_p \right),$$

where $\|\omega\|^2$ is inversely proportional to the distance between the positive and negative samples, $\xi_p$ are the slack variables introduced when the samples are not linearly separable, and $C$ controls the penalty on misclassified samples. Solving the problem in the dual space, the optimization objective becomes:
$$\max\ \sum_{p=1}^{P} \alpha_p - \frac{1}{2} \sum_{p=1}^{P} \sum_{q=1}^{P} \alpha_p \alpha_q \theta_p \theta_q K(\upsilon_p, \upsilon_q),$$

subject to

$$\sum_{p=1}^{P} \theta_p \alpha_p = 0, \quad \alpha_p \ge 0,\ p = 1, 2, \ldots, P,$$

where $K(\upsilon_p, \upsilon_q)$ is the kernel function of $\upsilon_p$ and $\upsilon_q$. Given the optimal solution $\alpha^*$, the optimal classification hyperplane is a linear combination of the training samples:
$$\omega^* = \sum_{p=1}^{P} \alpha_p^* \theta_p \upsilon_p,$$

and the optimal classification function is $f(\upsilon) = \sum_{p=1}^{P} \alpha_p^* \theta_p K(\upsilon_p, \upsilon) + b^*$.
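For a linear kernel, the classifier of Step S501 can be sketched dependency-free by sub-gradient descent on the primal objective $L = \frac{1}{2}\|\omega\|^2 + C\sum_p \xi_p$ rather than the dual shown above (both yield a linear decision function $f(\upsilon) = \omega^* \cdot \upsilon + b^*$); the learning rate, epoch count, and the synthetic demo data are assumptions.

```python
import numpy as np

def train_linear_svm(X, theta, C=1.0, lr=0.001, epochs=200):
    """Sub-gradient descent on the primal SVM objective:
    L = 0.5*||w||^2 + C * sum_p max(0, 1 - theta_p*(w.x_p + b))."""
    P, dim = X.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        margins = theta * (X @ w + b)
        viol = margins < 1                      # samples with slack xi_p > 0
        grad_w = w - C * (theta[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * theta[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_score(w, b, v):
    """Decision function f(v) = w.v + b."""
    return v @ w + b

# Demo: target-speaker supervectors (+1) vs. background (-1), synthetic.
rng = np.random.default_rng(3)
pos = rng.standard_normal((20, 8)) + 2.0
neg = rng.standard_normal((60, 8)) - 2.0
X = np.vstack([pos, neg])
theta = np.hstack([np.ones(20), -np.ones(60)])
w_star, b_star = train_linear_svm(X, theta)
```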
Step S502: modify the kernel function of Step S501.
In Step S501, $K(\upsilon_p, \upsilon_q)$ represents the metric between the vectors $\upsilon_p$ and $\upsilon_q$; the present invention modifies it to $K(\upsilon_p, \upsilon_q, \zeta)$, where $\zeta$ adjusts the metric between $\upsilon_p$ and $\upsilon_q$. If the weight vector $\omega$ is used, then $\zeta$ denotes $\omega$ and $K(\upsilon_p, \upsilon_q, \zeta) = K(\upsilon_p \cdot \omega, \upsilon_q \cdot \omega)$, where $\upsilon_p \cdot \omega$ denotes the element-wise product of $\upsilon_p$ and $\omega$. There are several ways to choose the weight vector $\omega$; one uses the training samples: $\omega_i = \sum_{id} (\upsilon_i^{id} \cdot \upsilon_i^{id})$, where $id$ indexes the speakers and $i$ indexes the dimensions of the supervector. If the projection subspace $V$ is used, then $\zeta$ denotes $V$ and $K(\upsilon_p, \upsilon_q, \zeta) = K(V\upsilon_p, V\upsilon_q)$, where $V\upsilon_p$ denotes the projection of $\upsilon_p$ onto the subspace $V$; the projection subspace $V$ can be estimated by subspace analysis methods.
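The two kernel modifications of Step S502 can be sketched with a linear base kernel (the base-kernel choice and function names are illustrative):

```python
import numpy as np

def weighted_kernel(vp, vq, w):
    """K(vp, vq, w) = K(vp*w, vq*w): element-wise reweighting of the
    supervector dimensions before the (here, linear) base kernel."""
    return float(np.dot(vp * w, vq * w))

def projected_kernel(vp, vq, V):
    """K(vp, vq, V) = K(V@vp, V@vq): project onto subspace V first."""
    return float(np.dot(V @ vp, V @ vq))

def weight_vector(supervectors):
    """w_i = sum over speakers of (v_i^id)^2, per Steps S405/S502."""
    return (supervectors ** 2).sum(axis=0)

# Demo on random speaker supervectors (illustrative data).
rng = np.random.default_rng(4)
sv = rng.standard_normal((5, 6))
wvec = weight_vector(sv)
a, b2 = sv[0], sv[1]
```

With the identity weight (all ones) or the identity projection, both modified kernels reduce to the unmodified linear kernel, which is a quick sanity check on the construction.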
Step S503: adopt the one-versus-rest training mode, using the SVM training algorithm of Step S501 with the kernel function modified in Step S502; the trained speaker model is $\{\omega^*, b^*\}$.
Step S205: repeat the above steps to build the speaker model library.
Recognition stage:
Step S206: recognize the speaker.
First, extract the SVM input supervector of the test speech following the computation steps above (see Steps S201-S203; the details are not repeated here). Then score the SVM input supervector of the test speech with the speaker model $\{\omega^*, b^*\}$ trained in Step S503. With the weighting method, the score is $f(\upsilon_t) = \omega^* \cdot (\upsilon_t \cdot \omega) + b^*$; with the projection method, $f(\upsilon_t) = \omega^* \cdot (V\upsilon_t) + b^*$. If the score is greater than a certain threshold, the test speech is judged to come from the same speaker as the speech used to train the speaker model; if the score is less than or equal to the threshold, the test speech is judged not to come from the same speaker.
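The verification decision of Step S206 can be sketched as follows; the function name and the default threshold of 0 are assumptions (the patent only says "a certain threshold", whose value is application-dependent).

```python
import numpy as np

def verify(w_star, b_star, v_test, threshold=0.0, weight=None, V=None):
    """Score a test supervector with the trained model {w*, b*} and
    compare to a threshold (Step S206)."""
    if weight is not None:                      # weighting method
        score = float(np.dot(w_star, v_test * weight) + b_star)
    elif V is not None:                         # projection method
        score = float(np.dot(w_star, V @ v_test) + b_star)
    else:                                       # unmodified linear score
        score = float(np.dot(w_star, v_test) + b_star)
    return score, score > threshold

# Demo with a tiny hand-made model (illustrative numbers).
w_model = np.array([1.0, -1.0])
b_model = 0.5
s1, accept1 = verify(w_model, b_model, np.array([2.0, 1.0]))
s2, accept2 = verify(w_model, b_model, np.array([0.0, 1.0]),
                     weight=np.array([1.0, 2.0]))
```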
As shown in Figure 3, which is the structural diagram of the speaker recognition system based on the multi-coordinate sequence kernel according to an embodiment of the invention, the system comprises a speech preprocessing module 100, a feature extraction module 200, a feature sequence mapping module 300, a training module 400 and a recognition module 500. The speech preprocessing module 100 preprocesses the training speech or the speech to be recognized, performing noise reduction, music removal and removal of other speaker-independent parts, and outputs a clean speech signal to the feature extraction module 200. The feature extraction module 200 reads in the preprocessed training speech or recognition speech provided by the speech preprocessing module 100, extracts features, and outputs a feature sequence. The feature sequence mapping module 300 maps the feature sequence output by the feature extraction module 200 into a supervector according to the selected sub-coordinate systems. The training module 400 uses the supervectors output by the feature sequence mapping module 300, selects a suitable kernel function, trains speaker models with the SVM training algorithm, and builds the speaker model library. The recognition module 500 outputs a decision score according to the supervector mapped from the speech to be recognized and the speaker model library, and recognizes the speaker according to the decision score.
In one embodiment of the present invention, training can adopt a one-versus-rest training mode.
In one embodiment of the present invention, the feature sequence mapping module 300 can train a Gaussian mixture model with the EM algorithm and use the Gaussian mixture model means as the coordinate origins.
In an alternative embodiment of the present invention, the feature sequence mapping module 300 can adopt the VQ algorithm and use the VQ codebook as the origins of the coordinate systems.
In an alternative embodiment of the present invention, the system further comprises a model storage module 600 for saving the speaker model library built by the training module 400 and providing it to the recognition module 500.
Through effective modeling of the speech-signal feature sequence, the present invention utilizes the information contained in high-dimensional statistics, reduces the computational complexity on an integrated circuit, and improves the accuracy and speed of speaker recognition.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims (9)

1. A speaker recognition method based on a multi-coordinate sequence kernel, characterized by comprising the following steps:
Training stage:
preprocessing the training speech;
extracting a feature vector sequence from the preprocessed training speech;
selecting multiple coordinate-system origins in the feature vector space, and mapping the feature vector sequence in each coordinate system according to the metric relation between the feature vector sequence and each coordinate origin;
selecting an algorithm according to the coordinate systems, and splicing the vector sequences of the coordinate systems into a supervector;
determining the supervector space and the kernel function of the support vector machine (SVM), and training with the SVM algorithm to obtain a trained speaker model;
Recognition stage:
preprocessing the speech to be recognized;
extracting a feature vector sequence from the preprocessed speech;
mapping the feature vector sequence in each coordinate system according to the metric relation between the feature vector sequence and the coordinate origins selected in the training stage;
selecting an algorithm according to the coordinate systems, and splicing the vector sequences of the coordinate systems into a supervector;
testing the supervector with the trained model, outputting a decision score, and recognizing the speaker according to the decision score.
2. The speaker recognition method based on a multi-coordinate sequence kernel according to claim 1, characterized in that training adopts a one-versus-rest training mode.
3. The speaker recognition method based on a multi-coordinate sequence kernel according to claim 1, characterized in that selecting the multi-coordinate origins in the feature vector space comprises:
training a Gaussian mixture model with the EM algorithm, and using the Gaussian mixture model means as the coordinate origins.
4. The speaker recognition method based on a multi-coordinate sequence kernel according to claim 1, characterized in that selecting the multi-coordinate origins in the feature vector space comprises:
adopting the VQ algorithm, and using the VQ codebook as the origins of the coordinate systems.
5. A speaker recognition system based on a multi-coordinate sequence kernel, characterized by comprising a speech preprocessing module, a feature extraction module, a feature sequence mapping module, a training module and a recognition module, wherein:
the speech preprocessing module preprocesses the training speech or the speech to be recognized, performing noise reduction, music removal and removal of other speaker-independent parts, and outputs a clean speech signal to the feature extraction module;
the feature extraction module reads in the preprocessed training speech or recognition speech provided by the speech preprocessing module, extracts features, and outputs a feature sequence;
the feature sequence mapping module maps the feature sequence output by the feature extraction module into a supervector according to the selected sub-coordinate systems;
the training module uses the supervectors output by the feature sequence mapping module, selects a suitable kernel function, trains speaker models with the SVM training algorithm, and builds the speaker model library;
the recognition module outputs a decision score according to the supervector mapped from the speech to be recognized and the speaker model library, and recognizes the speaker according to the decision score.
6. The speaker recognition system based on a multi-coordinate sequence kernel according to claim 5, characterized in that training adopts a one-versus-rest training mode.
7. The speaker recognition system based on a multi-coordinate sequence kernel according to claim 5, characterized in that the feature sequence mapping module trains a Gaussian mixture model with the EM algorithm and uses the Gaussian mixture model means as the coordinate origins.
8. The speaker recognition system based on a multi-coordinate sequence kernel according to claim 5, characterized in that the feature sequence mapping module adopts the VQ algorithm and uses the VQ codebook as the origins of the coordinate systems.
9. The speaker recognition system based on a multi-coordinate sequence kernel according to claim 5, characterized in that the system further comprises a model storage module for saving the speaker model library built by the training module and providing it to the recognition module.
CN200910092138A 2009-09-01 2009-09-01 Speaker recognition method based on multi-coordinate sequence kernel and system thereof Pending CN101640043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910092138A CN101640043A (en) 2009-09-01 2009-09-01 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910092138A CN101640043A (en) 2009-09-01 2009-09-01 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Publications (1)

Publication Number Publication Date
CN101640043A true CN101640043A (en) 2010-02-03

Family

ID=41614993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910092138A Pending CN101640043A (en) 2009-09-01 2009-09-01 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Country Status (1)

Country Link
CN (1) CN101640043A (en)


Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186527A (en) * 2011-12-27 2013-07-03 北京百度网讯科技有限公司 System for building music classification model, system for recommending music and corresponding method
CN102543075A (en) * 2012-01-12 2012-07-04 东北石油大学 Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology
CN103714818A (en) * 2013-12-12 2014-04-09 清华大学 Speaker recognition method based on noise shielding nucleus
CN103714818B * 2013-12-12 2016-06-22 清华大学 Speaker recognition method based on noise masking kernel
CN107580722A (en) * 2015-05-27 2018-01-12 英特尔公司 Gauss hybrid models accelerator with the direct memory access (DMA) engine corresponding to each data flow
CN107580722B (en) * 2015-05-27 2022-01-14 英特尔公司 Gaussian mixture model accelerator with direct memory access engines corresponding to respective data streams
CN106373576A (en) * 2016-09-07 2017-02-01 Tcl集团股份有限公司 Speaker confirmation method based on VQ and SVM algorithms, and system thereof
CN106373576B (en) * 2016-09-07 2020-07-21 Tcl科技集团股份有限公司 Speaker confirmation method and system based on VQ and SVM algorithms
CN106448681A (en) * 2016-09-12 2017-02-22 南京邮电大学 Super-vector speaker recognition method
CN106888392A (en) * 2017-02-14 2017-06-23 广东九联科技股份有限公司 Set-top box automatic translation system and method
CN106910394B (en) * 2017-02-21 2019-12-10 成都景中教育软件有限公司 Dynamic geometric multi-coordinate system implementation method
CN106910394A (en) * 2017-02-21 2017-06-30 成都景中教育软件有限公司 Dynamic geometric multi-coordinate system implementation method
US10276167B2 (en) 2017-06-13 2019-04-30 Beijing Didi Infinity Technology And Development Co., Ltd. Method, apparatus and system for speaker verification
TWI719304B (en) * 2017-06-13 2021-02-21 大陸商北京嘀嘀無限科技發展有限公司 Method, apparatus and system for speaker verification
US10937430B2 (en) 2017-06-13 2021-03-02 Beijing Didi Infinity Technology And Development Co., Ltd. Method, apparatus and system for speaker verification
WO2018227381A1 (en) * 2017-06-13 2018-12-20 Beijing Didi Infinity Technology And Development Co., Ltd. Method, apparatus and system for speaker verification
CN107507611A (en) * 2017-08-31 2017-12-22 苏州大学 Voice classification recognition method and device
CN107507611B (en) * 2017-08-31 2021-08-24 苏州大学 Voice classification recognition method and device
CN109493847B (en) * 2018-12-14 2019-10-18 广州一玛网络科技有限公司 Sound recognition system and voice recognition device
CN109493847A (en) * 2018-12-14 2019-03-19 广州玛网络科技有限公司 Sound recognition system and voice recognition device
CN113779191A (en) * 2021-07-23 2021-12-10 中国人民解放军61623部队 User identification method based on user joint information super vector and joint information model
CN113779191B (en) * 2021-07-23 2024-03-05 中国人民解放军61623部队 User identification method based on user joint information supervector and joint information model

Similar Documents

Publication Publication Date Title
CN101640043A (en) Speaker recognition method based on multi-coordinate sequence kernel and system thereof
CN109817246B (en) Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium
CN101833951B (en) Multi-background modeling method for speaker recognition
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN102737633B (en) Method and device for recognizing speaker based on tensor subspace analysis
CN110457432B (en) Interview scoring method, device, equipment and storage medium
CN107393554B (en) Feature extraction method fusing inter-class standard deviation for acoustic scene classification
CN101894548B (en) Modeling method and modeling device for language identification
CN107610707A (en) Voiceprint recognition method and device
US20160111112A1 (en) Speaker change detection device and speaker change detection method
CN105261367B (en) Speaker recognition method
CN111081279A (en) Voice emotion fluctuation analysis method and device
CN107731233A (en) RNN-based voiceprint recognition method
CN103794207A (en) Dual-mode voice identity recognition method
CN109767776B (en) Deception voice detection method based on dense neural network
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN110164453A (en) Multi-model fusion voiceprint recognition method, terminal, server and storage medium
CN105469784A (en) Generation method for probabilistic linear discriminant analysis (PLDA) model and speaker clustering method and system
CN102789779A (en) Speech recognition system and recognition method thereof
CN103489445A (en) Method and device for recognizing human voices in audio
CN108269575A (en) Speech recognition method for updating a voiceprint database, terminal device and storage medium
CN108962231A (en) Speech classification method, device, server and storage medium
CN110570870A (en) Text-independent voiceprint recognition method, device and equipment
CN107274890A (en) Voiceprint spectrum extraction method and device
CN104464738B (en) Voiceprint recognition method for intelligent mobile devices

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20100203