CN103985384B - Text-independent speaker identification device based on random projection histogram model - Google Patents


Info

Publication number: CN103985384B (application CN201410232526.2A)
Authority: CN (China)
Prior art keywords: feature, model, speaker, histogram, training
Legal status: Active (granted)
Inventors: 于泓 (Hong Yu), 马占宇 (Zhanyu Ma), 郭军 (Jun Guo)
Assignee (original and current): Beijing University of Posts and Telecommunications
Other versions: CN103985384A (Chinese-language application publication)

Abstract

The embodiment of the invention discloses a text-independent speaker identification device based on a random projection histogram model. The device operates in three steps: feature extraction, model training, and identification. In the feature extraction step, non-normalized, monotonically increasing line spectral frequency features are converted into normalized differential line spectral frequency features, and the differential features of consecutive frames are combined into composite differential line spectral frequency features that express the dynamic behavior of the signal. In the model training step, random projection parameters are designed according to the distribution of the composite differential line spectral frequency features, the training data set is randomly projected, and a probability model is established by computing an average histogram. In the identification step, features are extracted from the speech signal of the person to be identified as in the first step, the extracted features are input into the models obtained in the second step, the likelihood value under each probability model is computed, and the speaker number is identified from the maximum likelihood value. The method increases the text-independent speaker identification rate and has great practical value.

Description

Text-independent speaker identification device based on a random projection histogram model
Technical field
The invention belongs to the field of audio signal processing and describes a text-independent speaker identification device based on a random projection histogram model.
Background technology
Speaker identification is the technology by which a computer uses the speaker-specific information contained in a speech segment to determine the speaker's identity. The technology has significant research and application value in information security, remote identity authentication, and related fields.

Depending on the identification material, speaker identification can be divided into two classes: text-dependent and text-independent. Text-dependent speaker identification requires keywords or key sentences pronounced by the speaker as training samples, and the same content must be pronounced at recognition time; such systems are inconvenient to use, and the key content is easily stolen and replayed. Text-independent speaker identification does not prescribe the spoken content in either training or recognition: the object to be identified is an unconstrained speech signal, so features and methods must be found that characterize the speaker in free speech. Building the speaker model is therefore relatively difficult, but the technology is convenient and safe to use. The present invention describes a text-independent identification device.
Speaker identification usually comprises three components: (1) extracting features that express speaker characteristics from the training speech data; (2) training, for each speaker, a model that reflects the distribution of that speaker's speech features; (3) computing the degree of fit between the features of the input speech and the trained models, and making the final decision.
In the feature extraction part, conventional speaker identification systems adopt MFCC (Mel-Frequency Cepstral Coefficients) or LSF (Line Spectral Frequencies) as the basic features; in the model training part, they adopt a GMM (Gaussian Mixture Model) or a statistical histogram as the probability model.
Traditional features are easily corrupted by noise and cannot express dynamic information, and the GMM is only suitable for modeling features whose distribution range is wide. Although the statistical histogram model can model feature signals of arbitrary distribution, when training samples are scarce or the feature dimension is too high, the resulting model contains a large number of empty (zero) bins, which makes the estimate discontinuous. The text-independent speaker identification method described in this invention largely solves these problems.
Summary of the invention
To remedy the defects of the above techniques and improve the text-independent speaker identification rate, the invention provides a text-independent speaker identification method based on composite differential line spectral frequency features and a random projection histogram model, comprising the following steps:
One. Feature extraction step:

A. Differential line spectral frequency feature extraction: the K-dimensional, monotonically increasing, non-normalized line spectral frequency features obtained from the speech linear predictive coding model are transformed into (K+1)-dimensional normalized differential line spectral frequency features.

B. Generation of composite differential line spectral frequency features: the differential line spectral frequency features of 3 adjacent frames are combined to generate composite differential line spectral frequency features that express the dynamic characteristics of the signal.

Two. Random projection histogram model training step: for the training speech of each speaker, extract T frames of composite differential line spectral frequency features as one training data set according to step one. Apply H random transforms to this training data set by the method of random projection to obtain H groups of training features. Perform histogram statistics on each group of features, and use the average histogram of the H groups of training features as the probability model of this speaker. Finally, each speaker obtains his or her own trained model.

Three. Discrimination/matching step: after a segment of speech is input, generate one group of features by the method of step one, input these features into the model of each speaker trained in step two, compute the likelihood value of this group of features for each model, and take the maximum likelihood value to confirm the speaker number.
According to an embodiment of the present invention, the normalized differential line spectral frequency feature extraction described in step A is performed as follows:

$$\Delta x = \left[ \frac{x_1}{\pi},\ \frac{x_2 - x_1}{\pi},\ \ldots,\ \frac{x_K - x_{K-1}}{\pi},\ \frac{\pi - x_K}{\pi} \right]^T$$

where [x_1, x_2, …, x_K]^T is the K-dimensional line spectral frequency feature before transformation, with 0 < x_1 < x_2 < … < x_K < π, and Δx is the (K+1)-dimensional normalized differential line spectral frequency feature after transformation.
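Since the line spectral frequencies are strictly increasing with 0 < x_1 < … < x_K < π, every component of Δx is positive, and the unit 1-norm property stated in claim 2 follows directly:

$$\|\Delta x\|_1 = \sum_{k=1}^{K+1} \Delta x_k = \frac{x_1}{\pi} + \sum_{k=2}^{K} \frac{x_k - x_{k-1}}{\pi} + \frac{\pi - x_K}{\pi} = \frac{\pi}{\pi} = 1.$$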
According to an embodiment of the present invention, the composite differential line spectral frequency feature described in step B is generated as follows:

Suppose the differential line spectral frequency feature of frame t is Δx(t); then the composite differential line spectral frequency feature of frame t is

$$\mathrm{Sup}\Delta x(t) = \left[ \Delta x(t-\tau)^T,\ \Delta x(t)^T,\ \Delta x(t+\tau)^T \right]^T$$

where τ is a positive integer; the present invention takes τ = 1.
According to an embodiment of the present invention, the model training method described in step two is as follows:

1) A random projection transform is applied to the composite differential line spectral frequency features of dimension D = K + 1. The transform formula is y = Ax + b, where A is a D × D random rotation-scaling matrix and b is a D × 1 random translation vector.

2) Each element of the random translation vector b = [b_1, b_2, …, b_i, …, b_{K+1}]^T is a random variable uniformly distributed between 0 and 1.

3) The rotation-scaling matrix A is the product of a random rotation unitary matrix U and a random scaling diagonal matrix Λ:

$$A = \Lambda U, \qquad |U| = 1$$
4) The random rotation unitary matrix U is designed as follows:

① Generate a D × D random matrix V whose elements are uniformly distributed between 0 and 1.
② Perform the QR decomposition V = QR, where Q is a unitary orthogonal matrix.
③ Check whether the determinant of Q equals 1; if it does not, correct the element q_11 so that the determinant of Q is 1.
5) The random scaling diagonal matrix Λ is designed as follows:

The j-th dimension of the composite differential line spectral frequency feature follows a Beta distribution with probability density function

$$\mathrm{Beta}(x_j; \alpha_j, \beta_j) = \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, x_j^{\alpha_j - 1} (1 - x_j)^{\beta_j - 1}$$

Let

$$R(x_j; \alpha_j, \beta_j) = \int_0^1 \mathrm{Beta}^2(x_j; \alpha_j, \beta_j)\, dx_j$$

$$h_j = R(x_j; \alpha_j, \beta_j)^{-\frac{1}{2}} \left( 6 \prod_{i=1}^{D} R(x_i; \alpha_i, \beta_i)^{\frac{1}{2}} \right)^{\frac{1}{2+D}} N^{-\frac{1}{2+D}}$$

where D is the dimension of the composite differential line spectral frequency feature and N is the number of training features.

Then the diagonal elements of Λ are drawn according to

$$\log(\lambda_j) = \mathrm{Uniform}\!\left[ \theta_{\min} + \log(h_j^{-1}),\; \theta_{\max} + \log(h_j^{-1}) \right]$$

where θ_min = 0 and θ_max = 2 are relaxation parameters.
6) After random projection, the probability model for the training data is built as follows:

$$HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)$$

The first term estimates the probability at the empty (zero) bins of the histogram, where π_ZeroDens is the probability that an empty bin occurs in the statistical histogram and p(x | ZeroDens) is the prior probability at the empty positions; the prior used here is a compound Dirichlet distribution. The input feature vector is

$$x = \mathrm{Sup}\Delta x(t) = \left[ \Delta x(t-\tau)^T,\ \Delta x(t)^T,\ \Delta x(t+\tau)^T \right]^T = \left[ \Delta x_1, \Delta x_2, \Delta x_3 \right]^T$$

$$p(x \mid \mathrm{ZeroDens}) = \prod_{n=1}^{3} \frac{\Gamma\!\left( \sum_{k=1}^{K+1} \alpha_{n,k} \right)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{n,k})} \prod_{k=1}^{K+1} \left( \Delta x_{n,k} \right)^{\alpha_{n,k} - 1}$$

The second term is the average statistical histogram probability estimate, where H is the number of random projections performed; one training data set containing N training features is transformed into H projected training data sets by the H random projections. Here p(x | A_i, b_i) is the histogram probability estimate of the input test datum x under the i-th transform, defined as

$$p(x \mid A_i, b_i) = \frac{1}{Nv} \sum_{j=1}^{N} \mathbb{I}\left( \mathrm{round}(y_j), \mathrm{round}(y) \right)$$

$$y = A_i x + b_i, \qquad v = |A_i|^{-1}$$

where 𝕀(·, ·) equals 1 when its two arguments are equal, i.e., when y and the projected training feature y_j fall into the same unit bin, and 0 otherwise.
According to an embodiment of the present invention, the discrimination/matching process described in step three is implemented as follows: the input feature data set x̃ = {x_1, …, x_N} is fed into the probability model trained for each speaker, and the likelihood value is computed as

$$L_j(\tilde{x}) = \sum_{i=1}^{N} \log\left( HD_j(x_i) \right)$$

where L_j(x̃) is the likelihood value of the test feature set x̃ with respect to the j-th speaker model HD_j. The speaker number is confirmed by taking the maximum likelihood value.
The beneficial effect of the present invention is that, compared with the prior art, it extracts composite differential line spectral frequency features as the speaker features, trains the probability model with a random projection histogram, and provides a complete implementation system for application. Experiments verify the efficiency of the invention, which has strong practical value.
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is the flow chart of the present invention, in which solid lines indicate the training flow and dashed lines indicate the identification flow. It comprises the following steps:

The first step: feature extraction — extract composite differential line spectral frequency features from the training speaker's speech sequence.
Step S1: convert the line spectral frequency features into differential line spectral frequency features;
Step S2: combine the differential line spectral frequency features obtained in S1 into composite differential line spectral frequency features.
The second step: train the probability model.
Step S3: build a random projection histogram model to fit the distribution of the composite differential line spectral frequency features; implementation details are shown in Fig. 2.
The third step: identification.
Repeat steps S1 and S2 on the speech sequence of the speaker to be identified to generate a composite differential line spectral frequency test feature set, and input it into the models trained in step S3.
Step S4: compute the likelihood value for each probability model, take the maximum likelihood value, and confirm the speaker number.

Each step is described in detail below.
Step S1 implements differential line spectral frequency feature extraction: the K-dimensional, monotonically increasing, non-normalized line spectral frequency features obtained from the speech linear predictive coding model are transformed into (K+1)-dimensional normalized differential line spectral frequency features, as follows:

$$\Delta x = \left[ \frac{x_1}{\pi},\ \frac{x_2 - x_1}{\pi},\ \ldots,\ \frac{x_K - x_{K-1}}{\pi},\ \frac{\pi - x_K}{\pi} \right]^T$$

where [x_1, x_2, …, x_K]^T is the K-dimensional line spectral frequency feature before transformation and Δx is the (K+1)-dimensional normalized differential line spectral frequency feature after transformation.
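A minimal NumPy sketch of this transform (an illustration, not the patent's reference implementation; the name `lsf_to_dlsf` is chosen here):

```python
import numpy as np

def lsf_to_dlsf(lsf):
    """Convert a K-dim increasing LSF vector (values in (0, pi)) into a
    (K+1)-dim normalized differential LSF vector whose 1-norm is 1."""
    lsf = np.asarray(lsf, dtype=float)
    # Pad with the interval endpoints 0 and pi, then take adjacent differences.
    padded = np.concatenate(([0.0], lsf, [np.pi]))
    dlsf = np.diff(padded) / np.pi   # K+1 positive components
    return dlsf                      # components sum to 1 by construction

# Example: a toy 4-dimensional LSF frame
frame = np.array([0.3, 0.9, 1.8, 2.6])
print(lsf_to_dlsf(frame), lsf_to_dlsf(frame).sum())  # sum -> 1.0
```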
Step S2 combines the differential line spectral frequency features of 3 adjacent frames into a composite differential line spectral frequency feature that expresses the dynamic characteristics of the signal. Suppose the differential line spectral frequency feature of frame t is Δx(t); then the composite differential line spectral frequency feature of frame t is

$$\mathrm{Sup}\Delta x(t) = \left[ \Delta x(t-\tau)^T,\ \Delta x(t)^T,\ \Delta x(t+\tau)^T \right]^T$$

where τ is a positive integer; the present invention takes τ = 1.
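A sketch of the frame-stacking step under the same assumptions (τ = 1 as in the patent); boundary frames without a τ-neighbor are simply dropped here, a detail the patent leaves open:

```python
import numpy as np

def make_composite(dlsf_frames, tau=1):
    """Stack each frame with its tau-th left and right neighbors:
    Sup_dx(t) = [dx(t-tau); dx(t); dx(t+tau)].  dlsf_frames: (T, K+1) array."""
    X = np.asarray(dlsf_frames)
    T = X.shape[0]
    rows = [np.concatenate([X[t - tau], X[t], X[t + tau]])
            for t in range(tau, T - tau)]   # boundary frames dropped
    return np.array(rows)                   # shape (T - 2*tau, 3*(K+1))

# Example with 5 random "frames" of a 5-dimensional DLSF feature
frames = np.random.dirichlet(np.ones(5), size=5)
print(make_composite(frames).shape)         # (3, 15)
```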
Step S3 builds the random projection histogram model that fits the distribution of the composite differential line spectral frequency features. The concrete flow, shown in Fig. 2, is as follows:

1) Obtain the prior probability at the empty (zero) bins of the histogram from the overall distribution of the composite differential line spectral frequency features.

Let the input composite differential line spectral frequency feature vector be

$$x = \mathrm{Sup}\Delta x(t) = \left[ \Delta x(t-\tau)^T,\ \Delta x(t)^T,\ \Delta x(t+\tau)^T \right]^T = \left[ \Delta x_1, \Delta x_2, \Delta x_3 \right]^T$$

Then the overall distribution of the composite differential line spectral frequency feature is

$$p(x \mid \mathrm{ZeroDens}) = \prod_{n=1}^{3} \frac{\Gamma\!\left( \sum_{k=1}^{K+1} \alpha_{n,k} \right)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{n,k})} \prod_{k=1}^{K+1} \left( \Delta x_{n,k} \right)^{\alpha_{n,k} - 1}$$

The prior probability of an empty bin occurring in the histogram is

$$\pi_{\mathrm{ZeroDens}} = \frac{1}{N+1}$$

and the prior distribution at the empty positions of the histogram is

$$\pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens})$$
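A sketch of this zero-bin prior; the Dirichlet parameters α_{n,k} are assumed to have been fitted beforehand (the patent does not spell out the estimator), and the names `log_dirichlet_prior` and `pi_zero` are chosen here:

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_prior(x, alphas):
    """log p(x | ZeroDens): product of three Dirichlet densities, one per
    stacked DLSF sub-vector.  x: 3*(K+1) vector; alphas: (3, K+1) array."""
    x = np.asarray(x).reshape(3, -1)
    logp = 0.0
    for n in range(3):
        a = alphas[n]
        logp += (gammaln(a.sum()) - gammaln(a).sum()
                 + np.sum((a - 1.0) * np.log(x[n])))
    return logp

def pi_zero(N):
    """Prior probability of an empty histogram bin: pi_ZeroDens = 1/(N+1)."""
    return 1.0 / (N + 1)

# Example: uniform Dirichlet parameters over a 3 x 5 composite feature
alphas = np.ones((3, 5)) * 2.0
x = np.random.dirichlet(np.ones(5), size=3).ravel()
print(log_dirichlet_prior(x, alphas), pi_zero(1000))
```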
2) Apply random projections to the input composite differential line spectral frequency feature vectors and compute the average histogram.

The composite differential line spectral frequency features of dimension D = K + 1 are transformed by the random projection y = Ax + b, where A is a D × D random rotation-scaling matrix and b is a D × 1 random translation vector.

Each element of the random translation vector b = [b_1, b_2, …, b_i, …, b_{K+1}]^T is a random variable uniformly distributed between 0 and 1.

The random rotation-scaling matrix A can be decomposed into the product of a random rotation unitary matrix U and a random scaling diagonal matrix Λ:

$$A = \Lambda U, \qquad |U| = 1$$

The random rotation unitary matrix U is designed as follows:

① Generate a D × D random matrix V whose elements are uniformly distributed between 0 and 1.
② Perform the QR decomposition V = QR, where Q is a unitary orthogonal matrix.
③ Check whether the determinant of Q equals 1; if it does not, correct the element q_11 so that the determinant of Q is 1.
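A sketch of this construction. The determinant correction here flips the sign of Q's first column — which changes q_11 and turns det(Q) = −1 into +1 while keeping Q orthonormal — one concrete reading of the correction the patent describes:

```python
import numpy as np

def random_rotation(D, rng=np.random.default_rng()):
    """Random rotation matrix U with |U| = 1, built by QR-decomposing a
    matrix of Uniform(0,1) entries and correcting the sign of det(Q)."""
    V = rng.uniform(0.0, 1.0, size=(D, D))   # step 1: random matrix
    Q, R = np.linalg.qr(V)                   # step 2: QR decomposition
    if np.linalg.det(Q) < 0:                 # step 3: ensure det = +1
        Q[:, 0] = -Q[:, 0]                   # flip first column (affects q_11)
    return Q

U = random_rotation(6)
print(np.linalg.det(U))                      # ~1.0
print(np.allclose(U @ U.T, np.eye(6)))       # True: orthonormal
```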
The random scaling diagonal matrix Λ is designed as follows:

① Determine the distribution of each element of the composite differential line spectral frequency feature vector. The j-th dimension follows a Beta distribution with probability density function

$$\mathrm{Beta}(x_j; \alpha_j, \beta_j) = \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, x_j^{\alpha_j - 1} (1 - x_j)^{\beta_j - 1}$$

② Compute the optimal histogram bin width h_j in each dimension:

$$R(x_j; \alpha_j, \beta_j) = \int_0^1 \mathrm{Beta}^2(x_j; \alpha_j, \beta_j)\, dx_j$$

$$h_j = R(x_j; \alpha_j, \beta_j)^{-\frac{1}{2}} \left( 6 \prod_{i=1}^{D} R(x_i; \alpha_i, \beta_i)^{\frac{1}{2}} \right)^{\frac{1}{2+D}} N^{-\frac{1}{2+D}}$$

where D is the dimension of the composite differential line spectral frequency feature and N is the number of training features.

③ Generate the diagonal elements λ_j of Λ from the optimal bin widths h_j:

$$\log(\lambda_j) = \mathrm{Uniform}\!\left[ \theta_{\min} + \log(h_j^{-1}),\; \theta_{\max} + \log(h_j^{-1}) \right]$$

where θ_min = 0 and θ_max = 2 are relaxation parameters.
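A sketch of the bin-width computation and the sampling of Λ. The Beta parameters are fitted here by moment matching and R is evaluated by numerical quadrature — both assumptions, since the patent names neither procedure:

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.integrate import quad

def fit_beta_moments(col):
    """Moment-matching Beta fit for one feature dimension with values in (0, 1)."""
    m, v = col.mean(), col.var()
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common    # alpha_j, beta_j

def roughness(a, b):
    """R = integral of Beta(x; a, b)^2 over (0, 1)."""
    return quad(lambda t: beta_dist.pdf(t, a, b) ** 2, 0.0, 1.0)[0]

def random_scaling(X, theta_min=0.0, theta_max=2.0,
                   rng=np.random.default_rng()):
    """Diagonal Lambda with log(lambda_j) ~ Uniform[theta_min + log(1/h_j),
    theta_max + log(1/h_j)], h_j the optimal bin width of dimension j."""
    N, D = X.shape
    R = np.array([roughness(*fit_beta_moments(X[:, j])) for j in range(D)])
    h = R ** -0.5 * (6.0 * np.prod(R ** 0.5)) ** (1.0 / (2 + D)) \
        * N ** (-1.0 / (2 + D))
    log_lam = rng.uniform(theta_min + np.log(1.0 / h),
                          theta_max + np.log(1.0 / h))
    return np.diag(np.exp(log_lam))

X = np.random.dirichlet(np.ones(5), size=500)  # toy training features
print(np.diag(random_scaling(X)))
```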
After the random transform parameters A and b have been obtained by the above procedure, H random transforms are applied to the training feature data set, so that one training data set containing N training samples is turned into H projected training data sets. The average histogram of the H projected training data sets is

$$\frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)$$

where p(x | A_i, b_i) is the histogram probability estimate of the input test datum x under the i-th transform, defined as

$$p(x \mid A_i, b_i) = \frac{1}{Nv} \sum_{j=1}^{N} \mathbb{I}\left( \mathrm{round}(y_j), \mathrm{round}(y) \right)$$

$$y = A_i x + b_i, \qquad v = |A_i|^{-1}$$

with 𝕀(·, ·) the indicator that its two arguments are equal, as defined above. Therefore the final random projection histogram probability estimate model is

$$HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)$$
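Putting the pieces together, a compact sketch of one speaker's model under the assumptions above: unit-width bins after projection (indexed by rounding), projections (A_i, b_i) built as sketched earlier, and a caller-supplied zero-bin log-prior. The class and method names are chosen here, not taken from the patent:

```python
import numpy as np
from collections import Counter

class RPHistogramModel:
    """Random projection histogram model HD(x) for one speaker; the
    projections (A_i, b_i) and the zero-bin prior are supplied externally
    (see the earlier sketches for one way to construct them)."""

    def __init__(self, X, transforms, log_prior, pi0):
        # X: (N, D) training features; transforms: list of (A_i, b_i) pairs
        self.transforms = transforms
        self.log_prior = log_prior            # callable: x -> log p(x|ZeroDens)
        self.pi0 = pi0                        # pi_ZeroDens = 1/(N+1)
        self.N = X.shape[0]
        self.counts = []                      # one bin-count table per transform
        for A, b in transforms:
            Y = np.round(X @ A.T + b)         # project and bin (unit-width bins)
            self.counts.append(Counter(map(tuple, Y)))

    def density(self, x):
        """HD(x) = pi0 * p(x|ZeroDens) + (1-pi0)/H * sum_i p(x|A_i, b_i)."""
        H = len(self.transforms)
        hist = 0.0
        for (A, b), cnt in zip(self.transforms, self.counts):
            y = tuple(np.round(A @ x + b))    # bin index of the test datum
            v = 1.0 / abs(np.linalg.det(A))   # bin volume in feature space
            hist += cnt.get(y, 0) / (self.N * v)
        return self.pi0 * np.exp(self.log_prior(x)) + (1 - self.pi0) * hist / H
```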
The discrimination/matching process of step S4 is implemented as follows:

The input feature data set x̃ = {x_1, …, x_N} is fed into the probability model trained for each speaker, and the likelihood value is computed as

$$L_j(\tilde{x}) = \sum_{i=1}^{N} \log\left( HD_j(x_i) \right)$$

where L_j(x̃) is the likelihood value of the test feature set x̃ with respect to the j-th speaker model. The speaker number is confirmed by taking the maximum likelihood value.
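The decision rule then reduces to an argmax over the per-speaker log-likelihood sums; a sketch reusing the hypothetical `RPHistogramModel` above, with a small floor guarding against log(0) — an implementation detail the patent leaves open:

```python
import numpy as np

def identify(test_feats, models, floor=1e-300):
    """Return the index j maximizing L_j = sum_i log HD_j(x_i) over the test
    feature set; `models` is a list of trained RPHistogramModel objects."""
    scores = []
    for model in models:
        ll = sum(np.log(max(model.density(x), floor)) for x in test_feats)
        scores.append(ll)
    return int(np.argmax(scores)), scores    # speaker number and all scores
```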
The embodiments of the proposed text-independent speaker identification scheme based on composite differential line spectral frequency features and a random projection histogram model have been set forth above with reference to the accompanying drawings. From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part of it that contributes beyond the prior art, can be embodied in the form of a computer software product stored in a storage medium and comprising instructions that cause one or more computer devices to perform the method described in each embodiment of the present invention.

Specific implementations and application scopes will vary according to the idea of the present invention. In summary, the contents of this description should not be construed as limiting the present invention.

The above embodiments of the present invention do not limit the protection scope of the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (5)

1. A text-independent speaker identification method based on a random projection histogram model, characterized by comprising the following steps:

One. Feature extraction step:
A. Differential line spectral frequency feature extraction: the K-dimensional, monotonically increasing, non-normalized line spectral frequency features obtained from the speech linear predictive coding model are transformed into (K+1)-dimensional normalized differential line spectral frequency features;
B. Generation of composite differential line spectral frequency features: the differential line spectral frequency features of 3 adjacent frames are combined to generate composite differential line spectral frequency features that express the dynamic characteristics of the signal;

Two. Random projection histogram model training step: for the training speech of each speaker, extract T frames of composite differential line spectral frequency features as one training data set according to step one; apply H random transforms to this training data set by the method of random projection to obtain H groups of training features; the random transform takes the form y = Ax + b, where A is a random rotation-scaling matrix and b is a random translation vector; each element of b is uniformly distributed between 0 and 1; A is the product of a unitary orthogonal matrix U and a diagonal matrix Λ; U is generated from a random square matrix V, all of whose elements are uniformly distributed between 0 and 1, by performing a QR decomposition of V and correcting the top-left element of the resulting Q matrix according to whether the determinant of Q equals 1; the diagonal elements of Λ are uniformly distributed between θ_min + log(h_j^{-1}) and θ_max + log(h_j^{-1}), where θ_min = 0, θ_max = 2, and h_j is the optimal histogram bin width of the j-th dimension of the training features, a value determined by the distribution of the training data; perform histogram statistics on each group of features, and use the average histogram of the H groups of training features as the probability model of this speaker; finally, each speaker obtains his or her own trained model;

Three. Discrimination/matching step: after a segment of speech is input, generate one group of features by the method of step one and input these features into the model of each speaker trained in step two, compute the likelihood value of this group of features for each model, and take the maximum likelihood value to confirm the speaker number.
2. The speaker identification method according to claim 1, characterized in that, in the differential line spectral frequency feature extraction of step one A, the traditional line spectral frequency feature vector is normalized by π, adjacent elements of the vector are subtracted from each other to obtain the differential feature vector, and a regularizing element is added to ensure that the 1-norm of the resulting difference vector is 1.
3. The speaker identification method according to claim 1, characterized in that, when the composite differential line spectral frequency feature is obtained by combining the differential line spectral frequency features of 3 adjacent frames, the spacing between adjacent frames is 1.
4. The speaker identification method according to claim 1, characterized in that the probability model of a speaker is defined as

$$HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)$$

where π_ZeroDens p(x | ZeroDens) defines the probability estimate at the empty (zero) bins of the histogram and the second term defines the estimation method of the average histogram probability; π_ZeroDens is the probability that an empty bin occurs in the statistical histogram; p(x | ZeroDens) is the prior probability at the empty positions; p(x | A_i, b_i) is the histogram probability estimate of the input test datum x under the i-th transform, where y_j is the feature generated by the j-th training datum after the i-th transform and y is the feature generated by the input test datum x after the i-th transform.
5. The speaker identification method according to claim 4, wherein the prior probability p(x | ZeroDens) is estimated using a compound Dirichlet distribution.
CN201410232526.2A 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model Active CN103985384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410232526.2A CN103985384B (en) 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model


Publications (2)

Publication Number Publication Date
CN103985384A CN103985384A (en) 2014-08-13
CN103985384B true CN103985384B (en) 2015-04-15

Family

ID=51277327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410232526.2A Active CN103985384B (en) 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model

Country Status (1)

Country Link
CN (1) CN103985384B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108630207B (en) * 2017-03-23 2021-08-31 富士通株式会社 Speaker verification method and speaker verification apparatus
CN112331215B (en) * 2020-10-26 2022-11-15 桂林电子科技大学 Voiceprint recognition template protection algorithm


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685185A (en) * 2012-09-14 2014-03-26 上海掌门科技有限公司 Mobile equipment voiceprint registration and authentication method and system
CN103207961A (en) * 2013-04-23 2013-07-17 曙光信息产业(北京)有限公司 User verification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhanyu Ma, Arne Leijon, W. Bastiaan Kleijn, "Vector Quantization of LSF Parameters With a Mixture of Dirichlet Distributions," IEEE Transactions on Audio, Speech, and Language Processing, Sep. 2013, pp. 1777-1790. *

Also Published As

Publication number Publication date
CN103985384A (en) 2014-08-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant