CN103985384A - Text-independent speaker identification device based on random projection histogram model


Info

Publication number
CN103985384A
CN103985384A (application CN201410232526.2A; granted as CN103985384B)
Authority
CN
China
Prior art keywords
model
speaker
histogram
feature
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410232526.2A
Other languages
Chinese (zh)
Other versions
CN103985384B (en)
Inventor
于泓
马占宇
郭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201410232526.2A
Publication of CN103985384A
Application granted
Publication of CN103985384B
Legal status: Active (granted)


Landscapes

  • Complex Calculations (AREA)

Abstract

The embodiment of the invention discloses a text-independent speaker identification device based on a random projection histogram model. The method comprises three steps: feature extraction, model training, and identification. In the feature extraction step, non-normalized, monotonically increasing line spectral frequency (LSF) features are converted into normalized differential LSF features, and the differential LSF features of consecutive frames are combined to generate composite differential LSF features that express the dynamic behavior of the signal. In the model training step, random projection parameters are designed according to the distribution characteristics of the composite differential LSF features, random projections are applied to the training data set, and a probability model is established by computing an average histogram. In the identification step, features are extracted from the speech signal of the person to be identified as in the first step, the extracted features are input into the models obtained in the second step, the likelihood value of each probability model is calculated, the maximum likelihood value is found, and the corresponding speaker number is returned. The method increases the text-independent speaker identification rate and has great practical value.

Description

Text-independent speaker identification device based on a random projection histogram model
Technical field
The invention belongs to the field of audio processing and describes a text-independent speaker identification device based on a random projection histogram model.
Background technology
Speaker identification is the technology by which a computer uses speaker-characteristic information contained in a speech segment to determine the speaker's identity. It has very important research and application value in fields such as information security and remote identity authentication.
According to the identification target, speaker identification can be divided into two classes: text-dependent and text-independent. Text-dependent speaker identification requires the speaker to pronounce specified keywords or key sentences as training samples, and the same content must be pronounced during identification; such systems are inconvenient to use, and the key content is easily stolen and replayed. Text-independent speaker identification does not prescribe the spoken content during either training or identification: the identification target is free speech, so features and methods that characterize the speaker must be found in unconstrained speech signals. Building the speaker model is therefore relatively difficult, but the resulting system is convenient and safe to use. The invention describes a text-independent identification device.
Speaker identification conventionally comprises three components: (1) extracting features that express speaker characteristics from the training speech data set; (2) training, for each speaker, a model that reflects the distribution of his or her speech features; (3) making the final decision by computing how well the features of the input speech agree with the trained models.
Conventional speaker identification systems adopt MFCC (Mel-frequency cepstral coefficients) or LSF (line spectral frequencies) as the basic features in the feature extraction part, and adopt a GMM (Gaussian mixture model) or a statistical histogram as the probability model in the model training part.
Traditional features are easily affected by noise and cannot express dynamic information, and a GMM is only suitable for modeling features with a wide distribution range. Although a statistical histogram model can model feature signals of any distribution, when training samples are scarce or the feature dimension is too high, the resulting model contains a large number of empty (zero-count) bins, which makes the result discontinuous. The text-independent speaker identification method described in the invention largely solves the above problems.
Summary of the invention
To overcome the defects of the above techniques and to improve the text-independent speaker identification rate, the invention provides a text-independent speaker identification method based on composite differential line spectral frequency features and a random projection histogram model, comprising the following steps:
One. Feature extraction step:
A. Differential line spectral frequency (LSF) feature extraction step: transform the K-dimensional, monotonically increasing, non-normalized LSF features obtained from the linear predictive coding model of speech into (K+1)-dimensional normalized differential LSF features.
B. Composite differential LSF feature generation step: combine the differential LSF features of 3 adjacent frames to generate composite differential LSF features that express the dynamic characteristics of the signal.
Two. Random projection histogram model training step: for each speaker's training speech, extract T frames of composite differential LSF features as one training data set, following the description of step one. Apply H random projections to this training data set to obtain H groups of training features. Compute a histogram for each group, and use the average histogram of the H groups as this speaker's probability model. In the end each speaker obtains his or her own trained model.
Three. Identification matching step: after a speech segment is input, generate one group of features by the method of step one, input these features into each speaker model trained in step two, calculate the likelihood value of this feature group for each model, and confirm the speaker number by taking the maximum likelihood value.
According to an embodiment of the text-independent speaker identification method of the invention, the normalized differential LSF feature extraction of step A is as follows:

\Delta x = \frac{1}{\pi}\left[x_1,\; x_2 - x_1,\; \ldots,\; x_K - x_{K-1},\; \pi - x_K\right]^T

where [x_1, x_2, ..., x_K]^T is the K-dimensional LSF feature before the transform, with elements increasing monotonically in (0, π), and Δx is the (K+1)-dimensional normalized differential LSF feature after the transform; by construction its elements are positive and sum to 1.
According to an embodiment of the text-independent speaker identification method of the invention, the composite differential LSF feature described in step B is generated as follows. Let the differential LSF feature of frame t be Δx(t); the composite differential LSF feature of frame t is then

\mathrm{Sup}\Delta x(t) = \left[\Delta x(t-\tau)^T,\; \Delta x(t)^T,\; \Delta x(t+\tau)^T\right]^T

where τ is a positive integer; the invention takes τ = 1.
According to an embodiment of the text-independent speaker identification method of the invention, the model training method described in step two is as follows:
1) Apply a random projection to the composite differential LSF feature of dimension D = K + 1. The transform is y = Ax + b, where A is a D × D random rotation-scaling matrix and b is a D × 1 random translation vector.
2) Each element of the random translation vector b = [b_1, b_2, ..., b_{K+1}]^T is a random variable uniformly distributed between 0 and 1.
3) The rotation-scaling matrix A is the product of a random rotation unit matrix U and a random scaling diagonal matrix Λ:

A = \Lambda U, \qquad |U| = 1
4) The random rotation unit matrix U is designed as follows:
1. Generate a D × D random matrix V whose elements are uniformly distributed between 0 and 1.
2. Apply a QR decomposition V = QR, where Q is a unit orthogonal matrix.
3. Check whether the determinant of Q equals 1, and modify the element q_11 if necessary so that the determinant of Q is guaranteed to be 1.
5) The random scaling diagonal matrix Λ is designed as follows. The j-th dimension of the composite differential LSF feature follows a Beta distribution with probability density function

\mathrm{Beta}(x_j; \alpha_j, \beta_j) = \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, x_j^{\alpha_j - 1} (1 - x_j)^{\beta_j - 1}

Let

R(x_j; \alpha_j, \beta_j) = \int_0^1 \mathrm{Beta}^2(x_j; \alpha_j, \beta_j)\, dx_j

h_j = R(x_j; \alpha_j, \beta_j)^{-\frac{1}{2}} \left(6 \prod_{i=1}^{D} R(x_i; \alpha_i, \beta_i)^{\frac{1}{2}}\right)^{\frac{1}{2+D}} N^{-\frac{1}{2+D}}

where D is the dimension of the composite differential LSF feature and N is the number of training features. The diagonal elements of the diagonal matrix Λ then take the values

\log(\lambda_j) \sim \mathrm{Uniform}\left[\theta_{\min} + \log(h_j^{-1}),\; \theta_{\max} + \log(h_j^{-1})\right]

where θ_min = 0 and θ_max = 2 are relaxation parameters.
6) After the random projection, the probability model is built from the training data as:

HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)

The first term estimates the probability at the empty (zero-count) bin positions of the histogram, where π_ZeroDens is the probability of empty bins occurring in the statistical histogram and p(x | ZeroDens) is the prior probability at those positions; the prior here is a compound Dirichlet distribution. The input feature vector is

x = \mathrm{Sup}\Delta x(t) = \left[\Delta x(t-\tau)^T, \Delta x(t)^T, \Delta x(t+\tau)^T\right]^T = [\Delta x_1, \Delta x_2, \Delta x_3]^T

p(x \mid \mathrm{ZeroDens}) = \prod_{n=1}^{3} \frac{\Gamma\left(\sum_{k=1}^{K+1} \alpha_{n,k}\right)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{n,k})} \prod_{k=1}^{K+1} (\Delta x_{n,k})^{\alpha_{n,k} - 1}

The second term is the average-histogram probability estimate, where H is the number of random projections: one training data set containing N training samples is transformed by the H random projections into H groups of training data.
Here p(x | A_i, b_i) is the histogram probability estimate of the input datum x under the i-th transform, defined as

p(x \mid A_i, b_i) = \frac{1}{Nv} \sum_{j=1}^{N} \mathbb{I}\big(\mathrm{round}(y_j), \mathrm{round}(y)\big)

y = A_i x + b_i, \qquad v = |A_i|^{-1}

where y_j is the projection of the j-th training sample and the indicator equals 1 when its two arguments coincide, i.e. the two points fall into the same bin, and 0 otherwise.
According to an embodiment of the text-independent speaker identification method of the invention, the identification matching described in step three is implemented as follows: the input feature data set x̃ = {x_1, ..., x_N} is fed to the probability model trained for each speaker, and the likelihood value is calculated as

L_j(\tilde{x}) = \sum_{i=1}^{N} \log\big(HD_j(x_i)\big)

where L_j(x̃) is the likelihood value of the test feature set with respect to the j-th speaker model; the speaker number is confirmed by taking the maximum likelihood value.
The beneficial effect of the invention is that, compared with the prior art, the invention applies composite differential LSF features as speaker features, uses a random projection histogram to train the probability model, and provides a complete implementation system for the application. Experiments demonstrate the efficiency of the invention, which therefore has strong practicality.
Specific embodiments of the invention are described in detail below in conjunction with the accompanying drawings.
Fig. 1 is the flow chart of the invention, in which solid lines indicate the flow of the training part and dotted lines indicate the flow of the identification part. It comprises the following steps:
First step: feature extraction; extract composite differential LSF features from the training speaker's speech sequence.
Step S1: convert the LSF features into differential LSF features;
Step S2: combine the differential LSF features obtained in S1 into composite differential LSF features.
Second step: train the probability model.
Step S3: build a random projection histogram model that fits the distribution of the composite differential LSF features; the implementation details are shown in Fig. 2.
Third step: identification.
Repeat steps S1 and S2 of the first step on the speech sequence of the speaker to be identified to generate a composite differential LSF feature test set, and input it into the models trained in step S3.
Step S4: calculate the likelihood value for each probability model, obtain the maximum likelihood value, and confirm the speaker number.
Each step is described in detail below:
Step S1 implements the differential LSF feature extraction: the K-dimensional, monotonically increasing, non-normalized LSF features obtained from the linear predictive coding model of speech are transformed into (K+1)-dimensional normalized differential LSF features as

\Delta x = \frac{1}{\pi}\left[x_1,\; x_2 - x_1,\; \ldots,\; x_K - x_{K-1},\; \pi - x_K\right]^T

where [x_1, x_2, ..., x_K]^T is the K-dimensional LSF feature before the transform and Δx is the (K+1)-dimensional normalized differential LSF feature after the transform.
Step S2 combines the differential LSF features of 3 adjacent frames to generate a composite differential LSF feature that expresses the dynamic characteristics of the signal. Let the differential LSF feature of frame t be Δx(t); the composite differential LSF feature of frame t is

\mathrm{Sup}\Delta x(t) = \left[\Delta x(t-\tau)^T,\; \Delta x(t)^T,\; \Delta x(t+\tau)^T\right]^T

where τ is a positive integer; the invention takes τ = 1.
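The two steps above admit a very compact implementation. The following is a minimal NumPy sketch, assuming LSF vectors with strictly increasing elements in the open interval (0, π); the function names are illustrative and do not come from the patent:

```python
import numpy as np

def lsf_to_dlsf(x):
    """Step S1: map a K-dim increasing LSF vector in (0, pi) to the
    (K+1)-dim normalized differential LSF vector, whose positive
    elements sum to 1."""
    x = np.asarray(x, dtype=float)
    bounded = np.concatenate(([0.0], x, [np.pi]))  # add the 0 and pi endpoints
    return np.diff(bounded) / np.pi                # adjacent differences, scaled by 1/pi

def composite_dlsf(dlsf, t, tau=1):
    """Step S2: stack the differential LSF vectors of frames t-tau, t and
    t+tau into one composite feature; `dlsf` is a (num_frames, K+1) array."""
    return np.concatenate([dlsf[t - tau], dlsf[t], dlsf[t + tau]])
```

Because each differential vector sums to 1 with positive elements, it lives on the probability simplex, which is what makes the Beta and Dirichlet models used below natural choices.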
Step S3: build the random projection histogram model that fits the distribution of the composite differential LSF features. The concrete flow, shown in Fig. 2, is as follows:
1) Obtain the prior probability of the empty (zero-count) bin positions of the histogram from the overall distribution of the composite differential LSF features.
Let the input composite differential LSF feature vector be

x = \mathrm{Sup}\Delta x(t) = \left[\Delta x(t-\tau)^T, \Delta x(t)^T, \Delta x(t+\tau)^T\right]^T = [\Delta x_1, \Delta x_2, \Delta x_3]^T
The overall distribution of the composite differential LSF features is

p(x \mid \mathrm{ZeroDens}) = \prod_{n=1}^{3} \frac{\Gamma\left(\sum_{k=1}^{K+1} \alpha_{n,k}\right)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{n,k})} \prod_{k=1}^{K+1} (\Delta x_{n,k})^{\alpha_{n,k} - 1}

The prior probability of an empty bin occurring in the histogram is

\pi_{\mathrm{ZeroDens}} = \frac{1}{N + 1}

and the prior distribution at the empty-bin positions of the histogram is

\pi_{\mathrm{ZeroDens}}\; p(x \mid \mathrm{ZeroDens})
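For concreteness, here is a sketch of this compound Dirichlet prior density in Python/SciPy, evaluated in the log domain for numerical stability. The Dirichlet parameters α are assumed to have been fitted to the speaker's training features beforehand; the patent does not prescribe the fitting procedure, so that part is left as an input:

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet(dx, alpha):
    """Log density of one Dirichlet factor over a (K+1)-dim differential
    LSF sub-vector (elements positive and summing to 1)."""
    dx, alpha = np.asarray(dx), np.asarray(alpha)
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + ((alpha - 1.0) * np.log(dx)).sum())

def log_zero_density(x, alphas):
    """log p(x | ZeroDens): product of three Dirichlet densities, one per
    stacked frame of the composite feature; `alphas` holds the three
    (K+1)-dim parameter vectors."""
    subs = np.split(np.asarray(x), 3)  # x = [dx1, dx2, dx3]
    return sum(log_dirichlet(s, a) for s, a in zip(subs, alphas))
```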
2) Apply random projections to the input composite differential LSF feature vectors and compute the average histogram.
The formula for the random projection of the composite differential LSF feature of dimension D = K + 1 is y = Ax + b, where A is a D × D random rotation-scaling matrix and b is a D × 1 random translation vector.
Each element of the random translation vector b = [b_1, b_2, ..., b_{K+1}]^T is a random variable uniformly distributed between 0 and 1.
The random rotation-scaling matrix A can be decomposed into the product of a random rotation unit matrix U and a random scaling diagonal matrix Λ:

A = \Lambda U, \qquad |U| = 1
The random rotation unit matrix U is designed as follows:
1. Generate a D × D random matrix V whose elements are uniformly distributed between 0 and 1.
2. Apply a QR decomposition V = QR, where Q is a unit orthogonal matrix.
3. Check whether the determinant of Q equals 1, and modify the element q_11 if necessary so that the determinant of Q is guaranteed to be 1.
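A NumPy sketch of this construction follows. One deliberate deviation, flagged as an assumption: since the determinant of an orthogonal Q is ±1, the sketch flips the sign of the whole first column rather than the single element q_11 named in the text, because changing one element alone would in general break orthogonality:

```python
import numpy as np

def random_rotation(D, rng=None):
    """Random unit orthogonal matrix U with det(U) = +1, built from the
    QR decomposition of a matrix with uniform random elements."""
    rng = rng or np.random.default_rng()
    V = rng.uniform(0.0, 1.0, size=(D, D))  # elements uniform on (0, 1)
    Q, _ = np.linalg.qr(V)                  # Q is orthogonal, det(Q) = +/-1
    if np.linalg.det(Q) < 0:                # determinant is -1: fix the sign
        Q[:, 0] = -Q[:, 0]                  # flipping one column gives det +1
    return Q
```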
The random scaling diagonal matrix Λ is designed as follows:
1. Compute the distribution of each element of the composite differential LSF feature vector. The j-th dimension follows a Beta distribution with probability density function

\mathrm{Beta}(x_j; \alpha_j, \beta_j) = \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, x_j^{\alpha_j - 1} (1 - x_j)^{\beta_j - 1}

2. Compute the optimal histogram bin width h_j in each dimension:

R(x_j; \alpha_j, \beta_j) = \int_0^1 \mathrm{Beta}^2(x_j; \alpha_j, \beta_j)\, dx_j

h_j = R(x_j; \alpha_j, \beta_j)^{-\frac{1}{2}} \left(6 \prod_{i=1}^{D} R(x_i; \alpha_i, \beta_i)^{\frac{1}{2}}\right)^{\frac{1}{2+D}} N^{-\frac{1}{2+D}}

where D is the dimension of the composite differential LSF feature and N is the number of training features.
3. Generate the diagonal elements λ_j of the diagonal matrix Λ from the optimal bin width h_j:

\log(\lambda_j) \sim \mathrm{Uniform}\left[\theta_{\min} + \log(h_j^{-1}),\; \theta_{\max} + \log(h_j^{-1})\right]

where θ_min = 0 and θ_max = 2 are relaxation parameters.
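A sketch of steps 2 and 3 in Python/SciPy. It assumes the Beta parameters (α_j, β_j) of each dimension have already been estimated from the training data (for instance by moment matching, an assumption not stated in the patent) and that they exceed 1/2 so the roughness integral is finite; R is evaluated numerically rather than in closed form:

```python
import numpy as np
from scipy import stats, integrate

def beta_roughness(a, b):
    """R(.; a, b): integral over [0, 1] of the squared Beta(a, b) density."""
    pdf = stats.beta(a, b).pdf
    val, _ = integrate.quad(lambda u: pdf(u) ** 2, 0.0, 1.0)
    return val

def optimal_bin_widths(alphas, betas, N):
    """Per-dimension optimal bin widths h_j from the fitted Beta marginals."""
    R = np.array([beta_roughness(a, b) for a, b in zip(alphas, betas)])
    D = len(R)
    return (R ** -0.5
            * (6.0 * np.prod(np.sqrt(R))) ** (1.0 / (2.0 + D))
            * N ** (-1.0 / (2.0 + D)))

def random_scaling(h, theta_min=0.0, theta_max=2.0, rng=None):
    """Diagonal of Lambda: log(lambda_j) is uniform on
    [theta_min + log(1/h_j), theta_max + log(1/h_j)]."""
    rng = rng or np.random.default_rng()
    h = np.asarray(h)
    return np.exp(np.log(1.0 / h) + rng.uniform(theta_min, theta_max, size=h.shape))
```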
After the random projection parameters A and b have been obtained by the above flow, H random projections are applied to the training feature data set: one training data set containing N training samples is projected into H groups of training data, whose average histogram is

\frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)

where p(x | A_i, b_i) is the histogram probability estimate of the input datum x under the i-th transform, defined as

p(x \mid A_i, b_i) = \frac{1}{Nv} \sum_{j=1}^{N} \mathbb{I}\big(\mathrm{round}(y_j), \mathrm{round}(y)\big)

y = A_i x + b_i, \qquad v = |A_i|^{-1}
The random projection histogram probability estimation model finally obtained is therefore:

HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)
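Putting the pieces together, the following class sketches the full speaker model. It reuses the helpers sketched above and keeps each projected histogram as a hash map keyed by the rounded projected coordinates, so no dense D-dimensional grid is ever materialized. The 1/(Nv) normalization of each per-projection histogram and the class interface are this sketch's reading of the formulas, not wording taken from the patent:

```python
import numpy as np
from collections import Counter

class RandomProjectionHistogram:
    """Sketch of the speaker model HD(x): a zero-bin Dirichlet prior mixed
    with the average of H randomly projected histograms."""

    def __init__(self, X, alphas, betas, dirichlet_alphas, H=10, rng=None):
        rng = rng or np.random.default_rng()
        self.N, D = X.shape
        self.pi_zero = 1.0 / (self.N + 1)        # prior weight of empty bins
        self.dirichlet_alphas = dirichlet_alphas
        h = optimal_bin_widths(alphas, betas, self.N)
        self.projections = []
        for _ in range(H):
            A = np.diag(random_scaling(h, rng=rng)) @ random_rotation(D, rng=rng)
            b = rng.uniform(0.0, 1.0, size=D)
            bins = Counter(map(tuple, np.rint(X @ A.T + b).astype(int)))
            v = 1.0 / abs(np.linalg.det(A))      # bin volume v = |A|^(-1)
            self.projections.append((A, b, bins, v))

    def density(self, x):
        """Evaluate HD(x) at one composite differential LSF vector x."""
        hist = np.mean([bins.get(tuple(np.rint(A @ x + b).astype(int)), 0)
                        / (self.N * v)
                        for A, b, bins, v in self.projections])
        zero = np.exp(log_zero_density(x, self.dirichlet_alphas))
        return self.pi_zero * zero + (1.0 - self.pi_zero) * hist
```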
The identification matching described in step S4 is implemented as follows: the input feature data set x̃ = {x_1, ..., x_N} is fed to the probability model trained for each speaker, and the likelihood value is calculated as

L_j(\tilde{x}) = \sum_{i=1}^{N} \log\big(HD_j(x_i)\big)

where L_j(x̃) is the likelihood value of the test feature set with respect to the j-th speaker model; the speaker number is confirmed by taking the maximum likelihood value.
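A sketch of this decision rule, assuming one trained model per speaker as above; a small floor keeps the logarithm finite if a density ever evaluates to exactly zero (a guard this sketch adds, not the patent):

```python
import numpy as np

def identify_speaker(X_test, models, floor=1e-300):
    """Step S4: return the index j of the speaker model with the largest
    log-likelihood over the test feature set; `models` is a list of
    trained RandomProjectionHistogram objects."""
    scores = [sum(np.log(max(m.density(x), floor)) for x in X_test)
              for m in models]
    return int(np.argmax(scores))
```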
The embodiment of the proposed text-independent speaker identification scheme based on composite differential LSF features and the random projection histogram model has been set forth above in conjunction with the accompanying drawings. From the description of the above embodiment, one of ordinary skill in the art can clearly recognize that the invention may be implemented by software plus the necessary general-purpose hardware platform. Based on such an understanding, the part of the technical scheme of the invention that contributes over the prior art can be embodied in the form of a computer software product, stored on a storage medium and comprising instructions that cause one or more computer devices to execute the method described in each embodiment of the invention.
The specific embodiments and the scope of application may vary according to the ideas of the invention. In summary, this description should not be construed as limiting the invention.
The above-described embodiments of the invention do not limit the scope of protection of the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall be included within the protection scope of the invention.

Claims (10)

1. A text-independent speaker identification device based on a random projection histogram model, characterized in that it comprises the following steps:
One. Feature extraction step:
A. Differential LSF feature extraction: transform the K-dimensional, monotonically increasing, non-normalized line spectral frequency (LSF) features obtained from the linear predictive coding model of speech into (K+1)-dimensional normalized differential LSF features;
B. Composite differential LSF feature generation: combine the differential LSF features of 3 adjacent frames to generate composite differential LSF features that express the dynamic characteristics of the signal.
Two. Random projection histogram model training step: for each speaker's training speech, extract T frames of composite differential LSF features as one training data set, following the description of step one; apply H random projections to this training data set to obtain H groups of training features; compute a histogram for each group, and use the average histogram of the H groups as this speaker's probability model; in the end each speaker obtains his or her own trained model.
Three. Identification matching step: after a speech segment is input, generate one group of features by the method of step one, input these features into each speaker model trained in step two, calculate the likelihood value of this feature group for each model, and confirm the speaker number by taking the maximum likelihood value.
2. The speaker identification method according to claim 1, wherein step one A is characterized in that, during differential LSF feature extraction, the traditional LSF feature vector is first normalized by π, each pair of adjacent elements of the vector is then subtracted to obtain the differential feature vector, and one regularizing element is added so that the 1-norm of the resulting difference vector is 1.
3. The speaker identification method according to claim 1, wherein step one B is characterized in that, when the composite differential LSF feature is obtained, the differential LSF features of 3 adjacent frames are combined, with a spacing of 1 between adjacent frames.
4. The speaker identification method according to claim 1, wherein step two is characterized in that the random transform is y = Ax + b, where A is a random rotation-scaling matrix and b is a random translation vector.
5. The random translation vector b according to claim 4, characterized in that each element of b is uniformly distributed between 0 and 1.
6. The random rotation-scaling matrix A according to claim 4, characterized in that A is the product of a unit orthogonal matrix U and a diagonal matrix Λ.
7. The unit orthogonal matrix U according to claim 6, characterized in that U is generated from a square matrix V whose elements are all uniformly distributed between 0 and 1: a QR decomposition is applied to V, and U is obtained by modifying the upper-left element of the resulting matrix Q according to whether the determinant of Q equals 1.
8. The diagonal matrix Λ according to claim 6, characterized in that the diagonal elements of Λ take the values

\log(\lambda_j) \sim \mathrm{Uniform}\left[\theta_{\min} + \log(h_j^{-1}),\; \theta_{\max} + \log(h_j^{-1})\right]

where θ_min = 0, θ_max = 2, and h_j is the optimal histogram bin width of the j-th dimension of the training features, a value determined by the distribution of the training data.
9. The speaker identification method according to claim 1, wherein step two is characterized in that the speaker probability model is defined as

HD(x) = \pi_{\mathrm{ZeroDens}}\, p(x \mid \mathrm{ZeroDens}) + \frac{1 - \pi_{\mathrm{ZeroDens}}}{H} \sum_{i=1}^{H} p(x \mid A_i, b_i)

where the first term of the equation defines the probability estimate at the empty (zero-count) bin positions of the histogram and the second term defines the estimation method of the average histogram probability; π_ZeroDens is the probability of empty bins occurring in the statistical histogram, p(x | ZeroDens) is the prior probability at the empty-bin positions, and p(x | A_i, b_i) is the histogram probability estimate of the input test datum x under the i-th transform.
10. The prior probability p(x | ZeroDens) of the empty-bin positions according to claim 9, characterized in that this prior is estimated using a compound Dirichlet distribution.
CN201410232526.2A 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model Active CN103985384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410232526.2A CN103985384B (en) 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410232526.2A CN103985384B (en) 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model

Publications (2)

Publication Number Publication Date
CN103985384A true CN103985384A (en) 2014-08-13
CN103985384B CN103985384B (en) 2015-04-15

Family

ID=51277327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410232526.2A Active CN103985384B (en) 2014-05-28 2014-05-28 Text-independent speaker identification device based on random projection histogram model

Country Status (1)

Country Link
CN (1) CN103985384B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108630207A (en) * 2017-03-23 2018-10-09 富士通株式会社 Method for identifying speaker and speaker verification's equipment
CN112331215A (en) * 2020-10-26 2021-02-05 桂林电子科技大学 Voiceprint recognition template protection algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207961A (en) * 2013-04-23 2013-07-17 曙光信息产业(北京)有限公司 User verification method and device
CN103685185A (en) * 2012-09-14 2014-03-26 上海掌门科技有限公司 Mobile equipment voiceprint registration and authentication method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685185A (en) * 2012-09-14 2014-03-26 上海掌门科技有限公司 Mobile equipment voiceprint registration and authentication method and system
CN103207961A (en) * 2013-04-23 2013-07-17 曙光信息产业(北京)有限公司 User verification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhanyu Ma, Arne Leijon, W. Bastiaan Kleijn: "Vector Quantization of LSF Parameters With a Mixture of Dirichlet Distributions", IEEE Transactions on Audio, Speech, and Language Processing *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108630207A (en) * 2017-03-23 2018-10-09 富士通株式会社 Method for identifying speaker and speaker verification's equipment
CN112331215A (en) * 2020-10-26 2021-02-05 桂林电子科技大学 Voiceprint recognition template protection algorithm
CN112331215B (en) * 2020-10-26 2022-11-15 桂林电子科技大学 Voiceprint recognition template protection algorithm

Also Published As

Publication number Publication date
CN103985384B (en) 2015-04-15

Similar Documents

Publication Publication Date Title
Yujin et al. Research of speaker recognition based on combination of LPCC and MFCC
CN102800316B (en) Optimal codebook design method for voiceprint recognition system based on nerve network
CN102820033B (en) Voiceprint identification method
CN109637545B (en) Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long-short-time memory network
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN105869624A (en) Method and apparatus for constructing speech decoding network in digital speech recognition
CN105355214A (en) Method and equipment for measuring similarity
CN102637433A (en) Method and system for identifying affective state loaded in voice signal
Zhang et al. Speech emotion recognition using combination of features
CN103456302A (en) Emotion speaker recognition method based on emotion GMM model weight synthesis
Wataraka Gamage et al. Speech-based continuous emotion prediction by learning perception responses related to salient events: A study based on vocal affect bursts and cross-cultural affect in AVEC 2018
CN103258531A (en) Harmonic wave feature extracting method for irrelevant speech emotion recognition of speaker
Aliaskar et al. Human voice identification based on the detection of fundamental harmonics
Shen et al. Rars: Recognition of audio recording source based on residual neural network
CN104464738A (en) Vocal print recognition method oriented to smart mobile device
CN103985384B (en) Text-independent speaker identification device based on random projection histogram model
Koolagudi et al. Speaker recognition in the case of emotional environment using transformation of speech features
Rodman et al. Forensic speaker identification based on spectral moments
Feng et al. Speech emotion recognition based on LSTM and Mel scale wavelet packet decomposition
Saritha et al. Deep Learning-Based End-to-End Speaker Identification Using Time–Frequency Representation of Speech Signal
Herrera-Camacho et al. Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE
Jian et al. An embedded voiceprint recognition system based on GMM
CN103871411A (en) Text-independent speaker identifying device based on line spectrum frequency difference value
Li et al. Fast speaker clustering using distance of feature matrix mean and adaptive convergence threshold
Fan et al. Deceptive Speech Detection based on sparse representation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant