CN101256768A - Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species - Google Patents

Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species

Info

Publication number
CN101256768A
CN101256768A CNA2008101033280A CN200810103328A
Authority
CN
China
Prior art keywords
frequency
frame
feature
matrix
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101033280A
Other languages
Chinese (zh)
Other versions
CN101256768B (en)
Inventor
张卫强
刘加
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2008101033280A priority Critical patent/CN101256768B/en
Publication of CN101256768A publication Critical patent/CN101256768A/en
Application granted granted Critical
Publication of CN101256768B publication Critical patent/CN101256768B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Complex Calculations (AREA)

Abstract

A time-frequency two-dimensional cepstrum feature extraction method for language identification. The method first computes the sub-band energies of the speech signal in each frame and juxtaposes the sub-band energies of multiple frames to obtain a time-frequency distribution matrix. It then applies a two-dimensional DCT to remove the correlation along the time and frequency directions of the matrix, and finally rearranges the transformed coefficients and reduces their dimensionality to obtain the final feature. The feature exploits the short-time stationarity of speech while also capturing the long-span information needed to distinguish languages. The method can be used for language identification.

Description

Time-frequency two-dimensional cepstrum feature extraction method for language identification
Technical field
The invention belongs to the field of speech recognition and specifically relates to a time-frequency two-dimensional cepstrum feature extraction method that can be used for language identification.
Background technology
Language identification refers to using a machine to determine the language spoken in a segment of speech. Language identification technology is mainly used in systems such as human-machine spoken dialogue, speech-based information retrieval, and monitoring.
The features most widely used in language identification today are MFCCs (Mel-frequency cepstral coefficients) and features derived from them; LPCCs (linear prediction cepstral coefficients) and PLP (perceptual linear prediction) features are also used. LPCC is motivated by the human speech production mechanism, while MFCC and PLP partly take human auditory perception into account.
In language identification, derived features are generally obtained by further computation on the basic features above and are then used together with the original features after concatenation. The most common derived features are difference (delta) features, usually the first- and second-order differences. If the basic features of frame $t$ are $\{c_j(t),\ j = 0, 1, \ldots, N-1\}$, the first-order difference is

$$\delta_j(t) = \frac{\sum_{d=1}^{D} d\,\big(c_j(t+d) - c_j(t-d)\big)}{\sum_{d=1}^{D} d^2}, \quad j = 1, 2, \ldots, N-1 \qquad (1)$$
where $D$ is the half-width of the difference window, typically 2. Similarly, the second-order difference $\alpha_j(t)$ is obtained from the first-order difference $\delta_j(t)$ by formula (2):

$$\alpha_j(t) = \frac{\sum_{d=1}^{D} d\,\big(\delta_j(t+d) - \delta_j(t-d)\big)}{\sum_{d=1}^{D} d^2}, \quad j = 1, 2, \ldots, N-1 \qquad (2)$$
Concatenating the basic features with their first- and second-order differences yields a new feature vector $\{c_j(t),\ j = 0, \ldots, N-1;\ \delta_j(t),\ j = 0, \ldots, N-1;\ \alpha_j(t),\ j = 0, \ldots, N-1\}$.
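As a concrete illustration, formulas (1)-(2) and the concatenation can be sketched in Python as follows (a minimal sketch; repeating the boundary frames at the edges is an assumption, since the text does not specify edge handling):

```python
import numpy as np

def delta(c, D=2):
    """First-order difference (delta) features, eq. (1).

    c: (num_frames, N) array of per-frame base features.
    D: difference window half-width (the patent uses D = 2).
    Edge frames are handled by repeating the boundary frame (an assumption).
    """
    T, N = c.shape
    padded = np.concatenate([np.repeat(c[:1], D, axis=0), c,
                             np.repeat(c[-1:], D, axis=0)])
    denom = sum(d * d for d in range(1, D + 1))
    out = np.zeros_like(c, dtype=float)
    for t in range(T):
        tc = t + D  # index of frame t inside the padded array
        out[t] = sum(d * (padded[tc + d] - padded[tc - d])
                     for d in range(1, D + 1)) / denom
    return out

def with_deltas(c, D=2):
    """Concatenate basic features with first- and second-order deltas."""
    d1 = delta(c, D)
    d2 = delta(d1, D)
    return np.concatenate([c, d1, d2], axis=1)
```

With 13 basic dimensions this yields the 39-dimensional vector used in the experiments below.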
In addition, temporal-sequence information is an important cue in language identification. To make fuller use of the temporal information in speech, researchers have in recent years proposed the SDC (shifted delta cepstra) feature. An SDC feature is in fact a concatenation of $K$ blocks of first-order difference features and can be expressed as

$$s_{iN+j}(t) = c_j(t + iS + b) - c_j(t + iS - b), \quad j = 1, 2, \ldots, N-1;\ i = 0, 1, \ldots, K-1 \qquad (3)$$
where $b$ is the frame offset used when computing each first-order difference, typically 1; $K$ is the number of blocks, typically 7; and $S$ is the frame shift between blocks, typically 3.

Like the difference features, SDC features can be concatenated with the basic features to form a new feature vector $\{c_j(t),\ j = 0, \ldots, N-1;\ s_{iN+j}(t),\ j = 0, \ldots, N-1,\ i = 0, \ldots, K-1\}$. Experiments show that this combination is more effective than the SDC feature alone.
Although the SDC feature contains more temporal information, being spliced together from several first-order differences it suffers from two problems: first, its dimensionality is high, which increases system complexity; second, considerable correlation remains between its dimensions, which hinders modeling by the back-end classifier.
Summary of the invention
To overcome these shortcomings of the existing SDC feature, the invention provides a method for extracting a time-frequency two-dimensional cepstrum feature that reduces both the correlation between feature dimensions and the total feature dimensionality, thereby lowering the complexity of a language identification system while improving its performance. In a digital integrated circuit implementation, the proposed 21-dimensional feature reduces the resources of the feature storage module and the classifier computation module by 62.5% compared with the commonly used 56-dimensional SDC feature.
The invention is characterized in that the method is carried out in a digital integrated circuit chip according to the following steps:
Step (1): apply zero-mean normalization and pre-emphasis to the speech signal, where zero-mean normalization subtracts the mean of the whole utterance and pre-emphasis is a high-pass filter with transfer function $H(z) = 1 - 0.975z^{-1}$;
Step (2): split the speech signal into frames of length 20 ms with a frame shift of 10 ms;
Step (3): build a two-dimensional time-frequency distribution matrix that simultaneously reflects the short-time stationarity of speech and the long-span information of the language, as follows:
Step (3.1): apply a Hamming window to each frame, obtaining data $\{x(m),\ m = 0, 1, \ldots, M-1\}$, where $M$ is the number of samples per frame;
Step (3.2): compute the DFT (discrete Fourier transform) of the windowed data:

$$X(\omega_k) = \sum_{m=0}^{M-1} x(m)\, e^{-j\frac{2\pi}{M}mk}$$

where $\omega_k$ denotes frequency and $k$ the frequency index;
Step (3.3): in the frequency domain, compute the sub-band energy $e_f$ of each of $F = 24$ triangular Mel-scale windows per frame by

$$e_f = \frac{1}{U_f - L_f + 1} \sum_{k=L_f}^{U_f} |X(\omega_k)|^2$$

where $U_f$ and $L_f$ are the upper and lower boundaries of the $f$-th sub-band; then assemble the $F$ sub-band energies into a vector

$$\mathbf{e} = [e_0, e_1, \ldots, e_{F-1}]^T$$

where the superscript $T$ denotes transposition;
Step (3.4): take $T = 19$ consecutive vectors from step (3.3) and juxtapose them to form a two-dimensional time-frequency distribution matrix

$$E(t) = [\mathbf{e}(t), \mathbf{e}(t+1), \ldots, \mathbf{e}(t+T-1)];$$
Step (4): apply a two-dimensional DCT (discrete cosine transform) to the matrix $E(t)$ to obtain the two-dimensional cepstrum coefficients:

$$C(p,q) = \gamma_p \gamma_q \sum_{\tau=0}^{T-1} \sum_{f=0}^{F-1} e_f(t+\tau)\, \cos\frac{\pi(2\tau+1)p}{2T}\, \cos\frac{\pi(2f+1)q}{2F}$$

where $\tau$ and $f$ are summation variables and $\gamma_p$, $\gamma_q$ are normalization coefficients:

$$\gamma_p = \begin{cases}\sqrt{1/T}, & p = 0\\ \sqrt{2/T}, & p \ge 1\end{cases}, \qquad \gamma_q = \begin{cases}\sqrt{1/F}, & q = 0\\ \sqrt{2/F}, & q \ge 1\end{cases}$$
Step (5): select the elements in the upper-left corner of the transformed matrix, which carry the principal components of $E(t)$, as the feature, denoted TFC; the rearrangement formula that orders the upper-left part into a vector is:

$$\mathrm{TFC}\!\left(\frac{(p+q)^2 + 3p + q}{2}\right) = C(p,q).$$
The beneficial effect of the invention is that it extracts long-span features from the speech signal that are effective for language identification, reducing both the correlation between feature dimensions and the total feature dimensionality. This improves the language identification rate while reducing the complexity of the recognition system and its demands on feature storage and classifier computation resources.
Description of drawings
Fig. 1 is a block diagram of the feature extraction flow of the invention.
Fig. 2 is a schematic diagram of the numbering of the time-frequency two-dimensional cepstrum coefficients of the invention.
Embodiment
Because speech is short-time stationary, a frame length of 20 ms is generally chosen for the short-time Fourier transform during feature extraction; with a longer frame length the speech signal is no longer stationary within a frame. The information characterizing a language, however, is contained in longer speech segments: a Chinese character, for example, lasts about 250 ms, which corresponds to roughly 25 frames at a 10 ms frame shift.
Based on these considerations, the invention first applies short-time Fourier analysis. Let the data of one frame after Hamming windowing be $\{x(m),\ m = 0, 1, \ldots, M-1\}$; its DFT is

$$X(\omega_k) = \sum_{m=0}^{M-1} x(m)\, e^{-j\frac{2\pi}{M}mk} \qquad (4)$$
where $\omega_k$ denotes frequency and $k$ the frequency index. Computing the energy of each of $F$ (typically 24) triangular Mel-scale sub-bands in the frequency domain gives

$$e_n = \frac{1}{U_n - L_n + 1} \sum_{k=L_n}^{U_n} |X(\omega_k)|^2 \qquad (5)$$

where $U_n$ and $L_n$ are the upper and lower boundaries of the $n$-th sub-band.
The $F$ sub-band energies form a vector

$$\mathbf{e} = [e_0, e_1, \ldots, e_{F-1}]^T \qquad (6)$$

where the superscript $T$ denotes transposition. Juxtaposing $T$ (typically 19) such vectors, one per frame, forms a two-dimensional time-frequency distribution matrix

$$E(t) = [\mathbf{e}(t), \mathbf{e}(t+1), \ldots, \mathbf{e}(t+T-1)] \qquad (7)$$
The matrix $E(t)$ both exploits the short-time stationarity of speech and captures long-span language information. However, its dimensionality is high, $T \times F$ in total; and because the time-frequency distribution is continuous, correlation exists between its elements along both the horizontal (time) and vertical (frequency) directions. Both aspects hinder modeling by a classifier. Linear transform techniques can remove the linear correlation between features and reduce their dimensionality.
The invention applies a two-dimensional DCT to the matrix $E(t)$, obtaining the two-dimensional cepstrum coefficients

$$C(p,q) = \gamma_p \gamma_q \sum_{\tau=0}^{T-1} \sum_{f=0}^{F-1} e_f(t+\tau)\, \cos\frac{\pi(2\tau+1)p}{2T}\, \cos\frac{\pi(2f+1)q}{2F} \qquad (8)$$

where $\tau$ and $f$ are summation variables and $\gamma_p$, $\gamma_q$ are normalization coefficients:

$$\gamma_p = \begin{cases}\sqrt{1/T}, & p = 0\\ \sqrt{2/T}, & p \ge 1\end{cases}, \qquad \gamma_q = \begin{cases}\sqrt{1/F}, & q = 0\\ \sqrt{2/F}, & q \ge 1\end{cases} \qquad (9)$$
This removes the correlation in both directions and simultaneously compresses the principal components of $E(t)$ into the upper-left corner of the matrix, so selecting the upper-left elements suffices to approximately describe the whole matrix, achieving dimensionality compression. Denoting the upper-left (triangular) part by TFC, the rearrangement formula that orders the triangular part into a vector is

$$\mathrm{TFC}\!\left(\frac{(p+q)^2 + 3p + q}{2}\right) = C(p,q) \qquad (10)$$
As shown in Fig. 1 and Fig. 2, the concrete steps for implementing the invention are as follows:
(1) Preprocess the speech signal, including zero-mean normalization and pre-emphasis;
(2) split the speech into frames of length 20 ms with a 10 ms frame shift;
(3) apply a Hamming window to each frame;
(4) apply the DFT to the windowed data to obtain the spectrum;
(5) compute the energy of each of F triangular Mel-scale sub-bands per frame in the frequency domain;
(6) arrange the sub-band energies of successive frames in time order to obtain the time-frequency energy distribution;
(7) select the time-frequency energy within a rectangular window of T points along the time axis and F points along the frequency axis to form the time-frequency distribution matrix, and apply the two-dimensional DCT to obtain the time-frequency two-dimensional cepstrum coefficient matrix;
(8) select the coefficients in the upper-left triangle of that matrix, rearrange them into a vector, and take the first L dimensions as the final feature.
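Steps (1)-(3) of the procedure above (preprocessing, framing and windowing) can be sketched as follows (a minimal sketch; keeping the first sample unchanged in the pre-emphasis filter is an assumption about initialization):

```python
import numpy as np

def preprocess_and_frame(signal, fs=8000, frame_ms=20, hop_ms=10, alpha=0.975):
    """Steps (1)-(3): zero-mean the whole utterance, pre-emphasize
    with H(z) = 1 - 0.975 z^-1, split into 20 ms frames with a
    10 ms shift, and apply a Hamming window to each frame."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # zero-mean normalization
    x = np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))  # pre-emphasis
    flen = int(fs * frame_ms / 1000)   # 160 samples at 8 kHz
    hop = int(fs * hop_ms / 1000)      # 80 samples
    n = 1 + (len(x) - flen) // hop
    frames = np.stack([x[i * hop:i * hop + flen] for i in range(n)])
    return frames * np.hamming(flen)
```

One second of 8 kHz speech yields 99 windowed frames of 160 samples, which then feed the DFT and sub-band energy computation of steps (4)-(6).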
The invention was tested on the Chinese (mainland Mandarin), English (American, non-southern accent) and Japanese portions of the internationally standard CallFriend database, which contains telephone conversation speech sampled at 8 kHz with mu-law compression. The training set comes from disc 1 of each language: 20 sessions in total, each a two-sided conversation about 30 minutes long. The test set consists of 500 segments of about 30 seconds each, cut at random from disc 3 of each language.
The MFCC feature, the SDC feature, and the proposed time-frequency two-dimensional cepstrum feature were compared. For each language, all test segments were subjected to language verification; the operating point at which the false-alarm rate equals the miss rate gives the system's equal error rate (EER). The average EER over the languages is used as the evaluation metric: the lower the EER, the better the system performs.
In the experiments, GMMs (Gaussian mixture models) with 128 Gaussian components each were used as classifiers. Each model was trained by maximum likelihood: it was initialized with the K-means method and then iterated 8 times with the Baum-Welch (EM) algorithm.
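The classifier setup above can be sketched with scikit-learn's `GaussianMixture` (a sketch, not the authors' implementation: the diagonal covariance type is an assumption, and scikit-learn's k-means initialization and EM iterations stand in for the K-means initialization and Baum-Welch iterations described above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(features_by_lang, n_components=128):
    """One GMM per language, trained on that language's feature frames."""
    models = {}
    for lang, feats in features_by_lang.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag',   # assumption
                              init_params='kmeans',
                              max_iter=8, random_state=0)
        gmm.fit(feats)
        models[lang] = gmm
    return models

def identify(models, feats):
    """Pick the language whose model gives the highest average
    log-likelihood over the test utterance's frames."""
    scores = {lang: gmm.score(feats) for lang, gmm in models.items()}
    return max(scores, key=scores.get)
```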
The MFCC feature uses 13 basic dimensions (including C0) plus first- and second-order differences, for a 39-dimensional feature vector. The SDC feature uses N-b-S-K parameters of 7-1-3-7 (including C0) plus 7 MFCC dimensions, for a 56-dimensional feature vector. The time-frequency two-dimensional cepstrum feature uses F = 24 and T = 19, keeping the first L = 21 dimensions as the final feature vector.
The experiments show that the language identification EER is 15.57% with the MFCC feature, 8.38% with the SDC feature, and 6.55% with the time-frequency two-dimensional cepstrum feature. The proposed feature therefore clearly outperforms the commonly used MFCC and SDC features for language identification.
In addition, in a digital integrated circuit implementation, the proposed 21-dimensional feature reduces the resources of the feature storage module and the classifier computation module by 46.2% compared with the 39-dimensional MFCC feature, and by 62.5% compared with the 56-dimensional SDC feature.
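The quoted savings follow directly from the feature dimensionalities, under the assumption (implicit above) that storage and classifier computation scale linearly with feature dimension:

```python
# Resource savings implied by the feature dimensionalities, assuming
# storage and classifier computation scale linearly with dimension.
dim_tfc, dim_mfcc, dim_sdc = 21, 39, 56
saving_vs_mfcc = 1 - dim_tfc / dim_mfcc   # 18/39, about 46.2%
saving_vs_sdc = 1 - dim_tfc / dim_sdc     # 35/56, exactly 62.5%
```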

Claims (1)

1. A time-frequency two-dimensional cepstrum feature extraction method for language identification, characterized in that the method is carried out in a digital integrated circuit chip according to the following steps:
Step (1): apply zero-mean normalization and pre-emphasis to the speech signal, where zero-mean normalization subtracts the mean of the whole utterance and pre-emphasis is a high-pass filter with transfer function $H(z) = 1 - 0.975z^{-1}$;
Step (2): split the speech signal into frames of length 20 ms with a frame shift of 10 ms;
Step (3): build a two-dimensional time-frequency distribution matrix that simultaneously reflects the short-time stationarity of speech and the long-span information of the language, as follows:
Step (3.1): apply a Hamming window to each frame, obtaining data $\{x(m),\ m = 0, 1, \ldots, M-1\}$, where $M$ is the number of samples per frame;
Step (3.2): compute the DFT (discrete Fourier transform) of the windowed data:

$$X(\omega_k) = \sum_{m=0}^{M-1} x(m)\, e^{-j\frac{2\pi}{M}mk}$$

where $\omega_k$ denotes frequency and $k$ the frequency index;
Step (3.3): in the frequency domain, compute the sub-band energy $e_f$ of each of $F = 24$ triangular Mel-scale windows per frame by

$$e_f = \frac{1}{U_f - L_f + 1} \sum_{k=L_f}^{U_f} |X(\omega_k)|^2$$

where $U_f$ and $L_f$ are the upper and lower boundaries of the $f$-th sub-band; then assemble the $F$ sub-band energies into a vector

$$\mathbf{e} = [e_0, e_1, \ldots, e_{F-1}]^T$$

where the superscript $T$ denotes transposition;
Step (3.4): take $T = 19$ consecutive vectors from step (3.3) and juxtapose them to form a two-dimensional time-frequency distribution matrix

$$E(t) = [\mathbf{e}(t), \mathbf{e}(t+1), \ldots, \mathbf{e}(t+T-1)];$$

Step (4): apply a two-dimensional DCT (discrete cosine transform) to the matrix $E(t)$ to obtain the two-dimensional cepstrum coefficients:

$$C(p,q) = \gamma_p \gamma_q \sum_{\tau=0}^{T-1} \sum_{f=0}^{F-1} e_f(t+\tau)\, \cos\frac{\pi(2\tau+1)p}{2T}\, \cos\frac{\pi(2f+1)q}{2F}$$

where $\tau$ and $f$ are summation variables and $\gamma_p$, $\gamma_q$ are normalization coefficients:

$$\gamma_p = \begin{cases}\sqrt{1/T}, & p = 0\\ \sqrt{2/T}, & p \ge 1\end{cases}, \qquad \gamma_q = \begin{cases}\sqrt{1/F}, & q = 0\\ \sqrt{2/F}, & q \ge 1\end{cases}$$

Step (5): select the elements in the upper-left corner of the transformed matrix, which carry the principal components of $E(t)$, as the feature, denoted TFC; the rearrangement formula that orders the upper-left part into a vector is:

$$\mathrm{TFC}\!\left(\frac{(p+q)^2 + 3p + q}{2}\right) = C(p,q).$$
CN2008101033280A 2008-04-03 2008-04-03 Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species Expired - Fee Related CN101256768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101033280A CN101256768B (en) 2008-04-03 2008-04-03 Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101033280A CN101256768B (en) 2008-04-03 2008-04-03 Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species

Publications (2)

Publication Number Publication Date
CN101256768A true CN101256768A (en) 2008-09-03
CN101256768B CN101256768B (en) 2011-03-30

Family

ID=39891525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101033280A Expired - Fee Related CN101256768B (en) 2008-04-03 2008-04-03 Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species

Country Status (1)

Country Link
CN (1) CN101256768B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702314B (en) * 2009-10-13 2011-11-09 清华大学 Method for establishing identified type language recognition model based on language pair
CN102723081A (en) * 2012-05-30 2012-10-10 林其灿 Voice signal processing method, voice and voiceprint recognition method and device
CN103021407A (en) * 2012-12-18 2013-04-03 中国科学院声学研究所 Method and system for recognizing speech of agglutinative language
CN103295583A (en) * 2012-02-24 2013-09-11 佳能株式会社 Method and equipment for extracting sub-band energy features of sound and monitoring system
CN104992424A (en) * 2015-07-27 2015-10-21 北京航空航天大学 Single-pixel rapid active imaging system based on discrete cosine transform
CN105068048A (en) * 2015-08-14 2015-11-18 南京信息工程大学 Distributed microphone array sound source positioning method based on space sparsity
CN106205638A (en) * 2016-06-16 2016-12-07 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
CN109036458A (en) * 2018-08-22 2018-12-18 昆明理工大学 A kind of multilingual scene analysis method based on audio frequency characteristics parameter
CN112530407A (en) * 2020-11-25 2021-03-19 北京快鱼电子股份公司 Language identification method and system
CN114067834A (en) * 2020-07-30 2022-02-18 中国移动通信集团有限公司 Bad preamble recognition method and device, storage medium and computer equipment
CN114209325A (en) * 2021-12-23 2022-03-22 东风柳州汽车有限公司 Driver fatigue behavior monitoring method, device, equipment and storage medium
CN115840877A (en) * 2022-12-06 2023-03-24 中国科学院空间应用工程与技术中心 Distributed stream processing method and system for MFCC extraction, storage medium and computer

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI19992351A (en) * 1999-10-29 2001-04-30 Nokia Mobile Phones Ltd voice recognizer
JP3699912B2 (en) * 2001-07-26 2005-09-28 株式会社東芝 Voice feature extraction method, apparatus, and program

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702314B (en) * 2009-10-13 2011-11-09 清华大学 Method for establishing identified type language recognition model based on language pair
CN103295583A (en) * 2012-02-24 2013-09-11 佳能株式会社 Method and equipment for extracting sub-band energy features of sound and monitoring system
CN103295583B (en) * 2012-02-24 2015-09-30 佳能株式会社 For extracting the method for the sub belt energy feature of sound, equipment and surveillance
CN102723081A (en) * 2012-05-30 2012-10-10 林其灿 Voice signal processing method, voice and voiceprint recognition method and device
CN102723081B (en) * 2012-05-30 2014-05-21 无锡百互科技有限公司 Voice signal processing method, voice and voiceprint recognition method and device
CN103021407A (en) * 2012-12-18 2013-04-03 中国科学院声学研究所 Method and system for recognizing speech of agglutinative language
CN103021407B (en) * 2012-12-18 2015-07-08 中国科学院声学研究所 Method and system for recognizing speech of agglutinative language
CN104992424B (en) * 2015-07-27 2018-05-25 北京航空航天大学 A kind of single pixel based on discrete cosine transform quickly imaging system
CN104992424A (en) * 2015-07-27 2015-10-21 北京航空航天大学 Single-pixel rapid active imaging system based on discrete cosine transform
CN105068048A (en) * 2015-08-14 2015-11-18 南京信息工程大学 Distributed microphone array sound source positioning method based on space sparsity
CN106205638A (en) * 2016-06-16 2016-12-07 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
CN106205638B (en) * 2016-06-16 2019-11-08 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
CN109036458A (en) * 2018-08-22 2018-12-18 昆明理工大学 A kind of multilingual scene analysis method based on audio frequency characteristics parameter
CN114067834A (en) * 2020-07-30 2022-02-18 中国移动通信集团有限公司 Bad preamble recognition method and device, storage medium and computer equipment
CN112530407A (en) * 2020-11-25 2021-03-19 北京快鱼电子股份公司 Language identification method and system
CN112530407B (en) * 2020-11-25 2021-07-23 北京快鱼电子股份公司 Language identification method and system
CN114209325A (en) * 2021-12-23 2022-03-22 东风柳州汽车有限公司 Driver fatigue behavior monitoring method, device, equipment and storage medium
CN114209325B (en) * 2021-12-23 2023-06-23 东风柳州汽车有限公司 Driver fatigue behavior monitoring method, device, equipment and storage medium
CN115840877A (en) * 2022-12-06 2023-03-24 中国科学院空间应用工程与技术中心 Distributed stream processing method and system for MFCC extraction, storage medium and computer

Also Published As

Publication number Publication date
CN101256768B (en) 2011-03-30

Similar Documents

Publication Publication Date Title
CN101256768B (en) Time frequency two-dimension converse spectrum characteristic extracting method for recognizing language species
CN106847292B (en) Method for recognizing sound-groove and device
CN102968986B (en) Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics
Tiwari MFCC and its applications in speaker recognition
Thomas et al. Cross-lingual and multi-stream posterior features for low resource LVCSR systems.
US6370504B1 (en) Speech recognition on MPEG/Audio encoded files
CN100514446C (en) Pronunciation evaluating method based on voice identification and voice analysis
CN101226743A (en) Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN102800316A (en) Optimal codebook design method for voiceprint recognition system based on nerve network
CN102737633A (en) Method and device for recognizing speaker based on tensor subspace analysis
CN102789779A (en) Speech recognition system and recognition method thereof
CN104732972A (en) HMM voiceprint recognition signing-in method and system based on grouping statistics
CN101887722A (en) Rapid voiceprint authentication method
CN1787070B (en) On-chip system for language learner
CN101546555A (en) Constraint heteroscedasticity linear discriminant analysis method for language identification
CN103258537A (en) Method utilizing characteristic combination to identify speech emotions and device thereof
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Samal et al. On the use of MFCC feature vector clustering for efficient text dependent speaker recognition
CN104240699A (en) Simple and effective phrase speech recognition method
CN103778914A (en) Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching
Al-Rawahy et al. Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients
Liu et al. Supra-Segmental Feature Based Speaker Trait Detection.
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
Meghanani et al. Pitch-synchronous DCT features: A pilot study on speaker identification
Sailaja et al. Text independent speaker identification with finite multivariate generalized gaussian mixture model and hierarchical clustering algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20161223

Address after: 100084 Zhongguancun Haidian District East Road No. 1, building 8, floor 8, A803B,

Patentee after: Beijing Hua Kong Chuang Wei Information Technology Co., Ltd.

Address before: 100084 Beijing 100084-82 mailbox

Patentee before: Tsinghua University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200317

Address after: 100084 Tsinghua University, Beijing, Haidian District

Patentee after: TSINGHUA University

Address before: 100084 Zhongguancun Haidian District East Road No. 1, building 8, floor 8, A803B,

Patentee before: BEIJING HUA KONG CHUANG WEI INFORMATION TECHNOLOGY Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110330

Termination date: 20210403