CN103871411A - Text-independent speaker identifying device based on line spectrum frequency difference value - Google Patents
- Publication number: CN103871411A
- Application number: CN201410134694.8A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The embodiment of the invention discloses a text-independent speaker identification device based on line spectral frequency (LSF) difference values. The method comprises the following steps: a feature extraction step, in which the LSF parameters are converted into LSF parameter differences by a linear transformation, and an LSF feature supervector is generated by combining the current frame with the adjacent preceding and following frames; a model training step, in which the distribution of the feature supervectors is modeled with a super-Dirichlet mixture model and the parameters of the model are solved; and an identification step, in which features are extracted from the speech sequence of the person to be identified according to step one, input into the models obtained in step two, the likelihood under each probabilistic model is computed, the largest likelihood value is taken, and the speaker's index is determined. The method improves the text-independent speaker identification rate and has high practical value.
Description
Technical field
The present invention describes a text-independent speaker recognition system based on linearly transformed line spectral frequency (LSF) parameters and a super-Dirichlet mixture model.
Background art
With the development of computer technology, identification or verification based on human biometric features (such as fingerprints, voiceprints, and faces) has significant research and application value. Speaker recognition uses the speech parameters in the waveform that reflect the physiological and behavioral characteristics of a person's speech to automatically decide whether a speaker belongs to a set of enrolled speakers and, further, to confirm the speaker's identity. Speaker recognition comprises two parts: speaker identification and speaker verification. A speaker identification system generally consists of three parts: extracting features that can represent the speaker, training for each speaker an independent model that captures the statistical regularities of the selected features, and finally making a decision by comparing the input data against the trained models.
For the first part, feature extraction, analyzing the speech signal based on vocal-tract characteristics is currently an effective approach in speaker recognition. Commonly used features include Mel-frequency cepstral coefficients (MFCC) and line spectral frequencies (LSF). Traditional MFCC vectors express dynamic information through difference (delta) features; the present invention instead adopts a feature supervector of LSF differences, which preserves the original neighborhood information. In addition, the method of the present invention also exploits high-frequency information that is useful for machine discrimination of speakers but is ignored by MFCC.
Current recognition methods fall into three classes: template matching methods, probabilistic model methods, and artificial neural network methods. A probabilistic model describes the distribution of a speaker's speech feature space with a probability density function, and a set of parameters of this density serves as the speaker model. The Gaussian mixture model (GMM) has been widely used in text-independent speaker recognition because it is simple and efficient. The super-Dirichlet mixture model (SDMM) of the present invention, however, can better describe the boundedness and ordering of the extracted features.
According to the recognition target, speaker recognition can be divided into two classes: text-dependent and text-independent. Text-dependent speaker recognition requires the speaker to pronounce keywords or key sentences as the training text and to pronounce the same content at recognition time. Text-independent speaker recognition imposes no constraint on the spoken content during either training or recognition; the target is free speech, so features and methods that can characterize the speaker must be found in unconstrained speech, which makes building the speaker model comparatively difficult. Moreover, a text-dependent system is easily spoofed with surreptitious recordings and is inconvenient to use; the system described in this invention is text-independent.
Summary of the invention
To overcome the above defects of the prior art and to improve text-independent speaker discrimination, the invention provides a text-independent speaker identification device based on linearly transformed line spectral frequency parameters and a super-Dirichlet mixture model.
To achieve the above object, the text-independent speaker identification method proposed by the present invention comprises the following steps:
One. Feature extraction step:
A. LSF parameter transformation step: in the linear predictive coding model of speech, convert the LSF parameters into LSF parameter differences by a linear transformation;
B. LSF feature supervector generation step: combine the current frame with its two adjacent frames (the preceding and the following frame) to form a feature supervector that expresses dynamic information.
Two. Model training step: for each speaker, train a model from a frame sequence of length T; the distribution of the feature supervectors is modeled with a super-Dirichlet mixture model (SDMM), and the parameters α of the model are obtained by solving an equation with a gradient method; finally a series of models is obtained, one model per speaker.
Three. Identification and matching step: take a speech sample of a certain speaker in the training set and input it into the series of trained probabilistic models; transform the parameters and generate the feature supervector by the method of step one, compute the likelihood for each probabilistic model using the models trained in step two, and take the maximum likelihood value to confirm the speaker's index.
According to the text-independent speaker identification method of an embodiment of the present invention, the LSF parameter transformation step described in step A exploits (1) the non-negativity, (2) the ordering, and (3) the boundedness of the LSF parameters to transform them into the LSF parameter difference ΔLSF, which is characterized by: (1) being distributed in the open interval (0, 1), and (2) summing to 1. The detailed procedure of this step is as follows:
1) The K-dimensional LSF parameter is represented as $\mathbf{s} = [s_1, s_2, \ldots, s_K]^T$, satisfying $0 < s_1 < s_2 < \cdots < s_K < \pi$;
2) The (K+1)-dimensional LSF parameter difference ΔLSF after transformation is $\mathbf{x} = [x_1, x_2, \ldots, x_{K+1}]^T$, wherein $x_i = (s_i - s_{i-1})/\pi$ for $i = 1, \ldots, K+1$, with the boundary conventions $s_0 = 0$ and $s_{K+1} = \pi$.
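As a brief numerical check (the values are illustrative, not taken from the patent): for K = 3 and $\mathbf{s} = [0.5, 1.2, 2.0]^T$, the transformation gives
$$\mathbf{x} = \tfrac{1}{\pi}\,[\,0.5-0,\; 1.2-0.5,\; 2.0-1.2,\; \pi-2.0\,] \approx [\,0.159,\; 0.223,\; 0.255,\; 0.363\,],$$
whose elements lie in the open interval (0, 1) and sum to 1, as required.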
According to the text-independent speaker identification method of an embodiment of the present invention, the LSF feature supervector generation step described in step B combines the current frame x(t) with its adjacent frames to form a supervector, thereby expressing dynamic information; in the present invention this supervector comprises three subvectors. Assuming the spacing between the current frame and both the preceding and the following frame is τ, only the two neighboring frames x(t−τ) and x(t+τ) of the current frame are considered here, and the generated feature supervector has dimension 3(K+1). The detailed procedure is as follows:
1) The (K+1)-dimensional LSF parameter difference vector is $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_{K+1}(t)]^T$;
2) The supervector containing the dynamic information is:
$$\mathbf{x}^{\mathrm{sup}}(t) = [\mathbf{x}(t-\tau)^T,\; \mathbf{x}(t)^T,\; \mathbf{x}(t+\tau)^T]^T.$$
According to the text-independent speaker identification method of an embodiment of the present invention, the detailed steps of the model training described in step two are:
1) The feature subvectors $\mathbf{x}(t-\tau)$, $\mathbf{x}(t)$, $\mathbf{x}(t+\tau)$ in $\mathbf{x}^{\mathrm{sup}}$ are mutually independent and each follows a Dirichlet distribution, so the supervector $\mathbf{x}^{\mathrm{sup}}$ follows a super-Dirichlet probability density:
$$\mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}) = \prod_{j=1}^{3} \frac{\Gamma\big(\sum_{k=1}^{K+1} \alpha_{j,k}\big)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{j,k})} \prod_{k=1}^{K+1} x_{j,k}^{\alpha_{j,k}-1}.$$
2) For a time sequence of LSF parameter difference vectors $\mathbf{x}(1), \ldots, \mathbf{x}(t), \ldots, \mathbf{x}(T)$, let $X = [\mathbf{x}^{\mathrm{sup}}(1), \ldots, \mathbf{x}^{\mathrm{sup}}(T)]$; the LSF parameter differences are modeled with the super-Dirichlet mixture model (SDMM):
$$p(\mathbf{x}^{\mathrm{sup}}; \Theta) = \sum_{m=1}^{M} \pi_m \, \mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}_m),$$
where the weight factor $\pi_m$ is the non-negative weight of the m-th component and $\sum_{m=1}^{M} \pi_m = 1$.
3) Model parameter computation: for the m-th mixture component, the parameter vector $\boldsymbol{\alpha}_m$ is divided into 3 subvectors, each parameter subvector corresponding to one subvector of $\mathbf{x}^{\mathrm{sup}}$. All parameters can then be obtained by solving, with a gradient method, the maximum-likelihood equation
$$\frac{\partial}{\partial \Theta} \sum_{t=1}^{T} \ln p(\mathbf{x}^{\mathrm{sup}}(t); \Theta) = 0.$$
The beneficial effect of the present invention is that, compared with the prior art, the invention applies the transformed LSF parameter supervector as the extracted speaker feature, trains the model with the super-Dirichlet mixture distribution, and further provides a complete implementation system for application. Test results have verified the efficiency of the invention, which has strong practicality.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method provided by the invention;
Fig. 2 is a flow chart of the steps of the LSF parameter transformation;
Fig. 3 is a flow chart of the steps of constructing the feature supervector.
Embodiment
Specific embodiments of the present invention are described in detail below in conjunction with the accompanying drawings.
Fig. 1 is the flow chart of the present invention, in which the dotted line indicates the flow of the training part and the solid line indicates the flow of the identification part; the method comprises the following steps:
Step one: feature extraction. Feature extraction is performed on the speech sequence of the speaker to be trained.
Step S1: convert the LSF parameters into LSF parameter differences;
Step S2: generate the LSF feature supervector;
Step two: model training
Step S3: model the distribution of the feature supervectors with a super-Dirichlet mixture model, and solve for the parameters in the model;
Step three: identification
Repeat steps S1 and S2 of step one on the speech sequence of the speaker to be identified to generate the feature supervectors, and input them into the models trained in step S3.
Step S4: compute the likelihood under each probabilistic model, take the maximum likelihood value, and determine the speaker's index.
Each step is specifically described below:
Step S1 implements the LSF parameter transformation: the LSF parameters of the linear predictive coding model of speech are converted into LSF parameter differences by a linear transformation. Fig. 2 gives the detailed flow of the method, as follows:
1) Input: LSF parameters $\mathbf{s} = [s_1, s_2, \ldots, s_K]^T$;
2) In step 11, loop $i$ from 1 to K+1; the difference obtained at each iteration is $x_i = (s_i - s_{i-1})/\pi$, with the boundary conventions $s_0 = 0$ and $s_{K+1} = \pi$;
3) Output: LSF parameter differences $\mathbf{x} = [x_1, x_2, \ldots, x_{K+1}]^T$.
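Since the patent presents the transformation of step S1 only as a flow chart, a minimal Python sketch (with NumPy) is given here for illustration; the function name lsf_to_dlsf and the vectorized form are our own, but the arithmetic follows step S1:

```python
import numpy as np

def lsf_to_dlsf(s):
    """Map K ordered LSFs in (0, pi) to the (K+1)-dim difference vector.

    Implements x_i = (s_i - s_{i-1}) / pi with the boundary conventions
    s_0 = 0 and s_{K+1} = pi, so each element lies in (0, 1) and the
    elements sum to 1.
    """
    s = np.asarray(s, dtype=float)
    padded = np.concatenate(([0.0], s, [np.pi]))  # prepend s_0, append s_{K+1}
    return np.diff(padded) / np.pi
```

Because the output lies on the probability simplex, it matches the support of the Dirichlet distributions used later in step S3.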
Step S2 generates the LSF feature supervector: the current frame x(t) and its two adjacent frames are combined into one supervector, thereby expressing dynamic information. Assuming the spacing between the current frame and both the preceding and the following frame is τ, the supervector comprises three subvectors in the present invention: the current frame x(t), the preceding frame x(t−τ), and the following frame x(t+τ); the generated feature supervector has dimension 3(K+1). Fig. 3 gives a schematic diagram of the detailed flow; the steps are as follows:
1) Input: (K+1)-dimensional LSF parameter difference vector $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_{K+1}(t)]^T$;
2) Output: $\mathbf{x}^{\mathrm{sup}}(t) = [\mathbf{x}(t-\tau)^T,\; \mathbf{x}(t)^T,\; \mathbf{x}(t+\tau)^T]^T$.
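A similarly minimal sketch (ours, not from the patent) builds the supervectors of step S2 for a whole utterance at once; frames within τ of either end of the sequence are simply dropped here, a boundary choice the patent does not specify:

```python
import numpy as np

def build_supervectors(frames, tau=1):
    """Stack x(t - tau), x(t), x(t + tau) into 3(K+1)-dim supervectors.

    frames: array of shape (T, K+1), one Delta-LSF vector per frame.
    Returns an array of shape (T - 2*tau, 3*(K+1)).
    """
    frames = np.asarray(frames)
    return np.hstack([frames[:-2 * tau],   # x(t - tau)
                      frames[tau:-tau],    # x(t)
                      frames[2 * tau:]])   # x(t + tau)
```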
Step S3 models the distribution of the feature supervectors with a super-Dirichlet mixture model and solves for the parameters in the model. The detailed steps are:
1) The feature subvectors $\mathbf{x}(t-\tau)$, $\mathbf{x}(t)$, $\mathbf{x}(t+\tau)$ in $\mathbf{x}^{\mathrm{sup}}$ are mutually independent and each follows a Dirichlet distribution, so the supervector $\mathbf{x}^{\mathrm{sup}}$ follows a super-Dirichlet distribution:
$$\mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}) = \prod_{j=1}^{3} \frac{\Gamma\big(\sum_{k=1}^{K+1} \alpha_{j,k}\big)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{j,k})} \prod_{k=1}^{K+1} x_{j,k}^{\alpha_{j,k}-1},$$
where $\alpha_{1,k}$, $\alpha_{2,k}$, $\alpha_{3,k}$ are the elements of the three parameter subvectors.
2) For a time sequence of LSF parameter difference vectors $\mathbf{x}(1), \ldots, \mathbf{x}(t), \ldots, \mathbf{x}(T)$, let $X = [\mathbf{x}^{\mathrm{sup}}(1), \ldots, \mathbf{x}^{\mathrm{sup}}(T)]$; with a super-Dirichlet mixture model (SDMM) containing M components, the probability of the target vector is obtained:
$$p(\mathbf{x}^{\mathrm{sup}}; \Theta) = \sum_{m=1}^{M} \pi_m \, \mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}_m),$$
where the weight factor $\pi_m$ is the non-negative weight of the m-th component, and $\sum_{m=1}^{M} \pi_m = 1$.
3) Model parameter computation: for the m-th mixture component, the parameter vector $\boldsymbol{\alpha}_m$ is divided into 3 subvectors, each parameter subvector corresponding to one subvector of $\mathbf{x}^{\mathrm{sup}}$. All parameters can then be obtained by solving, with a gradient method, the maximum-likelihood equation
$$\frac{\partial}{\partial \Theta} \sum_{t=1}^{T} \ln p(\mathbf{x}^{\mathrm{sup}}(t); \Theta) = 0.$$
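The patent reproduces the density formulas only as figures, so the following sketch (ours) evaluates the standard Dirichlet density and the product/mixture structure stated above; the EM/gradient parameter solve of step 3 is not reproduced here, only the likelihood evaluation that the identification step needs:

```python
import numpy as np
from scipy.special import gammaln

def super_dirichlet_logpdf(X_sup, alpha):
    """Log-density of the super-Dirichlet distribution.

    X_sup: (N, 3*(K+1)) supervectors; alpha: (3, K+1) positive Dirichlet
    parameters, one subvector per frame position. The density is the
    product of three independent Dirichlet pdfs.
    """
    K1 = alpha.shape[1]
    logp = np.zeros(len(X_sup))
    for j in range(3):
        a = alpha[j]
        xj = X_sup[:, j * K1:(j + 1) * K1]
        logp += (gammaln(a.sum()) - gammaln(a).sum()
                 + ((a - 1.0) * np.log(xj)).sum(axis=1))
    return logp

def sdmm_loglik(X_sup, weights, alphas):
    """Total log-likelihood of X_sup under an M-component SDMM.

    weights: (M,) mixture weights summing to 1; alphas: (M, 3, K+1).
    """
    comp = np.stack([np.log(w) + super_dirichlet_logpdf(X_sup, a)
                     for w, a in zip(weights, alphas)], axis=1)
    m = comp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return float((m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))).sum())
```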
Step S4: at identification time, the speech to be identified is input into the series of models of all speakers obtained by the training of step S3, and the index of the identified speaker is determined as the index of the model with the maximum likelihood value.
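Given the helpers sketched above, the decision rule of step S4 reduces to an argmax over per-speaker log-likelihoods. This sketch assumes one trained (weights, alphas) pair per enrolled speaker, as produced by step S3; the names are our own:

```python
def identify_speaker(X_sup, models):
    """Return the index of the enrolled speaker whose SDMM assigns the
    largest log-likelihood to the supervectors X_sup.

    models: list of (weights, alphas) pairs, one per enrolled speaker.
    """
    scores = [sdmm_loglik(X_sup, w, a) for w, a in models]
    return int(np.argmax(scores))
```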
The embodiments of the proposed method and of each of its modules have been set forth above in conjunction with the accompanying drawings. From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general hardware platform, or by hardware, the former being the preferred embodiment. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied in the form of a computer software product stored in a storage medium and comprising instructions for causing one or more computer devices to execute the method described in the embodiments of the present invention.
The specific implementation and scope of application may vary according to the idea of the present invention; in summary, the contents of this description should not be construed as limiting the invention.
The above-described embodiments of the present invention do not limit the scope of protection of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (4)
1. A text-independent speaker identification method based on linearly transformed line spectral frequency (LSF) parameters and a super-Dirichlet mixture model, characterized in that it comprises the following steps:
One. Feature extraction step:
A. LSF parameter transformation step: in the linear predictive coding model of speech, convert the LSF parameters into LSF parameter differences by a linear transformation;
B. LSF feature supervector generation step: combine the current frame with its two adjacent frames (the preceding and the following frame) to form a feature supervector that expresses dynamic information.
Two. Model training step: for each speaker, train a model from a frame sequence of length T; the distribution of the feature supervectors is modeled with a super-Dirichlet mixture model (SDMM), and the parameters α of the model are obtained by solving an equation with a gradient method; finally a series of models is obtained, one model per speaker.
Three. Identification and matching step: take a speech sample of a certain speaker in the training set and input it into the series of trained probabilistic models; transform the parameters and generate the feature supervector by the method of step one, compute the likelihood for each probabilistic model using the models trained in step two, and take the maximum likelihood value to confirm the speaker's index.
2. The text-independent speaker identification method as claimed in claim 1, characterized in that the LSF parameter transformation step described in step A is:
1) The K-dimensional LSF parameter is represented as $\mathbf{s} = [s_1, s_2, \ldots, s_K]^T$, satisfying $0 < s_1 < s_2 < \cdots < s_K < \pi$;
2) The (K+1)-dimensional LSF parameter difference ΔLSF after transformation is $\mathbf{x} = [x_1, x_2, \ldots, x_{K+1}]^T$, wherein $x_i = (s_i - s_{i-1})/\pi$ for $i = 1, \ldots, K+1$, with the boundary conventions $s_0 = 0$ and $s_{K+1} = \pi$.
3. The text-independent speaker identification method as claimed in claim 1, wherein the LSF feature supervector generation step described in step B combines the current frame x(t) with its adjacent frames to form a supervector, thereby expressing dynamic information; this supervector comprises three subvectors. Assuming the spacing between the current frame and both the preceding and the following frame is τ, only the two neighboring frames x(t−τ) and x(t+τ) of the current frame are considered here, and the generated feature supervector has dimension 3(K+1). The detailed procedure is as follows:
1) The (K+1)-dimensional LSF parameter difference vector is $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_{K+1}(t)]^T$;
2) The supervector containing the dynamic information is:
$$\mathbf{x}^{\mathrm{sup}}(t) = [\mathbf{x}(t-\tau)^T,\; \mathbf{x}(t)^T,\; \mathbf{x}(t+\tau)^T]^T.$$
4. The text-independent speaker identification method as claimed in claim 1, wherein the detailed steps of the model training described in step two are:
1) The feature subvectors $\mathbf{x}(t-\tau)$, $\mathbf{x}(t)$, $\mathbf{x}(t+\tau)$ in $\mathbf{x}^{\mathrm{sup}}$ are mutually independent and each follows a Dirichlet distribution, so the supervector $\mathbf{x}^{\mathrm{sup}}$ follows a super-Dirichlet probability density:
$$\mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}) = \prod_{j=1}^{3} \frac{\Gamma\big(\sum_{k=1}^{K+1} \alpha_{j,k}\big)}{\prod_{k=1}^{K+1} \Gamma(\alpha_{j,k})} \prod_{k=1}^{K+1} x_{j,k}^{\alpha_{j,k}-1}.$$
2) For a time sequence of LSF parameter difference vectors $\mathbf{x}(1), \ldots, \mathbf{x}(t), \ldots, \mathbf{x}(T)$, let $X = [\mathbf{x}^{\mathrm{sup}}(1), \ldots, \mathbf{x}^{\mathrm{sup}}(T)]$; the LSF parameter differences are modeled with the super-Dirichlet mixture model (SDMM):
$$p(\mathbf{x}^{\mathrm{sup}}; \Theta) = \sum_{m=1}^{M} \pi_m \, \mathrm{SDir}(\mathbf{x}^{\mathrm{sup}}; \boldsymbol{\alpha}_m),$$
where the weight factor $\pi_m$ is the non-negative weight of the m-th component and $\sum_{m=1}^{M} \pi_m = 1$.
3) Model parameter computation: for the m-th mixture component, the parameter vector $\boldsymbol{\alpha}_m$ is divided into 3 subvectors, each parameter subvector corresponding to one subvector of $\mathbf{x}^{\mathrm{sup}}$. All parameters can then be obtained by solving, with a gradient method, the maximum-likelihood equation
$$\frac{\partial}{\partial \Theta} \sum_{t=1}^{T} \ln p(\mathbf{x}^{\mathrm{sup}}(t); \Theta) = 0.$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410134694.8A | 2014-04-03 | 2014-04-03 | Text-independent speaker identifying device based on line spectrum frequency difference value
Publications (1)
Publication Number | Publication Date |
---|---|
CN103871411A (en) | 2014-06-18
Family
ID=50909875
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410134694.8A | Text-independent speaker identifying device based on line spectrum frequency difference value (Pending) | 2014-04-03 | 2014-04-03
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103871411A (en) |
- 2014-04-03: Application CN201410134694.8A filed in China; published as CN103871411A (status: Pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103685185A (en) * | 2012-09-14 | 2014-03-26 | 上海掌门科技有限公司 | Mobile equipment voiceprint registration and authentication method and system |
CN103207961A (en) * | 2013-04-23 | 2013-07-17 | 曙光信息产业(北京)有限公司 | User verification method and device |
Non-Patent Citations (1)
Zhanyu Ma, Arne Leijon: "Super-Dirichlet Mixture Models Using Differential Line Spectral Frequencies for Text-Independent Speaker Identification", Interspeech 2011, 27 August 2011, pages 2360-2363.
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108630207A (en) * | 2017-03-23 | 2018-10-09 | 富士通株式会社 | Method for identifying speaker and speaker verification's equipment |
CN108694949A (en) * | 2018-03-27 | 2018-10-23 | 佛山市顺德区中山大学研究院 | Method for distinguishing speek person and its device based on reorder super vector and residual error network |
CN108694949B (en) * | 2018-03-27 | 2021-06-22 | 佛山市顺德区中山大学研究院 | Speaker identification method and device based on reordering supervectors and residual error network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140618 |