CN105139855A - Speaker identification method and device with two-stage sparse decomposition - Google Patents
- Publication number: CN105139855A
- Application number: CN201410231798.0A
- Authority
- CN
- China
- Prior art keywords
- dictionary
- speaker
- sparse decomposition
- voice
- sparse
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to a speaker identification method based on two-stage sparse decomposition. The method comprises the steps of: (S1) framing and windowing the discrete-time signal of the input speech; (S2) applying a discrete Fourier transform to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature; (S3) constructing a large dictionary; (S4) performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech by its sparse representation to obtain a subset of target speaker dictionaries; (S5) splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker. The method can distinguish different speakers and offers efficient, accurate, and easy-to-use speaker identification. The invention also discloses a speaker identification device based on two-stage sparse decomposition.
Description
Technical field
The present invention relates to the field of speaker identification in speech processing, and in particular to a speaker identification method and device based on two-stage sparse decomposition.
Background technology
At present, speaker identification is widely applied in fields such as identity verification, network monitoring, telephone monitoring, and information security. After decades of extensive research, typical recognition systems such as the Gaussian mixture model-universal background model (GMM-UBM) method, the Gaussian mixture model-support vector machine (GMM-SVM) method, and joint factor analysis achieve satisfactory results under ideal conditions. In noisy environments, however, their performance degrades sharply, which limits the widespread application of these techniques.
Researchers have proposed two classes of methods to strengthen the noise robustness of speaker identification. The first class extracts noise-robust features, such as linear prediction cepstral coefficients (LPCC), mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP) coefficients. These methods bring only limited improvement, because no feature can selectively represent speech while excluding noise. The second class applies speech enhancement, such as spectral subtraction or Wiener filtering, to remove noise from the noisy speech before extracting features. Unfortunately, most noise is non-stationary, and some noise even resembles speech, making it difficult to model and estimate. As a result, speech enhancement inevitably introduces considerable distortion, which degrades current speaker identification methods. New techniques are therefore needed to solve this problem.
In the past several years, sparse coding has been widely studied and offers a possible solution for speaker identification in noisy environments. This technique represents a signal with a group of atoms (elementary signals); the set of atoms is called a dictionary. Through sparse coding, all or most of the information in a signal is represented by a linear combination of a small number of atoms. Recently, a sparse coding method called morphological component analysis (MCA) has been successfully applied to speaker identification. With this technique, a dictionary is prepared for each speaker, and all speaker dictionaries are spliced into one large dictionary. During identification, the test speech is sparsely represented over the large dictionary. In theory, a speaker's speech can be represented only by that speaker's dictionary, so the sparse representation can be used directly for classification.
Nearly all such speaker identification methods follow the MCA framework: they first convert the training utterances into GMM mean supervectors or total-variability vectors, then assemble these vectors into a large dictionary over which sparse decomposition and classification are performed. These methods are reported to outperform the traditional GMM-UBM and GMM-SVM methods. However, they still do not compensate for noise, which reduces their recognition rate under noisy conditions.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the deficiencies of the prior art, how to provide a method that compensates for noise in speaker identification and thereby improves the recognition rate of speaker speech under noisy conditions.
To this end, the present invention proposes a speaker identification method based on two-stage sparse decomposition, comprising the following steps:
S1: framing and windowing the discrete-time signal of the input speech;
S2: applying a discrete Fourier transform to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
S3: constructing a large dictionary, wherein the large dictionary comprises a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary;
S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain a subset of target speaker dictionaries;
S5: splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker.
In particular, the windowing uses a Hamming, Hanning, or rectangular window.
To this end, the invention also proposes a speaker identification device based on two-stage sparse decomposition, comprising:
a framing and windowing module, for framing and windowing the discrete-time signal of the input speech;
a feature extraction module, for applying a discrete Fourier transform to each frame, taking the magnitude, and extracting the magnitude spectrum as the feature;
a dictionary construction module, for building a large dictionary, wherein the large dictionary comprises a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary;
a first-stage sparse decomposition module, for performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain a subset of target speaker dictionaries;
a second-stage sparse decomposition module, for splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker.
In particular, the windowing uses a Hamming, Hanning, or rectangular window.
In the speaker identification method based on two-stage sparse decomposition disclosed by the invention, the input speech is first framed and windowed, a discrete Fourier transform is applied to each frame, and the magnitude spectrum is taken as the speech feature; a large dictionary composed of a universal background dictionary, the dictionaries of the different speakers, and a noise dictionary is then constructed; the speech to be identified is sparsely represented over the large dictionary, and a coarse classification based on the sparse representation yields a small number of target speaker dictionaries; finally, these dictionaries are spliced into a new large dictionary, the input speech is represented over it, and a final classification based on the sparse representation determines the speaker's identity. The invention can distinguish different speakers and has the beneficial effects of efficiency, accuracy, and ease of use in identifying a speaker's identity. The invention also discloses a speaker identification device based on two-stage sparse decomposition.
Brief description of the drawings
The features and advantages of the present invention can be clearly understood with reference to the accompanying drawings, which are schematic and are not to be construed as limiting the invention in any way. In the drawings:
Fig. 1 is a flow chart of the steps of a speaker identification method based on two-stage sparse decomposition in an embodiment of the present invention;
Fig. 2 is an example flow diagram of a speaker identification method based on two-stage sparse decomposition in an embodiment of the present invention;
Fig. 3 is a structural diagram of a speaker identification device based on two-stage sparse decomposition in an embodiment of the present invention.
Embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention provides a speaker identification method based on two-stage sparse decomposition, comprising the following steps:
Step S1: framing and windowing the discrete-time signal of the input speech, wherein the window is a Hamming, Hanning, or rectangular window.
Step S2: applying a discrete Fourier transform to each frame and taking the magnitude; the magnitude spectrum is extracted as the feature.
Step S3: constructing a large dictionary, wherein the large dictionary comprises a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary.
Step S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain a subset of target speaker dictionaries.
Step S5: splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker.
Fig. 2 shows an example flow diagram of a speaker identification method based on two-stage sparse decomposition according to the invention.
Specifically, the purpose of pre-emphasis is to reduce the influence of sharp noise and to boost the high-frequency part of the signal. Pre-emphasis is applied to the given signal y(n) by

z(n) = y(n) − 0.97·y(n−1)   (1)

where the pre-emphasis factor is 0.97. The window can then be a Hamming, Hanning, or rectangular window; research shows that the Hamming window has a better frequency response than the rectangular window and alleviates spectral leakage. Windowing is the pointwise product of z(n) with the window function W(n):

S_p(n) = z(n)·W(n)   (2)

For a Hamming window, the window function is

W(n) = 0.54 − 0.46·cos(2πn/(M − 1)), 0 ≤ n ≤ M − 1   (3)

where n is the time index and M is the window length. A discrete Fourier transform (DFT) is then applied:

Y_p(k) = Σ_{n=0}^{N−1} S_p(n)·e^{−j2πkn/N}, k = 0, 1, …, N − 1   (4)

where S_p(n) is the p-th windowed speech frame, p is the frame index, and N is the number of points of the Fourier transform; the magnitude |Y_p(k)| is taken as the feature.
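The pre-emphasis, windowing, and DFT magnitude steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's implementation: the frame length, hop size, and 8 kHz sampling rate are assumed values chosen for the example.

```python
import numpy as np

def extract_features(y, frame_len=256, hop=128, alpha=0.97):
    """Magnitude-spectrum features per frame (steps S1-S2).

    Pre-emphasis z(n) = y(n) - alpha*y(n-1), Hamming windowing
    (a pointwise product in the time domain), then the magnitude
    of the DFT of each windowed frame.
    """
    z = np.append(y[0], y[1:] - alpha * y[:-1])      # pre-emphasis
    w = np.hamming(frame_len)                        # Hamming window of length M
    n_frames = 1 + (len(z) - frame_len) // hop
    frames = np.stack([z[i * hop : i * hop + frame_len] * w
                       for i in range(n_frames)])    # framing + windowing
    return np.abs(np.fft.rfft(frames, axis=1))       # |DFT| as the feature

# One second of synthetic "speech" at an assumed 8 kHz sampling rate
rng = np.random.default_rng(0)
feats = extract_features(rng.standard_normal(8000))
print(feats.shape)  # (61, 129): 61 frames, 129 non-redundant DFT bins
```

Only the non-redundant half of the spectrum is kept (`rfft`), since the input is real-valued.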
Dictionary preparation. For a speaker recognition system with K different speakers, we design a large dictionary with a new structure:

Ψ = [Φ_0, Φ_1, Φ_2, …, Φ_K, Φ_v]   (5)

where Φ_0 is a universal background dictionary containing the features common to all speakers (here we borrow the idea of the UBM in GMM-UBM to model the background); Φ_i (i = 1, …, K) is the dictionary modelling the variability (characteristics) of the i-th speaker; and Φ_v is a noise dictionary modelling the environmental noise. All atoms in Ψ are normalized to unit-norm vectors. K-SVD is used to train the dictionaries, and the dictionary training problem is described as

min_{Φ,X} ‖Y − ΦX‖_F²  subject to ‖x_i‖_0 ≤ T_0 for every i   (6)

where Y = [y_1, y_2, …, y_m] is the training data set, each y_i being the feature vector of a speech frame; Φ is the dictionary; X = [x_1, x_2, …, x_m] is the group of sparse vectors corresponding to Y; and T_0 is the sparsity threshold. The universal background dictionary is trained on a large amount of unlabelled speech from different speakers. Each Φ_i is trained on the speech of the i-th speaker, with Ψ as the initial value.
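The block structure of the large dictionary Ψ = [Φ_0, Φ_1, …, Φ_K, Φ_v] can be sketched as follows. A full K-SVD trainer (which the patent specifies) is too long for an example, so as a loudly labelled simplification this sketch builds each sub-dictionary by sampling training feature frames as atoms and normalizing them to unit norm; the block sizes and random training data are invented for illustration.

```python
import numpy as np

def make_subdictionary(frames, n_atoms, rng):
    """Simplified stand-in for K-SVD training: sample feature frames
    as atoms (columns) and normalize each atom to unit l2 norm."""
    idx = rng.choice(len(frames), size=n_atoms, replace=False)
    atoms = frames[idx].T.astype(float)              # columns are atoms
    return atoms / np.linalg.norm(atoms, axis=0)

def build_big_dictionary(background, speakers, noise, n_atoms, rng):
    """Psi = [Phi_0, Phi_1, ..., Phi_K, Phi_v]: a universal background
    dictionary, one sub-dictionary per speaker, and a noise dictionary."""
    parts = [make_subdictionary(background, n_atoms, rng)]
    parts += [make_subdictionary(s, n_atoms, rng) for s in speakers]
    parts.append(make_subdictionary(noise, n_atoms, rng))
    return np.hstack(parts)

# Illustrative data: 129-dimensional magnitude-spectrum feature frames
rng = np.random.default_rng(1)
background = rng.standard_normal((200, 129))
speakers = [rng.standard_normal((200, 129)) for _ in range(3)]  # K = 3
noise = rng.standard_normal((200, 129))
Psi = build_big_dictionary(background, speakers, noise, n_atoms=32, rng=rng)
print(Psi.shape)  # (129, 160): (K + 2) blocks of 32 unit-norm atoms
```

The key property preserved from the patent is the block layout: column ranges of Ψ map back to the background, to individual speakers, or to noise, which is what the later classification step relies on.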
First-stage sparse decomposition. Representing a frame over Ψ means solving the sparse decomposition problem

min_x ‖x‖_0  subject to Y = Ψx   (7)

This problem is proved to be NP-hard and cannot be solved exactly by searching all possible sparse subsets. If x is sparse or approximately sparse, it is uniquely determined by solving instead

min_x ½‖Y − Ψx‖_2² + λ‖x‖_1   (8)

where λ > 0 is a regularization parameter; this formulation is also known as basis pursuit denoising (BPDN). For each speech frame Y, solving (8) yields the sparse coefficient vector x; this constitutes the first sparse decomposition.

Second-stage sparse decomposition and classification. First, compute

c_i = ‖δ_i(x)‖_1, i = 1, 2, …, K   (9)

where δ_i(·) returns a vector whose only nonzero entries come from the i-th class, i.e., the entries of the sparse coefficient x not belonging to the i-th class are set to zero; the c_i are then sorted. Next, the dictionaries of the Q speakers corresponding to the Q largest c_i are selected and spliced into a reduced dictionary. Solving (8) over this reduced dictionary yields a new sparse coefficient x, and the frame is assigned to the speaker

identity(Y) = argmax_i ‖δ_i(x)‖_1   (10)

which confirms to which speaker the speech frame belongs.
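The two-stage decomposition and classification can be sketched end to end. The patent solves an ℓ1 (BPDN) problem; to stay dependency-free this sketch substitutes a plain Orthogonal Matching Pursuit solver, and the dictionary sizes, sparsity level k, and candidate count Q are assumed toy values.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: a greedy sparse coder used here as a
    stand-in for the l1 (BPDN) solver discussed in the text."""
    r, support = y.astype(float), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ r)))
        if idx in support:                     # residual already explained
            break
        support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef           # orthogonalized residual
    x[support] = coef
    return x

def two_stage_identify(Psi, blocks, y, k, Q):
    """Stage 1: code y over the big dictionary and score each speaker
    block by the l1 mass of its coefficients, c_i = ||delta_i(x)||_1.
    Stage 2: splice the top-Q blocks, code y again, and return the
    speaker whose block carries the largest l1 mass."""
    x1 = omp(Psi, y, k)
    scores = {i: np.abs(x1[cols]).sum() for i, cols in blocks.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:Q]
    spliced = np.concatenate([blocks[i] for i in top])
    x2 = omp(Psi[:, spliced], y, k)
    best, best_mass, offset = top[0], -1.0, 0
    for i in top:
        mass = np.abs(x2[offset:offset + len(blocks[i])]).sum()
        if mass > best_mass:
            best, best_mass = i, mass
        offset += len(blocks[i])
    return best

# Toy setup: 3 speaker blocks of 15 random unit-norm atoms in R^60
rng = np.random.default_rng(2)
Psi = rng.standard_normal((60, 45))
Psi /= np.linalg.norm(Psi, axis=0)
blocks = {i: np.arange(i * 15, (i + 1) * 15) for i in range(3)}
# Test frame built from three atoms of speaker 1's block
y = Psi[:, blocks[1][:3]] @ np.array([1.0, 0.8, -0.6])
print(two_stage_identify(Psi, blocks, y, k=5, Q=2))
```

Because the test frame lies exactly in the span of speaker 1's atoms, its coefficient mass concentrates in that block at both stages, which is the mechanism the classification rule exploits.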
The advantages of the speaker identification method based on two-stage sparse decomposition disclosed by the invention lie in two respects. On the one hand, compensation for time-varying noise is considered: the method designs a noise dictionary that tracks noise changes and can effectively compensate for time-varying noise. On the other hand, the two-stage sparse decomposition further improves identification accuracy. During decomposition, competition among speaker atoms may occur, degrading performance. To solve this problem, we propose a two-stage sparse decomposition method: after the first-stage sparse decomposition, the dictionaries of the speakers whose atoms are used in the large dictionary are spliced together, a second-stage sparse decomposition is performed, and the second sparse representation is used for the final classification. The two-stage sparse decomposition method is clearly superior to current speaker identification methods.
Fig. 3 shows the structure of a speaker identification device based on two-stage sparse decomposition according to the invention.
Specifically, a framing and windowing module 101 frames and windows the discrete-time signal of the input speech, where the window is a Hamming, Hanning, or rectangular window; a feature extraction module 102 applies a discrete Fourier transform to each frame, takes the magnitude, and extracts the magnitude spectrum as the feature; a dictionary construction module 103 builds a large dictionary comprising a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary; a first-stage sparse decomposition module 104 performs a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary and coarsely classifies the input speech to obtain a subset of target speaker dictionaries; and a second-stage sparse decomposition module 105 splices the selected target speaker dictionaries, performs a second-stage sparse decomposition, and uses the resulting sparse representation to determine the finally identified speaker.
The above embodiments serve only to illustrate the present invention and do not limit it. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also belong to the scope of the invention, and the scope of patent protection of the present invention shall be defined by the claims.
Claims (4)
1. A speaker identification method based on two-stage sparse decomposition, characterized by comprising the following steps:
S1: framing and windowing the discrete-time signal of the input speech;
S2: applying a discrete Fourier transform to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
S3: constructing a large dictionary, wherein the large dictionary comprises a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary;
S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain a subset of target speaker dictionaries;
S5: splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker.
2. The method of claim 1, characterized in that the windowing uses a Hamming, Hanning, or rectangular window.
3. A speaker identification device based on two-stage sparse decomposition, characterized by comprising:
a framing and windowing module, for framing and windowing the discrete-time signal of the input speech;
a feature extraction module, for applying a discrete Fourier transform to each frame, taking the magnitude, and extracting the magnitude spectrum as the feature;
a dictionary construction module, for building a large dictionary, wherein the large dictionary comprises a universal background dictionary, characteristic dictionaries of the different speakers, and a noise dictionary;
a first-stage sparse decomposition module, for performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain a subset of target speaker dictionaries;
a second-stage sparse decomposition module, for splicing the selected target speaker dictionaries, performing a second-stage sparse decomposition, and using the resulting sparse representation to determine the finally identified speaker.
4. The device of claim 3, characterized in that the windowing uses a Hamming, Hanning, or rectangular window.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410231798.0A CN105139855A (en) | 2014-05-29 | 2014-05-29 | Speaker identification method with two-stage sparse decomposition and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105139855A true CN105139855A (en) | 2015-12-09 |
Legal Events
Code | Title | Description
---|---|---
C06 / PB01 | Publication |
C10 / SE01 | Entry into substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2015-12-09