CN105139855A - Speaker identification method with two-stage sparse decomposition and device - Google Patents

Speaker identification method with two-stage sparse decomposition and device

Info

Publication number
CN105139855A
Authority
CN
China
Prior art keywords
dictionary
speaker
sparse decomposition
voice
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410231798.0A
Other languages
Chinese (zh)
Inventor
何勇军
孙广路
付茂国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN201410231798.0A
Publication of CN105139855A
Legal status: Pending

Abstract

The present invention relates to a speaker identification method with two-stage sparse decomposition. The method comprises the steps of: (S1) framing and windowing the discrete-time signal of the input speech; (S2) applying a discrete Fourier transform to each frame and taking the magnitude, the magnitude spectrum being used as the feature; (S3) constructing a large dictionary; (S4) performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers; and (S5) splicing the dictionaries of these candidate target speakers, performing a second-stage sparse decomposition, and using the resulting sparse representation to confirm the identified speaker. The method can distinguish different speakers and offers efficient, accurate and easy-to-use speaker identification. The invention also discloses a speaker identification device with two-stage sparse decomposition.

Description

Speaker identification method and device with two-stage sparse decomposition
Technical field
The present invention relates to speaker identification in the field of speech signal processing, and in particular to a speaker identification method and device with two-stage sparse decomposition.
Background
Speaker identification is widely used in identity verification, network monitoring, telephone monitoring, information security and related fields. After decades of research, typical recognition systems such as the Gaussian mixture model-universal background model (GMM-UBM) method, the Gaussian mixture model-support vector machine (GMM-SVM) method and joint factor analysis achieve satisfactory results under ideal conditions. In noisy environments, however, their performance drops sharply, which limits the wide application of these techniques.
Researchers have proposed two classes of methods to improve the noise robustness of speaker identification. The first class extracts noise-robust features, such as linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients. These methods bring only limited improvement, because no feature is selective enough to represent speech without also representing noise. The second class applies an enhancement method to the noisy speech to remove the noise, for example spectral subtraction or Wiener filtering, and then extracts features from the enhanced speech. Unfortunately, most noise is non-stationary, and some noise even resembles speech, so it is difficult to model and estimate. As a result, speech enhancement inevitably introduces considerable distortion, which degrades current speaker identification methods; a new technique is therefore needed to solve this problem.
In the past several years, sparse coding has been studied extensively and offers a possible solution for speaker identification in noisy environments. This technique represents a signal with a set of atoms (elementary signals); the set of atoms is called a dictionary. With sparse coding, all or most of the information in a signal is represented by a linear combination of a small number of atoms. Recently, a sparse coding method called morphological component analysis (MCA) has been applied successfully to speaker identification. In this approach, a dictionary is prepared for each speaker, and all speaker dictionaries are spliced into one large dictionary. During identification, the test speech is sparsely represented over the large dictionary. In theory, a speaker's speech can be represented only by that speaker's dictionary, so the sparse representation can be used directly for classification.
Almost all such speaker identification methods use the MCA framework: they first convert the training utterances into GMM mean supervectors or total variability vectors, then assemble these vectors into a large dictionary, and perform sparse decomposition and classification with it. These methods are reported to perform better than the traditional GMM-UBM and GMM-SVM methods. However, they still do not compensate for noise, which lowers their recognition rate under noisy conditions.
Summary of the invention
The technical problem to be solved by the present invention is how, in view of the deficiencies of the prior art, to provide a method that compensates for noise in speaker identification and thereby improves the recognition rate of speakers' speech under noisy conditions.
To this end, the present invention proposes a speaker identification method with two-stage sparse decomposition, comprising the following steps:
S1: framing and windowing the discrete-time signal of the input speech;
S2: applying a discrete Fourier transform (DFT) to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
S3: building a large dictionary, wherein the large dictionary comprises a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary;
S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers;
S5: splicing the dictionaries of the candidate target speakers, performing a second-stage sparse decomposition, and using the sparse representation to confirm the finally identified speaker.
Specifically, the window used in the windowing is a Hamming window, a Hanning window or a rectangular window.
To this end, the present invention also proposes a speaker identification device with two-stage sparse decomposition, comprising:
a framing and windowing module, for framing and windowing the discrete-time signal of the input speech;
a feature extraction module, for applying a discrete Fourier transform (DFT) to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
a dictionary building module, for building a large dictionary, wherein the large dictionary comprises a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary;
a first-stage sparse decomposition module, for performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and for coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers;
a second-stage sparse decomposition module, for splicing the dictionaries of the candidate target speakers, performing a second-stage sparse decomposition, and using the sparse representation to confirm the finally identified speaker.
Specifically, the window used in the windowing is a Hamming window, a Hanning window or a rectangular window.
In the speaker identification method with two-stage sparse decomposition disclosed herein, the input speech is first framed and windowed, a discrete Fourier transform (DFT) is applied to each frame, and the magnitude spectrum is taken as the speech feature. A large dictionary is then built from a universal background dictionary, the dictionaries of the different speakers, and a noise dictionary. The speech to be identified is sparsely represented over this large dictionary, and the sparse representation is used to classify it coarsely, yielding the dictionaries of a small number of candidate target speakers. Finally, these dictionaries are spliced into a new large dictionary, the input speech is represented over it, and the sparse representation is used for the final classification that determines the speaker's identity. The invention can distinguish different speakers and offers efficient, accurate and easy-to-use identification of speaker identity. The invention also discloses a speaker identification device with two-stage sparse decomposition.
Brief description of the drawings
The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:
Fig. 1 shows the step flow chart of a speaker identification method with two-stage sparse decomposition in an embodiment of the present invention;
Fig. 2 shows an example flow diagram of a speaker identification method with two-stage sparse decomposition in an embodiment of the present invention;
Fig. 3 shows the structural diagram of a speaker identification device with two-stage sparse decomposition in an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention provides a speaker identification method with two-stage sparse decomposition, comprising the following steps:
Step S1: framing and windowing the discrete-time signal of the input speech, wherein the window is a Hamming window, a Hanning window or a rectangular window.
Step S2: applying a discrete Fourier transform (DFT) to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature.
Step S3: building a large dictionary, wherein the large dictionary comprises a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary.
Step S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers.
Step S5: splicing the dictionaries of the candidate target speakers, performing a second-stage sparse decomposition, and using the sparse representation to confirm the finally identified speaker.
Fig. 2 shows an example flow diagram of the speaker identification method with two-stage sparse decomposition.
Specifically, the purpose of pre-emphasis is to reduce the influence of sharp noise and to boost the high-frequency part of the signal. For a given signal y(n), pre-emphasis is performed according to:
$$ z(n) = y(n) - 0.97\, y(n-1) \qquad (1) $$
Here the pre-emphasis coefficient is 0.97. The signal is then windowed with a Hamming window, a Hanning window or a rectangular window. Research shows that the Hamming window has a better frequency response than the rectangular window and can reduce spectral leakage. Windowing is expressed as:
$$ S_p(n) = z(n)\, W(n) \qquad (2) $$
The formula above is the pointwise product of z(n) and W(n): windowing multiplies each sample of the frame by the corresponding window sample, which is equivalent to a convolution in the frequency domain.
Here W(n) is the window function of formula (3), n is the time index and M is the window length.
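The pre-emphasis and windowing steps can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the patent's implementation: the function names, the frame length of 400 samples and the hop of 160 samples are assumptions, and because the window definition of formula (3) is not reproduced in this text, NumPy's standard `np.hamming` and `np.hanning` definitions are assumed.

```python
import numpy as np

def pre_emphasize(y, alpha=0.97):
    """Formula (1): z(n) = y(n) - alpha * y(n-1); the first sample is kept as is."""
    y = np.asarray(y, dtype=float)
    z = y.copy()
    z[1:] = y[1:] - alpha * y[:-1]
    return z

def frame_and_window(z, frame_len=400, hop=160, window="hamming"):
    """Split z into overlapping frames and multiply each frame by the window, as in formula (2)."""
    # Assumed window definitions: NumPy's standard Hamming/Hanning, or all ones (rectangular).
    windows = {"hamming": np.hamming, "hanning": np.hanning, "rectangular": np.ones}
    z = np.asarray(z, dtype=float)
    if len(z) < frame_len:
        raise ValueError("signal is shorter than one frame")
    w = windows[window](frame_len)
    n_frames = 1 + (len(z) - frame_len) // hop
    # idx[p, n] is the sample index of position n in frame p.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return z[idx] * w  # shape: (n_frames, frame_len)

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
signal = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frames = frame_and_window(pre_emphasize(signal))
print(frames.shape)
```

With a 16 kHz sampling rate, the assumed values correspond to 25 ms frames with a 10 ms shift, a common choice in speech processing; the source does not specify these values.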
Then the discrete Fourier transform (DFT) is applied:
$$ Y_p(k) = \sum_{n=0}^{N-1} S_p(n)\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1 \qquad (4) $$
where S_p(n) is the p-th frame of the windowed speech signal, p is the frame index, and N is the number of DFT points. The magnitude |Y_p(k)| is used as the feature.
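As a sketch of the feature extraction step, the magnitude spectrum of each windowed frame can be computed with NumPy's FFT. The function names and the DFT size are assumptions for illustration; `dft_direct` evaluates the DFT sum term by term only to show that the FFT computes the same quantity.

```python
import numpy as np

def magnitude_spectrum(frames, n_fft=512):
    """Magnitude of the N-point DFT of each row of `frames` (zero-padded to n_fft)."""
    return np.abs(np.fft.fft(frames, n=n_fft, axis=-1))

def dft_direct(frame, n_fft):
    """Direct evaluation of sum_n S(n) * exp(-j*2*pi*k*n/N) for k = 0..N-1 (slow, for checking)."""
    n = np.arange(n_fft)
    k = n[:, None]
    padded = np.zeros(n_fft)
    padded[:len(frame)] = frame  # assumes len(frame) <= n_fft
    return np.exp(-2j * np.pi * k * n / n_fft) @ padded

# Hypothetical frame: a cosine with exactly two cycles in 16 samples.
frame = np.cos(2 * np.pi * 2 * np.arange(16) / 16)
mag = magnitude_spectrum(frame[None, :], n_fft=16)
print(np.round(mag[0], 3))
```

For real-valued frames the magnitude spectrum is symmetric, so an implementation may keep only the first N/2 + 1 bins; the source does not say whether it does.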
Dictionary preparation. For a speaker identification system with K different speakers, a large dictionary with a new structure is designed:
$$ \Psi = [\Phi_0, \Phi_1, \Phi_2, \ldots, \Phi_K, \Phi_v] \qquad (5) $$
where Φ_0 is a universal background dictionary that captures the characteristics common to all speakers; this borrows the idea from GMM-UBM of using a universal background model (UBM) to model the background. Φ_i (i = 1, ..., K) is the dictionary that models the variability (characteristics) of the i-th speaker, and Φ_v is the noise dictionary that models environmental noise. All atoms in Ψ are normalized to unit-norm vectors. K-SVD is used to train the dictionaries, and the dictionary training problem is formulated as:
$$ \min_{\Phi, X} \| Y - \Phi X \|_2^2 \quad \text{subject to} \quad \| x_i \|_0 \le T_0 \qquad (6) $$
where Y = [Y_1, Y_2, ..., Y_m] is the training data set, each Y_i is the feature vector of a speech frame, Φ is the dictionary, X = [x_1, x_2, ..., x_m] is the set of sparse vectors corresponding to Y, and T_0 is the sparsity threshold. The universal background dictionary is trained on a large amount of unlabeled speech from different speakers. Each Φ_i is trained on the speech of the i-th speaker, with Ψ as the initial value.
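The structure of the large dictionary can be illustrated with a short NumPy sketch. It only shows the splicing and the unit-norm normalization of the atoms; the sub-dictionaries here are random placeholders rather than K-SVD-trained dictionaries, and the function names, sizes and the `labels` bookkeeping array are assumptions introduced for illustration.

```python
import numpy as np

def normalize_atoms(D, eps=1e-12):
    """Scale every column (atom) of D to unit Euclidean norm."""
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, eps)

def build_big_dictionary(phi_0, speaker_dicts, phi_v):
    """Splice [phi_0, phi_1, ..., phi_K, phi_v] column-wise and normalize the atoms.

    Also returns a label per atom: 0 = background, 1..K = speaker index, K+1 = noise.
    """
    blocks = [phi_0, *speaker_dicts, phi_v]
    psi = normalize_atoms(np.hstack(blocks))
    labels = np.concatenate([np.full(b.shape[1], i) for i, b in enumerate(blocks)])
    return psi, labels

# Placeholder dictionaries (random, NOT trained): 16-dimensional features, K = 3 speakers.
rng = np.random.default_rng(0)
phi_0 = rng.random((16, 4))                          # universal background dictionary
speaker_dicts = [rng.random((16, 5)) for _ in range(3)]
phi_v = rng.random((16, 2))                          # noise dictionary
psi, labels = build_big_dictionary(phi_0, speaker_dicts, phi_v)
print(psi.shape, labels)
```

Keeping a per-atom label alongside the dictionary makes it straightforward to sum coefficient magnitudes per speaker later, although the source does not prescribe any particular bookkeeping.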
Solving the sparse decomposition problem under the L0 constraint of formula (6) has been proven NP-hard: it cannot be solved exactly without exhaustively searching all possible sparse subsets. If the solution is sparse or approximately sparse, however, it is uniquely determined by solving formula (7):
$$ y = \arg\min_{y} \; \lambda \| y \|_1 + \frac{1}{2} \| Y - \Psi y \|_2^2 \qquad (7) $$
where λ > 0 is the regularization parameter; this formulation is known as basis pursuit denoising (BPDN).
For each speech frame Y, formula (7) is solved to obtain the sparse coefficient vector y; this is the first sparse decomposition.
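The source does not name a solver for the BPDN problem. As one possible choice, the sketch below uses the iterative soft-thresholding algorithm (ISTA), a standard proximal-gradient method for this objective; the function names, the value of λ and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bpdn_ista(psi, Y, lam=0.1, n_iter=500):
    """Minimize lam * ||x||_1 + 0.5 * ||Y - psi @ x||_2^2 with ISTA."""
    # Lipschitz constant of the gradient of the quadratic term: squared spectral norm of psi.
    L = np.linalg.norm(psi, 2) ** 2
    x = np.zeros(psi.shape[1])
    for _ in range(n_iter):
        grad = psi.T @ (psi @ x - Y)          # gradient of 0.5 * ||Y - psi x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Hypothetical check with an orthonormal dictionary, where the minimizer is
# simply the soft-thresholded input.
Y = np.array([1.0, -0.05, 0.3])
x = bpdn_ista(np.eye(3), Y, lam=0.1)
print(x)
```

ISTA is simple but converges slowly; accelerated variants or coordinate-descent Lasso solvers minimize the same objective and could be substituted.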
Second sparse decomposition and classification. First, compute c_i = ||δ_i(y)||_1, i = 1, 2, ..., K, where δ_i(·) returns a vector whose only nonzero entries are those associated with the i-th class; that is, the entries of the sparse coefficient vector y that do not belong to the i-th class are set to zero. The c_i are then sorted in ascending order.
Next, the dictionaries of Q speakers are selected, namely those corresponding to the Q largest values of c_i after sorting, and spliced into a large dictionary \hat{\Psi}, over which the sparse decomposition is repeated:
$$ y = \arg\min_{y} \; \lambda \| y \|_1 + \frac{1}{2} \| Y - \hat{\Psi} y \|_2^2 \qquad (8) $$
Solving formula (8) yields y. Formula (9) then determines which speaker the speech frame belongs to:
$$ j^{\ast} = \arg\max_{1 \le j \le Q} \| \delta_j(y) \|_1 \qquad (9) $$
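The two-stage procedure can be sketched end to end on a toy problem. Several choices below are assumptions rather than statements of the patent: ISTA is used as the BPDN solver, the second-stage dictionary contains only the Q selected speaker dictionaries (the source does not say whether the background and noise dictionaries are retained), and the dictionary is a trivial orthonormal one chosen so that the result is easy to follow by hand.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bpdn_ista(psi, Y, lam, n_iter=300):
    """Assumed solver: ISTA for lam * ||x||_1 + 0.5 * ||Y - psi @ x||_2^2."""
    L = np.linalg.norm(psi, 2) ** 2
    x = np.zeros(psi.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - psi.T @ (psi @ x - Y) / L, lam / L)
    return x

def class_scores(coef, labels, classes):
    """c_i = ||delta_i(coef)||_1: L1 norm of the coefficients belonging to class i."""
    return np.array([np.abs(coef[labels == c]).sum() for c in classes])

def two_stage_identify(psi, labels, Y, n_speakers, Q=2, lam=0.05):
    speakers = np.arange(1, n_speakers + 1)
    # Stage 1: decompose over the full dictionary and score every speaker.
    y1 = bpdn_ista(psi, Y, lam)
    c = class_scores(y1, labels, speakers)
    candidates = speakers[np.argsort(c)[-Q:]]   # ascending sort, keep the Q largest
    # Stage 2: splice only the candidate speakers' atoms and decompose again.
    keep = np.isin(labels, candidates)
    psi_hat, labels_hat = psi[:, keep], labels[keep]
    y2 = bpdn_ista(psi_hat, Y, lam)
    c2 = class_scores(y2, labels_hat, candidates)
    return int(candidates[np.argmax(c2)])

# Toy setup: 12 orthonormal atoms; label 0 = background, 1..4 = speakers, 5 = noise.
psi = np.eye(12)
labels = np.repeat(np.arange(6), 2)
Y = np.zeros(12)
Y[6], Y[7] = 1.0, 0.8      # energy on speaker 3's atoms
Y[10] = 0.2                # some energy on a noise atom
Y[2] = 0.1                 # a little energy on a speaker 1 atom
identified = two_stage_identify(psi, labels, Y, n_speakers=4)
print(identified)
```

In this toy case the first stage already singles out the right speaker; the second stage matters when atoms of different speakers are correlated and compete for the same signal energy, which an orthonormal dictionary cannot show.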
The advantages of the speaker identification method with two-stage sparse decomposition disclosed herein are twofold. First, compensation for time-varying noise is taken into account: the method designs a noise dictionary that follows the variation of the noise and can therefore compensate effectively for time-varying noise. Second, a two-stage sparse decomposition is used, which further improves identification accuracy. During decomposition, atoms of different speakers may compete, which degrades performance. To address this, the method splices together, after the first-stage sparse decomposition, the dictionaries of those speakers whose atoms were used in the large dictionary, performs a second-stage sparse decomposition, and uses the second sparse representation to confirm the class. The two-stage sparse decomposition method is stated to be clearly better than current speaker identification methods.
Fig. 3 shows the structural diagram of the speaker identification device with two-stage sparse decomposition.
Specifically, the framing and windowing module 101 frames and windows the discrete-time signal of the input speech, wherein the window is a Hamming window, a Hanning window or a rectangular window. The feature extraction module 102 applies a discrete Fourier transform (DFT) to each frame and takes the magnitude, the magnitude spectrum being extracted as the feature. The dictionary building module 103 builds a large dictionary comprising a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary. The first-stage sparse decomposition module 104 performs a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifies the input speech to obtain the dictionaries of a subset of candidate target speakers. The second-stage sparse decomposition module 105 splices the dictionaries of the candidate target speakers, performs a second-stage sparse decomposition, and uses the sparse representation to confirm the finally identified speaker.
The above embodiments serve only to illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection shall be defined by the claims.
Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims (4)

1. A speaker identification method with two-stage sparse decomposition, characterized by comprising the following steps:
S1: framing and windowing the discrete-time signal of the input speech;
S2: applying a discrete Fourier transform (DFT) to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
S3: building a large dictionary, wherein the large dictionary comprises a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary;
S4: performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers;
S5: splicing the dictionaries of the candidate target speakers, performing a second-stage sparse decomposition, and using the sparse representation to confirm the finally identified speaker.
2. The method of claim 1, characterized in that the window used in the windowing is a Hamming window, a Hanning window or a rectangular window.
3. A speaker identification device with two-stage sparse decomposition, characterized by comprising:
a framing and windowing module, for framing and windowing the discrete-time signal of the input speech;
a feature extraction module, for applying a discrete Fourier transform (DFT) to each frame and taking the magnitude, the magnitude spectrum being extracted as the feature;
a dictionary building module, for building a large dictionary, wherein the large dictionary comprises a universal background dictionary, the feature dictionaries of the different speakers, and a noise dictionary;
a first-stage sparse decomposition module, for performing a first-stage sparse decomposition to obtain the sparse representation of the speech to be identified over the large dictionary, and for coarsely classifying the input speech to obtain the dictionaries of a subset of candidate target speakers;
a second-stage sparse decomposition module, for splicing the dictionaries of the candidate target speakers, performing a second-stage sparse decomposition, and using the sparse representation to confirm the finally identified speaker.
4. The device of claim 3, characterized in that the window used in the windowing is a Hamming window, a Hanning window or a rectangular window.
CN201410231798.0A 2014-05-29 2014-05-29 Speaker identification method with two-stage sparse decomposition and device Pending CN105139855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410231798.0A CN105139855A (en) 2014-05-29 2014-05-29 Speaker identification method with two-stage sparse decomposition and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410231798.0A CN105139855A (en) 2014-05-29 2014-05-29 Speaker identification method with two-stage sparse decomposition and device

Publications (1)

Publication Number Publication Date
CN105139855A true CN105139855A (en) 2015-12-09

Family

ID=54725177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410231798.0A Pending CN105139855A (en) 2014-05-29 2014-05-29 Speaker identification method with two-stage sparse decomposition and device

Country Status (1)

Country Link
CN (1) CN105139855A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512708A (en) * 2022-10-05 2022-12-23 哈尔滨理工大学 Speaker recognition method based on discriminative dictionary and classifier combined learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1384960A (en) * 1999-10-29 2002-12-11 艾利森电话股份有限公司 Method and means for robust feature extraction for speech recognition
CN1650349A (en) * 2002-04-30 2005-08-03 诺基亚有限公司 On-line parametric histogram normalization for noise robust speech recognition
CN1653519A (en) * 2002-03-20 2005-08-10 高通股份有限公司 Method for robust voice recognition by analyzing redundant features of source signal
CN102290047A (en) * 2011-09-22 2011-12-21 哈尔滨工业大学 Robust speech characteristic extraction method based on sparse decomposition and reconfiguration
CN103345923A (en) * 2013-07-26 2013-10-09 电子科技大学 Sparse representation based short-voice speaker recognition method
CN103413551A (en) * 2013-07-16 2013-11-27 清华大学 Sparse dimension reduction-based speaker identification method
CN103456302A (en) * 2013-09-02 2013-12-18 浙江大学 Emotion speaker recognition method based on emotion GMM model weight synthesis
CN103730114A (en) * 2013-12-31 2014-04-16 上海交通大学无锡研究院 Mobile equipment voiceprint recognition method based on joint factor analysis model

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1384960A (en) * 1999-10-29 2002-12-11 艾利森电话股份有限公司 Method and means for robust feature extraction for speech recognition
CN1653519A (en) * 2002-03-20 2005-08-10 高通股份有限公司 Method for robust voice recognition by analyzing redundant features of source signal
CN1650349A (en) * 2002-04-30 2005-08-03 诺基亚有限公司 On-line parametric histogram normalization for noise robust speech recognition
CN102290047A (en) * 2011-09-22 2011-12-21 哈尔滨工业大学 Robust speech characteristic extraction method based on sparse decomposition and reconfiguration
CN103413551A (en) * 2013-07-16 2013-11-27 清华大学 Sparse dimension reduction-based speaker identification method
CN103345923A (en) * 2013-07-26 2013-10-09 电子科技大学 Sparse representation based short-voice speaker recognition method
CN103456302A (en) * 2013-09-02 2013-12-18 浙江大学 Emotion speaker recognition method based on emotion GMM model weight synthesis
CN103730114A (en) * 2013-12-31 2014-04-16 上海交通大学无锡研究院 Mobile equipment voiceprint recognition method based on joint factor analysis model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KUAJ MK ET AL: ""Speaker verification using sparse representation classification"", 《ICASSP》 *
YONGJU HE ET AL: ""A SOLUTION TO RESIDUAL NOISE IN SPEECH DENOISING WITH SPARSE REPRESENTATION"", 《IEEE》 *
何勇军 等: ""基于稀疏编码的鲁棒说话人识别"", 《数据采集与处理》 *

Similar Documents

Publication Publication Date Title
CN101980336B (en) Hidden Markov model-based vehicle sound identification method
CN107886943A (en) Voiceprint recognition method and device
CN104835498A (en) Voiceprint identification method based on multi-type combination characteristic parameters
CN107610707A (en) Voiceprint recognition method and device
CN107146601A (en) Back-end i-vector enhancement method for speaker recognition systems
CN106952643A (en) Recording device clustering method based on Gaussian mean supervectors and spectral clustering
CN108986798B (en) Voice data processing method, device and equipment
CN101226743A (en) Speaker recognition method based on conversion between neutral and emotional voiceprint models
CN104021789A (en) Adaptive endpoint detection method using short-time time-frequency values
CN102968986A (en) Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics
CN104900235A (en) Voiceprint recognition method based on pitch period mixed characteristic parameters
CN103456302B (en) Emotional speaker recognition method based on emotion GMM model weight synthesis
CN106898355B (en) Speaker identification method based on secondary modeling
CN111554305B (en) Voiceprint recognition method based on spectrogram and attention mechanism
CN104978507A (en) Intelligent well logging evaluation expert system identity authentication method based on voiceprint recognition
CN105096955A (en) Speaker rapid identification method and system based on growing and clustering algorithm of models
CN111261183A (en) Method and device for denoising voice
CN109767760A (en) Far field audio recognition method based on the study of the multiple target of amplitude and phase information
CN106373559A (en) Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN104485108A (en) Noise and speaker combined compensation method based on multi-speaker model
CN106297769B (en) Discriminative feature extraction method for language identification
CN103258537A (en) Method utilizing characteristic combination to identify speech emotions and device thereof
CN104778948A (en) Noise-resistant voice recognition method based on warped cepstrum feature
Al-Kaltakchi et al. Study of statistical robust closed set speaker identification with feature and score-based fusion
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151209
