CN104538035B - Speaker recognition method and system based on Fisher supervectors
Publication number: CN104538035B
Application number: CN201410802816.6A
Authority: CN (China)
Description
Technical field
The invention belongs to the technical field of speech recognition, and more particularly relates to a speaker recognition method and system based on Fisher supervectors.
Background
With the continuous progress of computer and Internet technology, smart devices have become increasingly indispensable in daily life. Voice interaction, one of the ways people interact with smart devices, has become a research hotspot because speech is easy to collect, easy to store, difficult to imitate, and inexpensive to acquire.
Current intelligent speech processing is broadly divided, according to the speech information used, into speech recognition, language recognition, and speaker recognition. Speech recognition aims to determine what semantic content is conveyed in the speech signal; language recognition aims to identify the language or dialect to which the speech signal belongs; speaker recognition identifies the speaker by extracting personal characteristics that characterize the speaker.
Speech is an important carrier of identity information. Compared with other biometric features such as faces and fingerprints, speech is inexpensive to acquire, simple to use, and convenient for remote data collection, and a speech-based human-machine interface is friendlier. Speaker recognition has therefore become an important automatic identity authentication technology.
A commonly used speaker recognition method is based on the Gaussian mixture model-universal background model (GMM-UBM). Although the GMM-UBM model has some robustness to noise, it does not account for channel effects during training, so its recognition performance drops sharply when the training speech and the test speech come from different channels.
To overcome the loss of recognition performance under channel mismatch, the prior art proposes speaker recognition based on joint factor analysis (JFA) built on the GMM-UBM model. However, JFA theory is built on the GMM-UBM framework: it assumes that the main information contained in a speaker's GMM mean supervector can be mapped into two mutually independent low-dimensional subspaces, and it estimates the space loading matrices with an EM iterative algorithm within the GMM framework, so the computation cannot be separated from that framework. JFA-based speaker recognition applies channel compensation to the speaker model using parameters estimated during testing, and its test performance is poor.
Summary of the invention
In view of this, embodiments of the present invention provide a speaker recognition method and system based on Fisher supervectors. The high-dimensional Fisher supervector extracted from speech data is used as the individual information characterizing the speaker, and subspace analysis modeling is applied to that supervector to perform speaker recognition, thereby improving the recognition performance of the system.
An embodiment of the present invention is realized as a speaker recognition method based on Fisher supervectors, the method comprising:
extracting a Fisher supervector;
dividing the extracted Fisher supervector into a plurality of Fisher sub-vector sets;
analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to build a subspace speaker model;
obtaining, according to the subspace speaker model, a reference vector of the speaker to be identified and a reference vector of a training-sample speaker, and identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Another object of the embodiments of the present invention is to provide a speaker recognition system based on Fisher supervectors, the system comprising:
an extraction unit for extracting a Fisher supervector;
a division unit for dividing the extracted Fisher supervector into a plurality of Fisher sub-vector sets;
a model building unit for analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to build a subspace speaker model;
a recognition unit for obtaining, according to the subspace speaker model, a reference vector of the speaker to be identified and a reference vector of a training-sample speaker, and for identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The Fisher supervector extracted from speech data is used as the speaker's feature vector, and speaker recognition is performed on it with subspace analysis modeling. Because the Fisher supervector is simple to extract, has a higher dimension than the JFA supervector, and requires no channel compensation, the accuracy and efficiency of speaker recognition are effectively improved. In addition, the embodiments require no extra hardware in the recognition process, which reduces cost and gives the approach strong usability and practicality.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the nonparametric discriminant analysis based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 3 is a simulation plot comparing the results of the speaker recognition system based on Fisher supervectors provided by Embodiment 1 of the present invention with those of a speaker recognition system based on JFA supervectors;
Fig. 4 is a structural diagram of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
In the following description, for purposes of explanation and not limitation, specific details such as particular system structures and techniques are set forth to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below through specific embodiments.
Embodiment 1:
Fig. 1 shows the flow of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention. The method proceeds as follows:
In step S101, a Fisher supervector is extracted.
To further improve the accuracy and efficiency of speaker recognition, the embodiment of the present invention extracts the Fisher supervector of the speech data as the speaker's feature vector.
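The Fisher supervector is formed by concatenating the quantities $\mathcal{G}^{X}_{\alpha_k}$, $\mathcal{G}^{X}_{\mu_k}$ and $\mathcal{G}^{X}_{\sigma_k}$ corresponding to all Gaussian components of the GMM, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}^{X}_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}^{X}_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}^{X}_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
Here $\mathcal{G}^{X}_{\alpha_k}$ is a scalar, and $\mathcal{G}^{X}_{\mu_k}$ and $\mathcal{G}^{X}_{\sigma_k}$ are $d$-dimensional vectors with $d \ge 1$ (division and squaring of vectors are element-wise). The feature-vector sequence of the speaker's speech is $X = \{x_t,\ t = 1 \ldots T\}$, where $x_t$ is a feature vector and $T$ is the number of feature vectors in $X$. $K$ is the number of Gaussian components in the GMM, and the occupancy of the $k$-th component by $x_t$ is
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$
The $k$-th Gaussian component of the GMM is
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right\}$$
where $w_k$ is the weight of the $k$-th Gaussian component, with $w_k = \exp(\alpha_k) / \sum_{j=1}^{K} \exp(\alpha_j)$; $\mu_k$ is its mean vector; $\Sigma_k$ is its covariance matrix; and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.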
This is explained as follows. Let the feature-vector sequence of one utterance be $X = \{x_t,\ t = 1 \ldots T\}$, where the feature vectors $x_t$ are assumed to be mutually independent. $X$ can then be represented by its normalized gradient with respect to the model parameters $\lambda$:
$$\mathcal{G}^{X}_{\lambda} = L_\lambda \nabla_\lambda \log p_\lambda(X)$$
where $L_\lambda$ is the square root of the inverse of the Fisher information matrix. Under the independence assumption, the Fisher supervector can be regarded as the sum of the normalized gradient statistics of the individual feature vectors:
$$\mathcal{G}^{X}_{\lambda} = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log p_\lambda(x_t)$$
The operator
$$x_t \mapsto L_\lambda \nabla_\lambda \log p_\lambda(x_t)$$
can be understood as embedding a feature vector $x_t$ into a point in a higher-dimensional space, which makes it easier to build a linear classifier. Note that the assumption of mutual independence between feature vectors often does not hold in practice; the corresponding treatment is described below.
Because a GMM can accurately model any continuous distribution, the probability density model $p_\lambda$ is assumed to be a GMM. To obtain the Fisher supervector corresponding to each utterance, a universal background model that is independent of speaker and channel information is required; $p_\lambda$ is a universal background GMM with a large number of Gaussian components, trained on a large amount of speech data from different speakers and different channels. Assuming the GMM has $K$ Gaussian components, its parameters can be written as $\lambda = \{w_k, \mu_k, \Sigma_k,\ k = 1, \ldots, K\}$, where $w_k$, $\mu_k$ and $\Sigma_k$ are the weight, mean vector, and covariance matrix of the $k$-th Gaussian component. The GMM is given by:
$$p_\lambda(x) = \sum_{k=1}^{K} w_k\, p_k(x)$$
where $p_k$ denotes the $k$-th Gaussian component of the GMM:
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right\}$$
subject to the following conditions:
$$\forall k:\ w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1$$
To ensure that $p_\lambda(x)$ describes the distribution of the training data effectively, the covariance matrix of each Gaussian component in the GMM is assumed to be diagonal, and the elements on its diagonal are denoted by the vector $\sigma_k^2$.
In addition, to avoid enforcing the constraint above on the weights $w_k$ directly, a parameter $\alpha_k$ is introduced and the Gaussian component weight $w_k$ is expressed as:
$$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$
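The GMM parameters can then be rewritten as $\lambda = \{\alpha_k, \mu_k, \sigma_k,\ k = 1, \ldots, K\}$, and the gradients of the log-likelihood of a feature vector $x_t$ with respect to the GMM parameters are:
$$\nabla_{\alpha_k} \log p_\lambda(x_t) = \gamma_t(k) - w_k$$
$$\nabla_{\mu_k} \log p_\lambda(x_t) = \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k^2}$$
$$\nabla_{\sigma_k} \log p_\lambda(x_t) = \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]$$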
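As a brief check of the gradient with respect to $\alpha_k$, the following derivation sketch uses only the softmax parameterization of the weights and the usual definition of the occupancy, $\gamma_t(k) = w_k p_k(x_t) / p_\lambda(x_t)$. The Kronecker delta $\delta_{jk}$ is notation introduced here for illustration.

```latex
\frac{\partial w_j}{\partial \alpha_k} = w_j \left( \delta_{jk} - w_k \right)

\frac{\partial \log p_\lambda(x_t)}{\partial \alpha_k}
  = \frac{1}{p_\lambda(x_t)} \sum_{j=1}^{K} \frac{\partial w_j}{\partial \alpha_k}\, p_j(x_t)
  = \frac{w_k\, p_k(x_t)}{p_\lambda(x_t)} - w_k\, \frac{\sum_{j=1}^{K} w_j\, p_j(x_t)}{p_\lambda(x_t)}
  = \gamma_t(k) - w_k
```

The second term collapses because the sum in its numerator is $p_\lambda(x_t)$ itself.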
In the equations above, $\gamma_t(k)$ denotes the occupancy of the $k$-th Gaussian component of the GMM by the feature vector $x_t$, and can be computed as the posterior probability:
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$
With the gradient expressions above, the next step is to obtain the square root of the inverse of the Fisher information matrix. The posterior probabilities are typically very sparse: a feature vector $x_t$ has a high occupancy for only one Gaussian component and low occupancy for all the others. In terms of the spatial distribution of the feature vectors, a feature vector that lies close to the center of one Gaussian lies comparatively far from the centers of the other Gaussians. Because of this sparsity, the Fisher information matrix can be treated as diagonal, which gives the following normalized gradient expressions:
$$\mathcal{G}^{X}_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}^{X}_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}^{X}_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
In these equations $\mathcal{G}^{X}_{\alpha_k}$ is a scalar, and $\mathcal{G}^{X}_{\mu_k}$ and $\mathcal{G}^{X}_{\sigma_k}$ are $d$-dimensional vectors. The final Fisher supervector is obtained by concatenating these three quantities for all Gaussian components of the GMM, and its dimension is $(2d+1)K$.
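The extraction step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fisher_supervector`, the array shapes, and the per-component layout of the output (scalar weight term, then mean term, then variance term for each Gaussian) are choices made here, and the UBM parameters are taken as given.

```python
import numpy as np

def fisher_supervector(X, w, mu, sigma2):
    """Fisher supervector of one utterance under a diagonal-covariance GMM.

    X: (T, d) feature vectors; w: (K,) weights; mu: (K, d) means;
    sigma2: (K, d) diagonal variances. Returns a vector of length (2d+1)K.
    The per-component output layout is an assumption of this sketch.
    """
    T, d = X.shape
    diff = X[:, None, :] - mu[None, :, :]                  # (T, K, d)
    # log p_k(x_t) for diagonal Gaussians
    log_p = -0.5 * (np.sum(diff ** 2 / sigma2, axis=2)
                    + np.sum(np.log(sigma2), axis=1)
                    + d * np.log(2.0 * np.pi))             # (T, K)
    log_wp = np.log(w) + log_p
    log_wp -= log_wp.max(axis=1, keepdims=True)            # numerical stability
    gamma = np.exp(log_wp)
    gamma /= gamma.sum(axis=1, keepdims=True)              # posteriors gamma_t(k)

    z = diff / np.sqrt(sigma2)                             # (x_t - mu_k) / sigma_k
    inv_sqrt_w = 1.0 / np.sqrt(w)
    g_alpha = inv_sqrt_w * (gamma - w).sum(axis=0)         # (K,)
    g_mu = inv_sqrt_w[:, None] * np.einsum('tk,tkd->kd', gamma, z)
    g_sigma = inv_sqrt_w[:, None] * np.einsum(
        'tk,tkd->kd', gamma, (z ** 2 - 1.0) / np.sqrt(2.0))
    # One row per Gaussian: [g_alpha_k, g_mu_k, g_sigma_k], then flatten.
    return np.concatenate([g_alpha[:, None], g_mu, g_sigma], axis=1).ravel()
```

A production system would typically also normalize the resulting vector; that step is not described in the text and is omitted here.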
In step S102, the extracted Fisher supervector is divided into a plurality of Fisher sub-vector sets.
Specifically, all Gaussian mean vectors of the UBM are clustered using a GMM algorithm. According to the clustering result, the Fisher supervector is divided into a plurality of Fisher sub-vector sets, using either an equal or an unequal division.
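The division step can be illustrated with a short sketch. It assumes the cluster label of every UBM Gaussian is already available (the clustering itself is not shown) and that the supervector stores one block of length 2d+1 per Gaussian; both the function name `split_supervector` and that layout are hypothetical choices for this example.

```python
import numpy as np

def split_supervector(fv, labels, d):
    """Split a Fisher supervector into sub-vectors, one per Gaussian cluster.

    fv: supervector of length (2d+1)K, assumed stored as K consecutive
        blocks of length 2d+1 (one block per Gaussian component).
    labels: length-K cluster label for each Gaussian component.
    Returns a list of 1-D arrays, ordered by cluster label.
    """
    labels = np.asarray(labels)
    blocks = np.asarray(fv).reshape(len(labels), 2 * d + 1)
    return [blocks[labels == c].ravel() for c in np.unique(labels)]

# Toy usage: K = 4 Gaussians, d = 2, two clusters of two Gaussians each.
subs = split_supervector(np.arange(20), [0, 1, 0, 1], d=2)
```

Clusters of different sizes simply yield sub-vectors of different lengths, which corresponds to the unequal division mentioned in the text.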
In step S103, each Fisher sub-vector set is analyzed with a nonparametric discriminant analysis algorithm to build a subspace speaker model.
Because Fisher supervectors have achieved good recognition results in image classification and are easy to extract, the embodiment of the present invention introduces them into the speech recognition field and studies their effect there. Since the Fisher supervector is also derived from the UBM, it has the structure of a GMM supervector, like the JFA supervector, but with a higher dimension than the JFA supervector. In theory, the Fisher supervector therefore contains more redundant information, so a nonparametric discriminant analysis (NDA) algorithm is needed to analyze and model it. The procedure is as follows (see Fig. 2):
1) The principal component analysis (PCA) algorithm is used to remove the redundant information contained in each Fisher sub-vector set, giving the dimensionality-reducing projection matrix of each Fisher sub-vector set.
Specifically, PCA removes the redundant information contained in the Fisher sub-vectors. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{11}, W_{21}, \ldots, W_{K1}$ in the projection-matrix expressions of the Fisher sub-vector sets are the optimal dimensionality-reducing projection matrices given by the PCA algorithm.
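A minimal NumPy sketch of the PCA step for one Fisher sub-vector set follows. The function name `pca_projection`, the row-wise data layout, and the use of an SVD of the centered data are assumptions of this example; the text does not prescribe how the PCA basis is computed.

```python
import numpy as np

def pca_projection(F, rank):
    """PCA projection for one Fisher sub-vector set.

    F: (N, D) matrix, one sub-vector per row (one row per training utterance).
    rank: number of principal directions to keep.
    Returns the data mean (D,) and a projection matrix W1 of shape (D, rank)
    whose columns are the leading principal directions.
    """
    mean = F.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(F - mean, full_matrices=False)
    return mean, Vt[:rank].T

# Toy usage: project 30 six-dimensional sub-vectors onto 3 directions.
rng = np.random.default_rng(2)
F = rng.normal(size=(30, 6))
mean, W1 = pca_projection(F, 3)
Y = (F - mean) @ W1
```

The choice of `rank` matters in practice: keeping too many directions retains redundancy, while keeping too few discards discriminative information.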
2) The projection matrix after dimensionality reduction is processed with the within-class covariance normalization (WCCN) algorithm, giving the subspace projection matrix corresponding to each Fisher sub-vector set.
Specifically, within-class covariance normalization (WCCN) is used to reduce the within-class variation of the same speaker caused by factors such as health condition or emotional changes. It is applied to the set of feature vectors obtained after PCA projection. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{12}, W_{22}, \ldots, W_{K2}$ in the projection-matrix expressions of the Fisher sub-vector sets are the subspace projection matrices obtained after applying the WCCN feature normalization algorithm.
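The WCCN step can be sketched as below, using the common formulation in which the projection is a Cholesky factor of the inverse within-class covariance. That formulation, the function names, and the equal weighting of speakers are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def within_class_cov(Y, speakers):
    """Average of the per-speaker covariance matrices of the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    speakers = np.asarray(speakers)
    ids = np.unique(speakers)
    Sw = np.zeros((Y.shape[1], Y.shape[1]))
    for s in ids:
        Yc = Y[speakers == s] - Y[speakers == s].mean(axis=0)
        Sw += Yc.T @ Yc / len(Yc)
    return Sw / len(ids)

def wccn_projection(Y, speakers):
    """Return B with B @ B.T == inv(Sw); projecting with Y @ B whitens
    the within-class covariance."""
    Sw = within_class_cov(Y, speakers)
    return np.linalg.cholesky(np.linalg.inv(Sw))

# Toy usage: 5 speakers, 20 PCA-projected vectors each, 3 dimensions.
rng = np.random.default_rng(1)
spk = np.repeat(np.arange(5), 20)
Y = rng.normal(size=(100, 3)) * [2.0, 1.0, 0.5] + spk[:, None]
B = wccn_projection(Y, spk)
```

After projection, the within-class covariance of `Y @ B` is the identity, so directions with large within-speaker variation are scaled down.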
3) The nonparametric linear discriminant analysis (NLDA) algorithm is used to extract the discriminative information on the class boundaries of the subspace projection matrix, giving the NLDA projection matrix of each Fisher sub-vector set.
Specifically, nonparametric linear discriminant analysis is used to extract the discriminative information on class boundaries and thereby increase between-class differences. After the dimensionality reduction and feature normalization of the previous two steps, the new feature dimension is further reduced, which avoids a singular within-class scatter matrix in the final NLDA step. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{13}, W_{23}, \ldots, W_{K3}$ in the projection-matrix expressions of the Fisher sub-vector sets are the projection matrices of the NLDA algorithm. Nonparametric linear discriminant analysis (NLDA) is an improvement on the linear discriminant analysis (LDA) algorithm.
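A simplified sketch of the NLDA step is given below. It builds a nonparametric between-class scatter from the difference between each sample and the mean of its Q nearest neighbours in every other class, then solves the generalized eigenproblem against the within-class scatter. The uniform weighting of samples, the function name `nlda_projection`, and the Cholesky-based solver are simplifying assumptions of this example; the weighting actually used by the patent is not specified in the text.

```python
import numpy as np

def nlda_projection(Z, speakers, Q=4, rank=None):
    """Simplified nonparametric LDA (uniform sample weights, an assumption).

    Z: (N, D) feature vectors after PCA and WCCN; speakers: length-N labels.
    Q: number of nearest neighbours used for each local mean.
    Returns a (D, rank) projection matrix.
    """
    Z = np.asarray(Z, dtype=float)
    speakers = np.asarray(speakers)
    ids = np.unique(speakers)
    D = Z.shape[1]
    Sw, Sb = np.zeros((D, D)), np.zeros((D, D))
    for i in ids:
        Zi = Z[speakers == i]
        Zc = Zi - Zi.mean(axis=0)
        Sw += Zc.T @ Zc                                   # within-class scatter
        for j in ids:
            if j == i:
                continue
            Zj = Z[speakers == j]
            # Squared distances from each sample of class i to class j.
            d2 = ((Zi[:, None, :] - Zj[None, :, :]) ** 2).sum(axis=2)
            nn = np.argsort(d2, axis=1)[:, :Q]            # Q nearest neighbours
            diff = Zi - Zj[nn].mean(axis=1)               # sample minus local mean
            Sb += diff.T @ diff                           # between-class scatter
    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw.
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:rank]
    return Linv.T @ evecs[:, order]
```

Because the local means are taken from nearest neighbours in other classes, the scatter emphasizes samples near class boundaries, which is the property the text attributes to NLDA.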
4) The projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the NLDA projection matrix are cascaded in sequence to obtain the total subspace projection matrix, which serves as the subspace speaker model.
Specifically, after the subspace analysis above has been carried out separately on each sub-vector set of the Fisher supervector, the projection matrix of each Fisher sub-vector set is obtained as the product of the three projection matrices above, $W_k = W_{k1} W_{k2} W_{k3}$. Once the projection matrices of all Fisher sub-vector sets have been obtained, they are concatenated in order to form the projection matrix of the whole Fisher supervector, $W_{Total} = [W_1 \ldots W_k \ldots W_K]$.
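The composition of the three stages can be sketched as follows. The helper names are hypothetical, and applying each $W_k$ to its own sub-vector and concatenating the outputs is one reading of $W_{Total} = [W_1 \ldots W_K]$ assumed here; mean-centering before PCA is omitted for brevity.

```python
import numpy as np

def compose_projection(W1, W2, W3):
    """Projection of one sub-vector set: W_k = W_k1 @ W_k2 @ W_k3."""
    return W1 @ W2 @ W3

def project_supervector(subvectors, projections):
    """Project every sub-vector with its own W_k and concatenate the results
    into a single low-dimensional reference vector."""
    return np.concatenate([f @ W for f, W in zip(subvectors, projections)])

# Toy usage with two sub-vector sets of dimension 6, reduced to 2 each.
rng = np.random.default_rng(3)
stages = [(rng.normal(size=(6, 4)), rng.normal(size=(4, 4)),
           rng.normal(size=(4, 2))) for _ in range(2)]
Wk = [compose_projection(*s) for s in stages]
subs = [rng.normal(size=6) for _ in range(2)]
reference = project_supervector(subs, Wk)
```

Precomputing the product once means each utterance needs only one matrix multiplication per sub-vector set at test time.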
In step S104, a reference vector of the speaker to be identified and a reference vector of the training-sample speaker are obtained according to the subspace speaker model, and the speaker to be identified is identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Specifically, in the training-sample speaker modeling and test stages, the corresponding Fisher supervectors are first extracted from the speech of the training-sample speaker and of the speaker to be identified, using the same processing as when training the total projection matrix. The trained total projection matrix $W_{Total}$ then maps the Fisher supervectors into the low-dimensional subspace, giving the reference vectors $R_{train}$ and $R_{test}$ of the training-sample speaker and the speaker to be identified, respectively. Finally, the cosine distance between the two reference vectors is calculated and used as the test score.
When the test score is less than a predetermined value, the speaker to be identified and the training-sample speaker are judged to be the same speaker; otherwise, they are judged to be different speakers.
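The scoring and decision rule can be sketched as below. The sketch assumes that "cosine distance" means one minus the cosine of the angle between the two reference vectors, so that a smaller score indicates a closer match, consistent with the "less than a predetermined value" rule; the function names and the threshold value are hypothetical.

```python
import numpy as np

def cosine_distance(r_train, r_test):
    """1 - cosine similarity (an assumed definition of the test score)."""
    cos = r_train @ r_test / (np.linalg.norm(r_train) * np.linalg.norm(r_test))
    return 1.0 - cos

def same_speaker(r_train, r_test, threshold):
    """Accept the trial when the distance score is below the threshold."""
    return cosine_distance(r_train, r_test) < threshold

# Toy usage with a hypothetical threshold of 0.5.
r_train = np.array([1.0, 2.0, 3.0])
accept = same_speaker(r_train, 2.0 * r_train, threshold=0.5)
reject = same_speaker(np.array([1.0, 0.0]), np.array([0.0, 1.0]), threshold=0.5)
```

In a deployed system the threshold would be tuned on development data to balance misses against false alarms.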
To verify the validity of the proposed speaker recognition method based on Fisher supervectors, experiments compare the performance of the speaker recognition system based on Fisher supervectors with that of a speaker recognition system based on JFA supervectors.
The experimental data are taken from the NIST 2008 speaker recognition evaluation database. The training and test speech come from the male telephone-training, telephone-test portion of the core evaluation task, which serves as the evaluation data set for measuring the performance of the speaker recognition system. The UBM training data come from the telephone speech in Switchboard II Phase 2, Switchboard II Phase 3, Switchboard Cellular Part 2, and NIST SRE 2004, 2005, and 2006; the UBM has 2048 Gaussian components.
The development set used to train the nonparametric subspace discriminant analysis projection matrices is taken from the telephone speech in the NIST SRE 2004, 2005, and 2006 databases. It contains 563 speakers in total, each with 8 utterances.
The parameter Q, which controls the number of nearest-neighbor feature vectors in the nonparametric subspace discriminant analysis algorithm, is set to 4. For both nonparametric subspace discriminant analysis and latent factor analysis, the Fisher supervector is divided into 16 parts.
A JFA system serves as the comparison system. It uses the same UBM as above; the rank of the speaker-space loading matrix V is 300, the rank of the eigenchannel-space loading matrix U is 100, and the residual loading matrix D is formed by concatenating the diagonal elements of the diagonal covariance matrices of the Gaussian components in the UBM.
First, the system performance under different combinations of the ranks of the projection matrices in the nonparametric discriminant subspace analysis algorithm is investigated. Because the development set used to train the subspace projection matrices contains 563 speakers in total, the upper limit of the rank of the subspace projection matrix $W_{k3}$ is 562. To extract the discriminative information on class boundaries, the rank of $W_{k3}$ should not be too small, so it is set to 550 in this experiment. In addition, PCA accounts for the largest dimensionality reduction in the algorithm: if the rank of $W_{k1}$ is too large, the projected feature vectors retain too much redundant information, and if it is too small, necessary discriminative information is lost, so this step directly affects system performance. This part of the experiment therefore mainly investigates how system performance varies with the rank of the projection matrix. Table 1 shows the results of the nonparametric discriminant analysis based on Fisher supervectors:
Table 1
As Table 1 shows, with the rank of the NLDA projection matrix $W_{k3}$ fixed, the performance of the nonparametric discriminant analysis speaker recognition system based on Fisher supervectors improves, within a certain range, as the rank of the PCA projection matrix $W_{k1}$ increases. System performance is best when the rank of $W_{k1}$ is 1300: the equal error rate (EER) is lowest, meaning the recognition error rate is minimal, and the MinDCF (minimum detection cost) is 2.73. As the rank of $W_{k1}$ continues to increase, however, the projected feature vectors in the PCA subspace contain more redundant information and system performance declines.
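For readers unfamiliar with the EER metric quoted above, the following sketch shows one simple way to estimate it from a list of trial scores. It assumes distance-like scores (a trial is accepted when its score is below the threshold, matching the decision rule used in this document); the function name and the toy scores are illustrative and are not data from the experiment.

```python
import numpy as np

def equal_error_rate(target, nontarget):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target: scores of same-speaker trials; nontarget: scores of
    different-speaker trials. Lower scores mean a closer match.
    """
    target = np.asarray(target, dtype=float)
    nontarget = np.asarray(nontarget, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget, [np.inf]]))
    best_gap, eer = np.inf, 1.0
    for th in thresholds:
        miss = np.mean(target >= th)      # same-speaker trials rejected
        fa = np.mean(nontarget < th)      # different-speaker trials accepted
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2.0
    return eer

# Toy usage: perfectly separated scores versus overlapping scores.
eer_clean = equal_error_rate([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
eer_overlap = equal_error_rate([0.1, 0.6], [0.4, 0.9])
```

The threshold sweep is coarse for small trial lists; evaluation toolkits usually interpolate between operating points.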
Second, the proposed speaker recognition system based on Fisher supervectors is compared with the speaker recognition system based on JFA supervectors. In Fig. 3, the abscissa is the false alarm probability and the ordinate is the miss probability. Although the Fisher+NDA system performs slightly worse than the JFA system, it does not need the acoustic features of a large amount of raw speech to train a speaker information space and a channel information space. Whereas the LFA algorithm relies on EM iteration, the PCA subspace directly compresses the speaker information in the Fisher supervector into a lower-dimensional subspace. In both parameter learning and score calculation, the Fisher system therefore has lower computational complexity than the JFA system and requires less running time.
Embodiment 2:
Fig. 4 shows the structure of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention. For ease of description, only the parts relevant to the embodiment are shown.
The speaker recognition system based on Fisher supervectors includes:
an extraction unit 41 for extracting a Fisher supervector;
a division unit 42 for dividing the extracted Fisher supervector into a plurality of Fisher sub-vector sets;
a model building unit 43 for analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to build a subspace speaker model;
a recognition unit 44 for obtaining, according to the subspace speaker model, a reference vector of the speaker to be identified and a reference vector of the training-sample speaker, and for identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
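Further, the Fisher supervector is formed by concatenating the quantities $\mathcal{G}^{X}_{\alpha_k}$, $\mathcal{G}^{X}_{\mu_k}$ and $\mathcal{G}^{X}_{\sigma_k}$ corresponding to all Gaussian components of the GMM, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}^{X}_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}^{X}_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}^{X}_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
Here $\mathcal{G}^{X}_{\alpha_k}$ is a scalar, and $\mathcal{G}^{X}_{\mu_k}$ and $\mathcal{G}^{X}_{\sigma_k}$ are $d$-dimensional vectors with $d \ge 1$ (division and squaring of vectors are element-wise). The feature-vector sequence of the speaker's speech is $X = \{x_t,\ t = 1 \ldots T\}$, where $x_t$ is a feature vector and $T$ is the number of feature vectors in $X$. $K$ is the number of Gaussian components in the GMM, and the occupancy of the $k$-th component by $x_t$ is
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$
The $k$-th Gaussian component of the GMM is
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right\}$$
where $w_k$ is the weight of the $k$-th Gaussian component, with $w_k = \exp(\alpha_k) / \sum_{j=1}^{K} \exp(\alpha_j)$; $\mu_k$ is its mean vector; $\Sigma_k$ is its covariance matrix; and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.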
Further, the model building unit 43 includes:
a first processing module 431 for removing, with the principal component analysis (PCA) algorithm, the redundant information contained in each Fisher sub-vector set, to obtain the dimensionality-reducing projection matrix of each Fisher sub-vector set;
a second processing module 432 for processing the projection matrix after dimensionality reduction with the within-class covariance normalization (WCCN) algorithm, to obtain the subspace projection matrix corresponding to each Fisher sub-vector set;
a third processing module 433 for extracting, with the nonparametric linear discriminant analysis (NLDA) algorithm, the discriminative information on the class boundaries of the subspace projection matrix, to obtain the NLDA projection matrix of each Fisher sub-vector set;
a model building module 434 for cascading, in sequence, the projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the NLDA projection matrix, to obtain the total subspace projection matrix.
Further, the recognition unit 44 includes:
a computing module 441 for obtaining, according to the subspace speaker model, the reference vector $R_{test}$ of the speaker to be identified and the reference vector $R_{train}$ of the training-sample speaker, and for calculating the cosine distance between the two reference vectors as the test score;
an identification module 442 for judging, when the test score is less than a predetermined value, that the speaker to be identified and the training-sample speaker are the same speaker.
It is apparent to those skilled in the art that for convenience of description and succinctly, only with abovementioned each work( Can unit, module division progress for example, in practical application, can be as needed and by abovementioned function distribution by different Functional unit, module are completed, will the internal structure of the system be divided into different functional units or module, more than completion The all or part of function of description.Each functional unit in embodiment can be integrated in a processing unit or Unit is individually physically present, can also two or more units integrate in a unit, abovementioned integrated unit Both it can be realized, can also be realized in the form of SFU software functional unit in the form of hardware.In addition, each functional unit, mould The specific name of block is not limited to the protection domain of the application also only to facilitate mutually distinguish.It is single in said system Member, the specific work process of module, may be referred to the corresponding process in preceding method embodiment, details are not described herein.
In conclusion the embodiment of the present invention extraction voice data in Fisher super vectors as speaker feature to Amount, and Speaker Identification is carried out on the basis of Fisher super vectors using subspace analysis modeling technique.Since Fisher surpasses Vector extraction is simple, and the dimension with than JFA super vector highers, and does not do channel compensation processing, so as to effective The accuracy rate and efficiency of Speaker Identification are improved, there is stronger usability and practicality.
Those of ordinary skill in the art will appreciate that the exemplary units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments described above are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A speaker recognition method based on Fisher supervectors, characterized in that the method comprises:
extracting Fisher supervectors;
dividing the extracted Fisher supervectors into multiple Fisher subvector sets;
analyzing each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training sample speaker, and identifying the speaker to be identified according to a preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training sample speaker;
wherein the Fisher supervector is formed by concatenating three quantities [symbols and formulas omitted in source] corresponding to all Gaussian components in the GMM model, and the dimension of the Fisher supervector is (2d+1)K; the value of the first quantity is a scalar, and the values of the other two quantities are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's voice is X = {x_{t}, t = 1...T}, where x_{t} denotes a feature vector and T denotes the number of feature vectors in the feature vector sequence X; K denotes the number of Gaussian components in the GMM model; [expression for the k-th Gaussian component in the GMM model omitted in source]; w_{k} denotes the weight of the k-th Gaussian component in the GMM model; μ_{k} denotes the mean vector of the k-th Gaussian component in the GMM model; Σ_{k} denotes the covariance matrix of the k-th Gaussian component in the GMM model; and [symbol omitted in source] denotes the elements on the diagonal of Σ_{k}.
2. The method according to claim 1, characterized in that analyzing each Fisher subvector set based on the nonparametric discriminant analysis algorithm to establish the subspace speaker model comprises:
removing the redundant information contained in each Fisher subvector set using a principal component analysis (PCA) algorithm to obtain the dimension-reduced projection matrix of each Fisher subvector set;
processing the dimension-reduced projection matrix using a within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set;
extracting the discriminative information on the class boundaries of the subspace projection matrix using a nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix for each Fisher subvector set;
concatenating, in order, the projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
3. The method according to claim 1, characterized in that the step of obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training sample speaker, and performing speaker recognition according to the preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training sample speaker comprises:
obtaining, according to the subspace speaker model, the reference vector R_{train} of the speaker to be identified and the reference vector R_{test} of the training sample speaker, and calculating, according to the formula [formula omitted in source], the cosine distance between the two reference vectors as the test score;
when the test score is less than a predetermined value, judging that the speaker to be identified and the training sample speaker are the same speaker.
4. A speaker recognition system based on Fisher supervectors, characterized in that the system comprises:
an extraction unit, configured to extract Fisher supervectors;
a division unit, configured to divide the extracted Fisher supervectors into multiple Fisher subvector sets;
a model establishing unit, configured to analyze each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
a recognition unit, configured to obtain, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training sample speaker, and to identify the speaker to be identified according to a preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training sample speaker;
wherein the Fisher supervector is formed by concatenating three quantities [symbols and formulas omitted in source] corresponding to all Gaussian components in the GMM model, and the dimension of the Fisher supervector is (2d+1)K; the value of the first quantity is a scalar, and the values of the other two quantities are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's voice is X = {x_{t}, t = 1...T}, where x_{t} denotes a feature vector and T denotes the number of feature vectors in the feature vector sequence X; K denotes the number of Gaussian components in the GMM model; [expression for the k-th Gaussian component in the GMM model omitted in source]; w_{k} denotes the weight of the k-th Gaussian component in the GMM model; μ_{k} denotes the mean vector of the k-th Gaussian component in the GMM model; Σ_{k} denotes the covariance matrix of the k-th Gaussian component in the GMM model; and [symbol omitted in source] denotes the elements on the diagonal of Σ_{k}.
5. The system according to claim 4, characterized in that the model establishing unit comprises:
a first processing module, configured to remove the redundant information contained in each Fisher subvector set using a principal component analysis (PCA) algorithm to obtain the dimension-reduced projection matrix of each Fisher subvector set;
a second processing module, configured to process the dimension-reduced projection matrix using a within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set;
a third processing module, configured to extract the discriminative information on the class boundaries of the subspace projection matrix using a nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix for each Fisher subvector set;
a model building module, configured to concatenate, in order, the projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
6. The system according to claim 4, characterized in that the recognition unit comprises:
a computing module, configured to obtain, according to the subspace speaker model, the reference vector R_{train} of the speaker to be identified and the reference vector R_{test} of the training sample speaker, and to calculate, according to the formula [formula omitted in source], the cosine distance between the two reference vectors as the test score;
an identification module, configured to judge, when the test score is less than a predetermined value, that the speaker to be identified and the training sample speaker are the same speaker.
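The within-class covariance normalization (WCCN) step recited in claims 2 and 5 can be illustrated with the following Python sketch; the PCA and NLDA steps are not shown. The sketch assumes the usual WCCN definition, in which the within-class covariance W is the average of the per-speaker covariance matrices and vectors are mapped by a matrix B satisfying B B^T = W^{-1}. It uses the Cholesky factor L of W and computes L^{-1} x by forward substitution, which whitens W and differs from the Cholesky factor of W^{-1} only by a rotation. All function names are hypothetical.

```python
import math


def within_class_covariance(vectors, labels):
    """Average of the per-speaker covariance matrices."""
    d = len(vectors[0])
    by_speaker = {}
    for v, s in zip(vectors, labels):
        by_speaker.setdefault(s, []).append(v)
    W = [[0.0] * d for _ in range(d)]
    for group in by_speaker.values():
        n = len(group)
        mean = [sum(v[j] for v in group) / n for j in range(d)]
        for v in group:
            diff = [v[j] - mean[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    W[a][b] += diff[a] * diff[b] / (n * len(by_speaker))
    return W


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def wccn_project(L, x):
    """Solve L y = x by forward substitution, i.e. y = L^{-1} x."""
    y = []
    for i in range(len(x)):
        y.append((x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y
```

After projection, the within-class covariance of the training vectors becomes the identity matrix, so directions with large within-speaker variability no longer dominate a cosine score. Because cosine scoring is unchanged by rotations, using the factor of W instead of the factor of W^{-1} does not affect the resulting scores.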
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title
CN201410802816.6A (CN104538035B)  2014-12-19  2014-12-19  Speaker recognition method and system based on Fisher supervectors
Publications (2)
Publication Number  Publication Date
CN104538035A  2015-04-22
CN104538035B  2018-05-01
Family
ID=52853551

2014-12-19  CN  CN201410802816.6A  patent CN104538035B  active, IP Right Grant
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
C10  Entry into substantive examination  
GR01  Patent grant  