CN105810199A - Identity verification method and device for speakers - Google Patents

Identity verification method and device for speakers

Info

Publication number
CN105810199A
CN105810199A
Authority
CN
China
Prior art keywords
subspace
vector
jfa
speaker
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410844272.XA
Other languages
Chinese (zh)
Inventor
李志锋 (Zhifeng Li)
李娜 (Na Li)
乔宇 (Yu Qiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201410844272.XA priority Critical patent/CN105810199A/en
Publication of CN105810199A publication Critical patent/CN105810199A/en
Pending legal-status Critical Current

Abstract

The invention belongs to the technical field of speech processing and provides an identity verification method and device for speakers. The method comprises: extracting a joint factor analysis (JFA) supervector from each training utterance and generating a first subvector from it; projecting the first subvector into a first subspace using the PCA algorithm; randomly sampling the first subspace to obtain Q second subspaces; mapping the vectors in the Q second subspaces into Q third subspaces; modeling the Q third subspaces with nonparametric linear discriminant analysis; projecting the JFA supervector of every training and test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors and Q test reference vectors; fusing the outputs of the Q resulting classifiers; and identifying the speaker of the training utterance with the highest fused score as the speaker of the test utterance. The method and device substantially improve the performance of a speaker verification system.

Description

Identity verification method and device for a speaker
Technical field
The invention belongs to the field of speech technology, and in particular relates to an identity verification method and device for a speaker.
Background art
The rapid development of network information technology allows people to obtain all kinds of information conveniently, but it has also created a variety of information security problems, against which identity authentication technology is particularly important. Compared with authentication means such as fingerprints, irises, faces and handwritten signatures, the human voice has become a focus of identity authentication research because it is easy to collect, easy to store and difficult to imitate; the key technical problem is speaker verification.
Converting speech data of different durations from a speaker into high-dimensional feature vectors of a fixed dimension through a suitable algorithm is currently a popular approach to speaker verification. To address the "curse of dimensionality" and the small-sample problem brought by such high-dimensional features, researchers have proposed speaker verification algorithms based on subspace analysis. However, current subspace analysis methods still suffer from the following problem: the dimensionality of the subspace has a large impact on the performance of the speaker verification system.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an identity verification method and device for a speaker, aiming to solve the problem that, in current subspace-analysis-based speaker verification methods, the dimensionality of the subspace has a large impact on the performance of the verification system.
The embodiments of the present invention are realized as follows: an identity verification method for a speaker, comprising:
Extracting a joint factor analysis (JFA) supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set;
Extracting k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk];
Projecting the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] into a first subspace of dimension J using the principal component analysis (PCA) algorithm;
Randomly sampling the first subspace to obtain Q second subspaces;
Applying within-class covariance normalization (WCCN) to the vectors projected into the Q second subspaces, training a projection matrix W2, and then mapping the vectors in the Q second subspaces into Q third subspaces through the projection matrix W2;
Modeling the Q third subspaces with nonparametric linear discriminant analysis to obtain a projection matrix W3;
Projecting the JFA supervector of each training utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors;
Extracting the JFA supervector of a test utterance;
Projecting the JFA supervector of the test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q test reference vectors;
Computing the cosine distance between the test reference vector and the target-speaker reference vector in each of the Q third subspaces to obtain the outputs of Q classifiers;
Fusing the outputs of the Q classifiers with a preset algorithm; and
Identifying the speaker of the training utterance whose fused score is highest as the speaker of the test utterance.
Another purpose of the embodiments of the present invention is to provide an identity verification device for a speaker, comprising:
A first extraction unit, configured to extract a joint factor analysis (JFA) supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set;
A first dimensionality reduction unit, configured to extract k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk];
A second dimensionality reduction unit, configured to project the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] into a first subspace of dimension J using the principal component analysis (PCA) algorithm;
A random sampling unit, configured to randomly sample the first subspace to obtain Q second subspaces;
A WCCN processing unit, configured to apply within-class covariance normalization (WCCN) to the vectors projected into the Q second subspaces, train a projection matrix W2, and then map the vectors in the Q second subspaces into Q third subspaces through the projection matrix W2;
A nonparametric linear discriminant analysis unit, configured to model the Q third subspaces with nonparametric linear discriminant analysis to obtain a projection matrix W3;
A first reference vector generation unit, configured to project the JFA supervector of each training utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors;
A second extraction unit, configured to extract the JFA supervector of a test utterance;
A second reference vector generation unit, configured to project the JFA supervector of the test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q test reference vectors;
An output unit, configured to compute the cosine distance between the test reference vector and the target-speaker reference vector in each of the Q third subspaces to obtain the outputs of Q classifiers;
A fusion unit, configured to fuse the outputs of the Q classifiers with a preset algorithm; and
A confirmation unit, configured to identify the speaker of the training utterance whose fused score is highest as the speaker of the test utterance.
The embodiments of the present invention adopt an algorithm framework based on two-layer subspace sampling: in addition to directly applying subspace analysis to reduce the dimensionality of the original high-dimensional feature space, a random subspace sampling method is used to construct several lower-dimensional subspaces, a classifier is trained for each subspace, and the final decision is obtained by fusing the outputs of the multiple classifiers, which substantially improves the performance of the speaker verification system.
Brief description of the drawings
Fig. 1 is a flowchart of the speaker identity verification method provided by an embodiment of the present invention;
Fig. 2 is the algorithm framework diagram of the speaker identity verification method provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of the speaker identity verification device provided by an embodiment of the present invention.
Detailed description of the invention
To make the purpose, technical solution and advantages of the present invention clearer, the present invention is further described below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
Fig. 1 shows the implementation flow of the speaker identity verification method provided by an embodiment of the present invention, detailed as follows:
In S101, a joint factor analysis (Joint Factor Analysis, JFA) supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] is extracted from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set.
According to JFA theory, in the speaker verification framework based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM), the mean supervector of a speaker model obtained through maximum a posteriori (MAP) adaptation mainly contains two components of information, speaker and channel, and both components follow Gaussian distributions. Removing the channel information from the speaker model according to the JFA method can greatly improve the performance of a speaker recognition system. The embodiments of the present invention therefore exploit the advantage of JFA under channel mismatch and use the mean supervector of the JFA-denoised speaker model as the speaker feature. First, in S101, the JFA method is used to extract a JFA supervector from each training utterance in the training set; this JFA supervector is the vector obtained by splicing together, in order, the mean vectors of all Gaussian components of the speaker model.
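As an illustration only (not the patented implementation: the component count N, feature dimension F and the random stand-in means are all assumptions for this sketch, with N = 2048 and F = 51 chosen to match the experiments described later), the following Python snippet shows how such a supervector is spliced together from per-component mean vectors:

```python
import numpy as np

N, F = 2048, 51                      # assumed: 2048 Gaussians, 51-dim features
rng = np.random.default_rng(0)

# Stand-in for the channel-compensated per-component means of one utterance's
# speaker model (in the patent these come from JFA denoising).
component_means = rng.standard_normal((N, F))

# M_ih = [m_ih1, m_ih2, ..., m_ihN]: splice the N mean vectors in order.
M_ih = component_means.reshape(-1)   # shape (N * F,) = (104448,)
```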
In S102, k mean vectors are extracted from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
To remove part of the redundant information from the high-dimensional original feature space generated in S101, S102 selects a subset of the mean vectors that make up the JFA supervector, forming a lower-dimensional subspace that retains most of the useful information in the JFA supervector; the vector corresponding to the JFA supervector in this subspace is denoted as the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
As an embodiment of the present invention, S102 is specifically:
Extracting the mean vectors ranked in the first k positions from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
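A minimal sketch of this first-layer selection (a non-random truncation to the first k component means; the supervector layout and the value k = 1280 are taken from the sketch above and the experiments below):

```python
import numpy as np

N, F, k = 2048, 51, 1280                                  # assumed dimensions
M_ih = np.random.default_rng(0).standard_normal(N * F)    # stand-in supervector

# Keep only the first k of the N component means.
S_ih = M_ih.reshape(N, F)[:k].reshape(-1)                 # S_ih = [m'_ih1, ..., m'_ihk]
```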
In S103, the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] is projected into a first subspace of dimension J using the principal component analysis (Principal Component Analysis, PCA) algorithm, generating a second subvector O_ih = [o_1, o_2, ..., o_J].
Because the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] still has a rather high dimensionality and the values in each dimension are sparsely distributed, it still contains a large amount of redundant information. The PCA method is therefore applied to the first subvector to perform an optimal dimensionality-reducing compression: it is projected through a projection matrix W1 into a low-dimensional subspace of dimension J, yielding the second subvector O_ih = [o_1, o_2, ..., o_J].
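A hedged PCA sketch follows (the toy sizes are assumptions; the real input dimension is k times the acoustic feature dimension). Since there are typically far fewer utterances than supervector dimensions, it eigendecomposes the small Gram matrix instead of the full covariance, a standard trick that is not necessarily the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utts, dim, J = 500, 5000, 300       # toy sizes; real dim = k * 51

X = rng.standard_normal((n_utts, dim))
mean = X.mean(axis=0)
Xc = X - mean                         # centered first subvectors, one per row

# "Snapshot" PCA: eigendecompose the (n_utts x n_utts) Gram matrix.
G = Xc @ Xc.T / (n_utts - 1)
vals, vecs = np.linalg.eigh(G)
order = np.argsort(vals)[::-1][:J]

W1 = Xc.T @ vecs[:, order]            # (dim, J) projection matrix
W1 /= np.linalg.norm(W1, axis=0)      # unit-length principal directions

O = Xc @ W1                           # second subvectors O_ih, shape (n_utts, J)
```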
In S104, the first subspace is randomly sampled to obtain Q second subspaces T_1, T_2, ..., T_Q.
In S105, within-class covariance normalization (Within-Class Covariance Normalization, WCCN) is applied to the vectors projected into the Q second subspaces, a projection matrix W2 is trained, and the vectors in the Q second subspaces are then mapped into Q third subspaces through the projection matrix W2.
In this embodiment, each random subspace obtained in S104 is processed in stages: first, WCCN is applied to the development-set data projected into each random subspace, and the WCCN projection matrix W2 is trained on the development-set data; the low-dimensional feature vectors in the random subspace are then mapped through W2 into a new subspace, namely the third subspace, thereby obtaining Q new random subspaces.
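An illustrative WCCN sketch under assumed toy data (the development set, speaker labels and dimensions are invented for the example). It uses the standard construction in which the projection is a Cholesky factor of the inverse within-class covariance, so that the projected within-class covariance becomes the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
n_spk, per_spk, d = 50, 8, 120        # toy development set: 50 speakers x 8 utts
labels = np.repeat(np.arange(n_spk), per_spk)
X = rng.standard_normal((n_spk * per_spk, d))

# Average per-speaker covariance = within-class covariance estimate.
Wc = np.zeros((d, d))
for s in range(n_spk):
    Xs = X[labels == s]
    Xs = Xs - Xs.mean(axis=0)
    Wc += Xs.T @ Xs / len(Xs)
Wc /= n_spk

W2 = np.linalg.cholesky(np.linalg.inv(Wc))   # W2 @ W2.T = Wc^{-1}
X_wccn = X @ W2                              # vectors mapped into the third subspace
```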
In S106, nonparametric linear discriminant analysis is used to model the Q third subspaces, obtaining a projection matrix W3.
At this point, the projection matrix of each third subspace is W2*W3.
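The patent does not spell out the nonparametric discriminant computation. The sketch below follows the usual Fukunaga-style construction, where the between-class scatter is built from each sample's K nearest neighbours in the other classes (emphasizing samples near class boundaries); the weighting terms are omitted for brevity and all data are random stand-ins, so this is an assumption-laden illustration rather than the claimed algorithm. K = 4 matches the experiments below:

```python
import numpy as np

def nda_projection(X, y, K=4, out_dim=10):
    """Simplified nonparametric discriminant analysis projection."""
    d = X.shape[1]
    Sb = np.zeros((d, d))                 # nonparametric between-class scatter
    Sw = np.zeros((d, d))                 # within-class scatter
    for c in np.unique(y):
        Xc, Xo = X[y == c], X[y != c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        for x in Xc:
            # local mean of the K nearest out-of-class neighbours of x
            nn = np.argsort(np.linalg.norm(Xo - x, axis=1))[:K]
            diff = (x - Xo[nn].mean(axis=0))[:, None]
            Sb += diff @ diff.T
    # leading eigenvectors of Sw^{-1} Sb give the projection W3
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1][:out_dim]
    return vecs.real[:, order]

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20)) + np.repeat(np.eye(20)[:10] * 3, 20, axis=0)
y = np.repeat(np.arange(10), 20)          # 10 toy classes, 20 samples each
W3 = nda_projection(X, y)                 # (20, 10) projection matrix
```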
The subspace analysis of S101 to S106 completes the training process of the speaker verification system; for the Q third subspaces, each subspace yields a corresponding subspace classifier.
What follows is the test process (i.e., the classification process) of the speaker verification system:
In S107, the JFA supervector of each training utterance is projected into the Q third subspaces through the projection matrix W2*W3, obtaining Q target-speaker reference vectors R_train(q), q = 1, 2, ..., Q.
In S108, the JFA supervector of the test utterance is extracted.
For the test utterance, specifically, J = m + Vy + Dz can be used to convert the speaker's speech into a JFA supervector, where J denotes the JFA supervector, m denotes the UBM mean supervector, V and D denote the speaker-space loading matrix and the residual-space loading matrix respectively, and y and z denote the speaker factors and the residual factors respectively.
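A toy numerical reading of this synthesis equation (all matrices are random stand-ins rather than trained JFA parameters; the rank of V is set to 300 to match the experiments below, and the supervector dimension is shrunk for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
sv_dim, r_v = 1000, 300                  # toy supervector dim; rank of V = 300

m = rng.standard_normal(sv_dim)          # UBM mean supervector
V = rng.standard_normal((sv_dim, r_v))   # speaker-space loading matrix
D = np.diag(rng.random(sv_dim))          # diagonal residual loading matrix
y = rng.standard_normal(r_v)             # speaker factors
z = rng.standard_normal(sv_dim)          # residual factors

J = m + V @ y + D @ z                    # JFA supervector of the test utterance
```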
In S109, the JFA supervector of the test utterance is projected into the Q third subspaces through the projection matrix W2*W3, obtaining Q test reference vectors R_test(q), q = 1, 2, ..., Q.
S109 uses the same processing method as S107.
In S110, the cosine distance between the test reference vector and the target-speaker reference vector is computed in each of the Q third subspaces, giving the outputs of Q classifiers.
The computation in S110 is:
D(R_train, R_test) = (R_train^T R_test) / ( sqrt(R_train^T R_train) * sqrt(R_test^T R_test) )
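This scoring rule translates directly into code; in the sketch below the 550-dimensional vectors are random stand-ins whose size matches the nonparametric discriminant projection used in the experiments:

```python
import numpy as np

def cosine_score(r_train, r_test):
    """Cosine similarity between two reference vectors."""
    num = r_train @ r_test
    den = np.sqrt(r_train @ r_train) * np.sqrt(r_test @ r_test)
    return num / den

rng = np.random.default_rng(5)
r_train = rng.standard_normal(550)   # target-speaker reference vector
r_test = rng.standard_normal(550)    # test reference vector
print(cosine_score(r_train, r_test))
```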
In S111, the outputs of the Q classifiers are fused with a preset algorithm.
In this embodiment, the Q classifiers of S110 produce Q computed scores; then, in S111, these Q scores are fused according to the preset algorithm.
As an embodiment of the present invention, S111 is specifically: linearly fusing the outputs of the Q classifiers.
Alternatively, the outputs of the Q classifiers can also be fused by a voting method.
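Both fusion rules are simple to state in code. In the following sketch the score matrix is a random stand-in and the equal weights of the linear rule are an assumption; the argmax over the fused scores also illustrates the final decision of S112:

```python
import numpy as np

def linear_fusion(scores, weights=None):
    """Weighted sum of the Q classifier scores (equal weights by default)."""
    w = np.ones(len(scores)) / len(scores) if weights is None else weights
    return w @ scores                    # fused score per target speaker

def vote_fusion(scores):
    """Each classifier votes for its top-scoring target speaker."""
    return np.bincount(scores.argmax(axis=1), minlength=scores.shape[1])

rng = np.random.default_rng(6)
scores = rng.random((10, 5))             # Q = 10 classifiers, 5 target speakers
print(linear_fusion(scores).argmax())    # decision under linear fusion
print(vote_fusion(scores).argmax())      # decision under voting
```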
In S112, the speaker of the training utterance whose fused score is highest is identified as the speaker of the test utterance.
For different training utterances, S107 to S111 produce different score outputs; according to the score output corresponding to each training utterance, the speaker of the training utterance whose fused score is highest is identified as the speaker of the test utterance.
Corresponding to the speaker identity verification method above and to Fig. 1, Fig. 2 shows the algorithm framework of the method provided by an embodiment of the present invention. As can be seen from Fig. 2, the embodiments of the present invention combine subspace sampling in the original feature space with random sampling in the subspace obtained after dimensionality reduction. The first-layer subspace sampling in this framework operates on the means of the Gaussian components that make up the JFA supervector; its purpose is to remove part of the redundant information and determine a subspace of suitable dimension. Since a JFA supervector has the same structure as a GMM mean supervector and can be regarded as the ordered splicing of the mean vectors of the Gaussian components of a GMM, the first-layer subspace sampling in the framework of Fig. 2 takes the mean vectors in the JFA supervector as its basic unit. The second layer then randomly samples the lower-dimensional subspace obtained after PCA dimensionality reduction of the first-layer subspace, forming several new subspaces. The two layers choose their subspaces differently: the first-layer choice is deterministic, while the second-layer choice is random.
Next, the effect of the speaker identity verification method provided by the embodiments of the present invention on the performance of a speaker verification system is evaluated through concrete experiments:
In these experiments, the data are taken from the National Institute of Standards and Technology (NIST) 2008 speaker recognition evaluation database; the training and test utterances are the male telephone training and male telephone test portions of the core evaluation task. The UBM training data are the telephone speech of Switchboard II Phase 2, Switchboard II Phase 3, Switchboard Cellular Part 2 and NIST SRE 2004, 2005 and 2006, and the UBM has 2048 Gaussian components. The data used to train the nonparametric linear discriminant analysis projection matrix W3 are telephone utterances from the NIST SRE 2004, 2005 and 2006 databases, comprising 563 speakers in total, each with 8 utterances. The UBM training data of the JFA system are the same as above; the rank of the speaker-space loading matrix V is 300, the rank of the eigenchannel-space loading matrix U is 100, and the residual loading matrix D is spliced from the diagonal elements of the diagonal covariance matrices of the Gaussian components of the UBM. Unless otherwise specified, in these experiments the dimensions of the PCA, WCCN and nonparametric linear discriminant analysis projection matrices are (51 × k) × J, (E1+E2) × 799 and 799 × 550 respectively, the number of random subspaces and the number of base classifiers Q are both set to 10, and the number of neighbour samples in the nonparametric linear discriminant analysis is set to 4.
In the first-layer subspace sampling of the algorithm framework, after the original feature space has been generated by S101, S102 selects the first 1280 Gaussian mean vectors of the sorted JFA supervector to obtain the first subvector S_ih. However, the dimensionality of this first subvector is still very high relative to the number of training samples in the training set; therefore, to train reliable and stable subspace classifiers, the first subvector must be projected further into a low-dimensional PCA subspace. In these experiments, let J denote the dimension of the feature vector after PCA dimensionality reduction.
In the second-layer subspace sampling of the algorithm framework, before random sampling is performed, and in order to guarantee the performance of each subspace base classifier, the first E1 principal components of the first subspace, which carry the most information, are fixed (that is, the first E1 most informative principal components of the first subspace are chosen); the random sampling algorithm is applied only to the remaining J − E1 principal components of the first subspace, from which E2 principal components are drawn at random, yielding a random subspace of dimension E1 + E2.
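The index bookkeeping of this two-layer selection can be sketched as follows; the particular split E1 = 700, E2 = 100 is one illustrative assumption among the combinations tested, and J, E1 + E2 and Q match the experimental settings:

```python
import numpy as np

J, E1, E2, Q = 1200, 700, 100, 10     # E1 + E2 = 800, as in the experiments
rng = np.random.default_rng(7)

subspaces = []
for _ in range(Q):
    fixed = np.arange(E1)             # top-E1 components, always kept
    sampled = rng.choice(np.arange(E1, J), size=E2, replace=False)
    subspaces.append(np.concatenate([fixed, sampled]))

# Each index set selects E1 + E2 PCA dimensions for one base classifier.
```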
In the experiments on second-layer subspace sampling, the best value of J is determined by cross-validation; the value of J is fixed at 1200 or 1300 and the value of E1 + E2 is fixed at 800. For each different (E1, E2), 10 subspaces are created at random, that is, 10 base classifiers are created, and the final result is obtained by linear fusion.
Table 1 lists the experimental results of the two-layer sampling subspace speaker verification method on the NIST SRE 2008 database for seven combinations of (E1, E2), giving for each the best single-classifier result, the worst single-classifier result and the fused-system result. Each entry in the table is (EER (%), minDCF × 100), where EER is the equal error rate and minDCF is the minimum detection cost function; both are standard measures of system performance:
Table 1
From Table 1 the following can be observed:
(1) For every combination of E1 and E2, the performance of a single base classifier is unstable; this can be seen from the EER and minDCF results of the best and worst single base classifiers;
(2) In the first six groups of experiments, the EER of the best single base classifier in each group is lower than that of the combined system of the last group (800, 0); since E2 is 0 in the last group, that group does not use random subspaces. This shows that the magnitude of an eigenvalue is not an absolute measure of the discriminative power of its corresponding principal component: the principal components corresponding to some smaller eigenvalues may carry more discriminative information, which is also the underlying reason why the base classifiers constructed from the random subspaces have a degree of diversity and complementarity;
(3) The results after multi-classifier fusion are lower in both EER and minDCF than those of the base classifiers, and within each group the fused results are more stable overall, showing that multi-classifier fusion can effectively improve system performance;
(4) Compared with a single speaker verification system that does not use random subspaces, the two-layer sampling subspace speaker verification system proposed by the present invention reduces the EER from 4.32 to 4.01, and the minDCF also drops slightly.
The embodiments of the present invention adopt an algorithm framework based on two-layer subspace sampling: in addition to directly applying subspace analysis to reduce the dimensionality of the original high-dimensional feature space, a random subspace sampling method is used to construct several lower-dimensional subspaces, a classifier is trained for each subspace, and the final decision is obtained by fusing the outputs of the multiple classifiers, which substantially improves the performance of the speaker verification system.
Corresponding to the embodiments described above, Fig. 3 shows the structural block diagram of the speaker identity verification device provided by an embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown.
With reference to Fig. 3, this device includes:
A first extraction unit 301, which extracts a JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set.
A first dimensionality reduction unit 302, which extracts k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
A second dimensionality reduction unit 303, which projects the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] into a first subspace of dimension J using the PCA algorithm.
A random sampling unit 304, which randomly samples the first subspace to obtain Q second subspaces.
A WCCN processing unit 305, which applies WCCN to the vectors projected into the Q second subspaces, trains a projection matrix W2, and then maps the vectors in the Q second subspaces into Q third subspaces through the projection matrix W2.
A nonparametric linear discriminant analysis unit 306, which models the Q third subspaces with nonparametric linear discriminant analysis to obtain a projection matrix W3.
A first reference vector generation unit 307, which projects the JFA supervector of each training utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors.
A second extraction unit 308, which extracts the JFA supervector of the test utterance.
A second reference vector generation unit 309, which projects the JFA supervector of the test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q test reference vectors.
An output unit 310, which computes the cosine distance between the test reference vector and the target-speaker reference vector in each of the Q third subspaces to obtain the outputs of Q classifiers.
A fusion unit 311, which fuses the outputs of the Q classifiers with a preset algorithm.
A confirmation unit 312, which identifies the speaker of the training utterance whose fused score is highest as the speaker of the test utterance.
Optionally, the first dimensionality reduction unit 302 is specifically configured to:
Extract the mean vectors ranked in the first k positions from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
Optionally, the random sampling unit 304 includes:
A first selection subunit, which chooses the first E1 most informative principal components of the first subspace.
A second selection subunit, which randomly selects E2 principal components from the remaining J − E1 principal components of the first subspace by a random sampling algorithm.
A generation subunit, which generates Q second subspaces of dimension E1 + E2.
Optionally, the second extraction unit 308 is specifically configured to:
Use J = m + Vy + Dz to convert the test utterance into the JFA supervector of the test utterance, where J denotes the JFA supervector, m denotes the universal background model (UBM) mean supervector, V and D denote the speaker-space loading matrix and the residual-space loading matrix respectively, and y and z denote the speaker factors and the residual factors respectively.
Optionally, the fusion unit 311 is specifically configured to:
Linearly fuse the outputs of the Q classifiers.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1. An identity verification method for a speaker, characterized by comprising:
Extracting a joint factor analysis (JFA) supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set;
Extracting k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk];
Projecting the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] into a first subspace of dimension J using the principal component analysis (PCA) algorithm;
Randomly sampling the first subspace to obtain Q second subspaces;
Applying within-class covariance normalization (WCCN) to the vectors projected into the Q second subspaces, training a projection matrix W2, and then mapping the vectors in the Q second subspaces into Q third subspaces through the projection matrix W2;
Modeling the Q third subspaces with nonparametric linear discriminant analysis to obtain a projection matrix W3;
Projecting the JFA supervector of each training utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors;
Extracting the JFA supervector of a test utterance;
Projecting the JFA supervector of the test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q test reference vectors;
Computing the cosine distance between the test reference vector and the target-speaker reference vector in each of the Q third subspaces to obtain the outputs of Q classifiers;
Fusing the outputs of the Q classifiers with a preset algorithm; and
Identifying the speaker of the training utterance whose fused score is highest as the speaker of the test utterance.
2. The method of claim 1, characterized in that extracting k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] comprises:
Extracting the mean vectors ranked in the first k positions from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
3. The method of claim 1, characterized in that randomly sampling the first subspace to obtain Q second subspaces comprises:
Choosing the first E1 most informative principal components of the first subspace;
Randomly selecting E2 principal components from the remaining J − E1 principal components of the first subspace by a random sampling algorithm; and
Generating Q second subspaces of dimension E1 + E2.
4. The method of claim 1, characterized in that extracting the JFA supervector of the test utterance comprises:
Using J = m + Vy + Dz to convert the test utterance into the JFA supervector of the test utterance, where J denotes the JFA supervector, m denotes the universal background model (UBM) mean supervector, V and D denote the speaker-space loading matrix and the residual-space loading matrix respectively, and y and z denote the speaker factors and the residual factors respectively.
5. The method of claim 1, characterized in that fusing the outputs of the Q classifiers with the preset algorithm comprises:
Linearly fusing the outputs of the Q classifiers.
6. An identity verification device for a speaker, characterized by comprising:
A first extraction unit, configured to extract a joint factor analysis (JFA) supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] from each training utterance, where M_ih denotes the JFA supervector of the h-th training utterance of the i-th speaker in the training set;
A first dimensionality reduction unit, configured to extract k mean vectors from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate a first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk];
A second dimensionality reduction unit, configured to project the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk] into a first subspace of dimension J using the principal component analysis (PCA) algorithm;
A random sampling unit, configured to randomly sample the first subspace to obtain Q second subspaces;
A WCCN processing unit, configured to apply within-class covariance normalization (WCCN) to the vectors projected into the Q second subspaces, train a projection matrix W2, and then map the vectors in the Q second subspaces into Q third subspaces through the projection matrix W2;
A nonparametric linear discriminant analysis unit, configured to model the Q third subspaces with nonparametric linear discriminant analysis to obtain a projection matrix W3;
A first reference vector generation unit, configured to project the JFA supervector of each training utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q target-speaker reference vectors;
A second extraction unit, configured to extract the JFA supervector of a test utterance;
A second reference vector generation unit, configured to project the JFA supervector of the test utterance into the Q third subspaces through the projection matrix W2*W3 to obtain Q test reference vectors;
An output unit, configured to compute the cosine distance between the test reference vector and the target-speaker reference vector in each of the Q third subspaces to obtain the outputs of Q classifiers;
A fusion unit, configured to fuse the outputs of the Q classifiers with a preset algorithm; and
A confirmation unit, configured to identify the speaker of the training utterance whose fused score is highest as the speaker of the test utterance.
7. The device of claim 6, characterized in that the first dimensionality reduction unit is specifically configured to:
Extract the mean vectors ranked in the first k positions from the JFA supervector M_ih = [m_ih1, m_ih2, ..., m_ihN] of the training utterance to generate the first subvector S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
8. The device of claim 6, characterized in that the random sampling unit comprises:
A first selection subunit, configured to choose the first E1 most informative principal components of the first subspace;
A second selection subunit, configured to randomly select E2 principal components from the remaining J − E1 principal components of the first subspace by a random sampling algorithm; and
A generation subunit, configured to generate Q second subspaces of dimension E1 + E2.
9. The device of claim 6, characterized in that the second extraction unit is specifically configured to:
Use J = m + Vy + Dz to convert the test utterance into the JFA supervector of the test utterance, where J denotes the JFA supervector, m denotes the universal background model (UBM) mean supervector, V and D denote the speaker-space loading matrix and the residual-space loading matrix respectively, and y and z denote the speaker factors and the residual factors respectively.
10. The device of claim 6, characterized in that the fusion unit is specifically configured to:
Linearly fuse the outputs of the Q classifiers.
CN201410844272.XA 2014-12-30 2014-12-30 Identity verification method and device for speakers Pending CN105810199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410844272.XA CN105810199A (en) 2014-12-30 2014-12-30 Identity verification method and device for speakers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410844272.XA CN105810199A (en) 2014-12-30 2014-12-30 Identity verification method and device for speakers

Publications (1)

Publication Number Publication Date
CN105810199A 2016-07-27

Family

ID=56420092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410844272.XA Pending CN105810199A (en) 2014-12-30 2014-12-30 Identity verification method and device for speakers

Country Status (1)

Country Link
CN (1) CN105810199A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531170A (en) * 2016-12-12 2017-03-22 姜卫武 Spoken language assessment identity authentication method based on speaker recognition technology
CN109065022A (en) * 2018-06-06 2018-12-21 平安科技(深圳)有限公司 I-vector extraction method, speaker recognition method, device, equipment and medium
CN109145148A (en) * 2017-06-28 2019-01-04 百度在线网络技术(北京)有限公司 Information processing method and device
CN109165726A (en) * 2018-08-17 2019-01-08 联智科技(天津)有限责任公司 A neural network embedded system for text-independent speaker verification
CN110010137A (en) * 2019-04-04 2019-07-12 杭州电子科技大学 A speaker verification method and system based on tensor structure and sparse representation
WO2019237519A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 General vector training method, voice clustering method, apparatus, device and medium
US10909991B2 (en) 2018-04-24 2021-02-02 ID R&D, Inc. System for text-dependent speaker recognition and method thereof
US10970573B2 (en) 2018-04-27 2021-04-06 ID R&D, Inc. Method and system for free text keystroke biometric authentication

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758263A (en) * 2005-10-31 2006-04-12 浙江大学 Multi-model ID recognition method based on scoring difference weight compromised
CN101847208A (en) * 2010-06-11 2010-09-29 哈尔滨工程大学 Secondary classification fusion identification method for fingerprint and finger vein bimodal identification
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
CN102045162A (en) * 2009-10-16 2011-05-04 电子科技大学 Personal identification system of permittee with tri-modal biometric characteristic and control method thereof
CN102664011A (en) * 2012-05-17 2012-09-12 吉林大学 Method for quickly recognizing speaker
CN103077720A (en) * 2012-12-19 2013-05-01 中国科学院声学研究所 Speaker identification method and system
CN103514170A (en) * 2012-06-20 2014-01-15 中国移动通信集团安徽有限公司 Speech-recognition text classification method and device
CN104167208A (en) * 2014-08-08 2014-11-26 中国科学院深圳先进技术研究院 Speaker recognition method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758263A (en) * 2005-10-31 2006-04-12 浙江大学 Multi-model ID recognition method based on scoring difference weight compromised
CN102045162A (en) * 2009-10-16 2011-05-04 电子科技大学 Personal identification system of permittee with tri-modal biometric characteristic and control method thereof
CN101847208A (en) * 2010-06-11 2010-09-29 哈尔滨工程大学 Secondary classification fusion identification method for fingerprint and finger vein bimodal identification
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
CN102664011A (en) * 2012-05-17 2012-09-12 吉林大学 Method for quickly recognizing speaker
CN103514170A (en) * 2012-06-20 2014-01-15 中国移动通信集团安徽有限公司 Speech-recognition text classification method and device
CN103077720A (en) * 2012-12-19 2013-05-01 中国科学院声学研究所 Speaker identification method and system
CN104167208A (en) * 2014-08-08 2014-11-26 中国科学院深圳先进技术研究院 Speaker recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NA LI ET AL: "An analysis framework of two-level sampling subspace for speaker verification", 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531170A (en) * 2016-12-12 2017-03-22 姜卫武 Spoken language assessment identity authentication method based on speaker recognition technology
CN109145148A (en) * 2017-06-28 2019-01-04 百度在线网络技术(北京)有限公司 Information processing method and device
US10909991B2 (en) 2018-04-24 2021-02-02 ID R&D, Inc. System for text-dependent speaker recognition and method thereof
US10970573B2 (en) 2018-04-27 2021-04-06 ID R&D, Inc. Method and system for free text keystroke biometric authentication
CN109065022A (en) * 2018-06-06 2018-12-21 平安科技(深圳)有限公司 I-vector extraction method, speaker recognition method, device, equipment and medium
CN109065022B (en) * 2018-06-06 2022-08-09 平安科技(深圳)有限公司 Method for extracting i-vector, method, device, equipment and medium for speaker recognition
WO2019237519A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 General vector training method, voice clustering method, apparatus, device and medium
CN109165726A (en) * 2018-08-17 2019-01-08 联智科技(天津)有限责任公司 A neural network embedded system for text-independent speaker verification
CN110010137A (en) * 2019-04-04 2019-07-12 杭州电子科技大学 A speaker verification method and system based on tensor structure and sparse representation
CN110010137B (en) * 2019-04-04 2021-09-28 杭州电子科技大学 Speaker confirmation method and system based on tensor structure and sparse representation

Similar Documents

Publication Publication Date Title
CN105810199A (en) Identity verification method and device for speakers
CN110147726B (en) Service quality inspection method and device, storage medium and electronic device
CN107194341B (en) Face recognition method and system based on fusion of Maxout multi-convolution neural network
CN104167208B (en) A speaker recognition method and device
He et al. Performance evaluation of score level fusion in multimodal biometric systems
CN100356388C (en) Biocharacteristics fusioned identity distinguishing and identification method
CN101226590B (en) Method for recognizing human face
CN105261367B (en) A speaker recognition method
Sarhan et al. Multimodal biometric systems: a comparative study
CN101976360B (en) Sparse characteristic face recognition method based on multilevel classification
CN104538035B (en) A speaker recognition method and system based on Fisher supervectors
CN105160299A (en) Human face emotion identifying method based on Bayes fusion sparse representation classifier
Song et al. Speech emotion recognition using transfer non-negative matrix factorization
Mane et al. Review of multimodal biometrics: applications, challenges and research areas
CN103246880B (en) Based on the face identification method of the remarkable pattern feature statistics in multistage local
KR101016758B1 (en) Method for identifying image face and system thereof
CN113158777A (en) Quality scoring method, quality scoring model training method and related device
Mondal et al. Secure and hassle-free EVM through deep learning based face recognition
Ou et al. LinCos-softmax: Learning angle-discriminative face representations with linearity-enhanced cosine logits
CN102237089A (en) Method for reducing the error identification rate of a text-independent speaker identification system
Diez et al. New insight into the use of phone log-likelihood ratios as features for language recognition
CN116152870A (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN103116758A (en) Color face identification method based on RGB (red, green and blue) color feature double identification analysis
Baaqeel et al. A self-adapting face authentication system with deep learning
Hu et al. On-line signature verification based on fusion of global and local information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727

RJ01 Rejection of invention patent application after publication