CN102024455B - Speaker recognition system and method


Info

Publication number: CN102024455B
Application number: CN200910170552.6A
Authority: CN (China)
Prior art keywords: speaker, eigenvector, feature, sigma, model
Legal status: Expired - Fee Related
Other languages: Chinese (zh)
Other versions: CN102024455A
Inventors: 刘昆, 吴伟国
Current and original assignee: Sony Corp
Priority and filing date: 2009-09-10
Application filed by Sony Corp; published as CN102024455A; granted and published as CN102024455B


Abstract

The invention discloses a speaker recognition system and method. The system comprises: a feature extraction unit configured to extract feature vectors from a speaker's speech data; a background model generation unit configured to perform internal clustering on the feature vectors of background speakers' speech data and to generate a universal background model for a general speaker from the result of the internal clustering; a registered-speaker model generation unit configured to adapt the universal background model with the feature vectors of each registered speaker's speech data, generating a registered speaker model for each registered speaker; a metric calculation unit configured to compute metric values of a test speaker's feature vectors on the universal background model generated by the background model generation unit and on each registered speaker model generated by the registered-speaker model generation unit; and a recognition unit configured to identify the test speaker from the metric values computed by the metric calculation unit.

Description

Speaker recognition system and method
Technical field
The present invention relates generally to a speaker recognition system and method. More particularly, it relates to a speaker-dependent recognition system and method based on a universal background model (UBM) and registered speaker models.
Background technology
Biometric identification technologies currently under active study include hand-shape recognition, fingerprint recognition, face recognition, voiceprint recognition, iris recognition and signature recognition. Among these biometrics, fingerprints, irises and facial features are exposed physical traits: under coercion, an offender can forcibly use the victim's own traits to impersonate him. A person's voice, by contrast, is an internal trait; as long as the person does not speak, it cannot be stolen, and voice has therefore been the subject of extensive research and development in the biometric identification field.
Voiceprint recognition, also called speaker recognition, is a biometric identification technology that establishes personal identity from intrinsic physiological or behavioral characteristics. It analyzes a received speech signal, extracts the speaker's characteristics, and automatically determines whether the speaker belongs to an established speaker set and, if so, who the speaker is. When the spoken content is predetermined, the task is called text-dependent voiceprint recognition; when the spoken content is not fixed in advance and any utterance can be recognized, it is called text-independent voiceprint recognition.
The mainstream speaker recognition method today is based on GMM-UBM (Gaussian mixture model - universal background model). A GMM-UBM speaker recognition system has three main parts: UBM training, speaker-specific model adaptation, and the speaker recognition test. Specifically, a universal background model is first trained from the data of hundreds or even thousands of speakers; a speaker-specific Gaussian mixture model is then adapted from the universal background model using the target speaker's data; and the adapted model is used for speaker recognition.
Its advantage is that the speaker-specific model is obtained by adapting the UBM with the speaker's training utterances. Pronunciation characteristics covered by the training utterances are thus modeled from the speaker's own speech, while characteristics not covered remain close to the UBM, which reduces the impact of the different acoustic-space distributions of test and training speech. In addition, during identity verification, the score of the test speech on the UBM can serve as a reference threshold.
A good UBM is trained from a large amount of speech from many background speakers. For a simple recognition system, the trained background model alone can achieve satisfactory accuracy. For a specific application, however, speech samples from the actual channel should be collected before deployment and used, via an adaptive algorithm, to retrain and update the background model for the best recognition performance.
However, in a GMM-UBM speaker recognition system a single UBM represents only the statistically averaged pronunciation characteristics of speakers, and UBM training requires speech from a large number of speakers while also balancing gender ratio, age distribution and so on, all modeled with a GMM.
An important deficiency of GMM modeling is that the physical meaning of the trained GMM is unclear: one cannot tell which features contributed to each Gaussian component. Moreover, because speech from many speakers is needed, GMM training takes a very long time.
Summary of the invention
In view of the foregoing, the present invention proposes a new speaker recognition system and method that clarify the physical meaning of the universal background model and also reduce the training time compared with a GMM.
Specifically, according to one aspect of the present invention, a speaker recognition system is provided, comprising: a feature extraction unit configured to extract feature vectors from a speaker's speech data; a background model generation unit configured to perform internal clustering on the feature vectors of background speakers' speech data and to generate a universal background model for a general speaker from the result of the internal clustering; a registered-speaker model generation unit configured to adapt the universal background model with the feature vectors of each registered speaker's speech data, generating a registered speaker model for each registered speaker; a metric calculation unit configured to compute metric values of a test speaker's feature vectors on the universal background model generated by the background model generation unit and on each registered speaker model generated by the registered-speaker model generation unit; and a recognition unit configured to identify the test speaker from the metric values computed by the metric calculation unit.
According to another aspect of the present invention, a speaker recognition method is provided, comprising: extracting feature vectors from a speaker's speech data; performing internal clustering on the feature vectors of background speakers' speech data and generating a universal background model for a general speaker from the result of the internal clustering; adapting the universal background model with the feature vectors of each registered speaker's speech data to generate a registered speaker model for each registered speaker; computing metric values of a test speaker's feature vectors on the universal background model and on each registered speaker model; and identifying the test speaker from the computed metric values.
According to an embodiment of the present invention, generating the universal background model comprises: performing internal clustering on the feature vectors of the background speakers' speech data to generate a series of feature subclasses; selecting cluster centers from the feature subclasses of all background speakers so as to partition all feature subclasses into feature spaces; and characterizing all feature subclasses contained in each feature space to generate the universal background model for a general speaker.
Preferably, in the internal clustering, a KDTree is constructed from the feature vectors of each background speaker's speech data and clustered according to a nearest-neighbor rule.
According to a specific embodiment of the present invention, the internal clustering comprises: extracting feature vectors from the voiced segments of a background speaker's speech data; building the extracted feature vectors into a KDTree such that, at every layer, the value of the dimension associated with that layer is smaller than the root node's value of that dimension for all nodes in the root node's left subtree, and larger for all nodes in its right subtree; and clustering each root node at some chosen layer of the constructed KDTree, together with its subtrees, into a feature subclass sharing common characteristics.
Preferably, in the internal clustering, the root nodes are screened and only those with the largest numbers of child nodes are retained.
According to an embodiment of the present invention, the cluster centers are selected from the generated feature subclasses of all background speakers using the maximum-distance sample method, the K-means method, the single linkage method, the average linkage method or the centroid method.
According to a preferred embodiment of the present invention, all feature subclasses contained in each feature space are characterized with a Gaussian function: the mean and variance of the feature vectors contained in all subclasses of each feature space are computed, to obtain the normal distribution function of each feature space.
In addition, according to an embodiment of the present invention, generating a registered speaker model comprises: obtaining the feature vectors F of the registered speaker's speech data; for each feature vector F, computing its posterior probability $p_k$ with respect to each feature space k,

$$p_k = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}\,(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right\},\quad k = 1, 2, \ldots, N,$$

where $\mu_k$ is the mean of the feature vectors contained in all subclasses of feature space k, $\Sigma_k$ is their variance, N is the number of feature spaces in the partition, and d is the feature dimension; computing the update factor $\alpha = \frac{1}{\gamma + p_k}$, where $\gamma$ is an empirical value; updating the mean of each feature space as $\mu'_k = \mu_k(1-\alpha) + \alpha F$; and adapting the universal background model with the updated feature-space means, to generate the registered speaker model of this registered speaker.
In addition, according to a preferred embodiment of the present invention, in the metric calculation the feature vectors of the test speaker's speech data are obtained, and the posterior probabilities $P_B$ and $P_R$ of all obtained feature vectors with respect to the universal background model $M_B$ and a registered speaker model $M_R$ are computed as

$$P_B = \frac{1}{m}\sum_{i=1}^{m} p_B^i, \qquad P_R = \frac{1}{m}\sum_{i=1}^{m} p_R^i,$$

$$p_B^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^B\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^B|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^B)^T (\Sigma_k^B)^{-1} (x-\mu_k^B)\right),$$

$$p_R^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^R\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^R|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^R)^T (\Sigma_k^R)^{-1} (x-\mu_k^R)\right),$$

where m is the number of feature vectors obtained from the test speaker's speech data, $p_k^B$ and $p_k^R$ are the posterior probabilities of a feature vector on the k-th feature space of $M_B$ and $M_R$ respectively, and $w_k = \frac{1}{N}$ is the weight of each feature space. In the recognition, the score $P_R - P_B$ of the test speaker's speech data against each registered speaker model is computed, the maximum $P_{max}$ is taken, and the test speaker is identified according to a preset threshold.
As can be seen, in the speaker recognition system and method proposed by the present invention, the universal background model is generated by first constructing a KDTree from each background speaker's features and clustering within each speaker according to a nearest-neighbor rule, then clustering across all background speakers and updating the model parameters to obtain the background model. The physical meaning of this universal background model is therefore explicit. Moreover, the computational complexity of the KDTree is lower than that of a GMM, so the training time is shortened.
In addition, the present invention provides a computer program for implementing the above speaker recognition method.
The present invention further provides a computer program product, in at least the form of a computer-readable medium, on which computer program code for implementing the above speaker recognition method is recorded.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will be more easily understood from the following description of embodiments with reference to the accompanying drawings, in which identical or corresponding technical features or components are denoted by identical or corresponding reference numerals. In the drawings:
Fig. 1 is a block diagram of a speaker recognition system according to an embodiment of the present invention;
Fig. 2 is a block diagram of the background model generation unit according to an embodiment of the present invention;
Fig. 3 is a block diagram of the internal clustering unit according to an embodiment of the present invention;
Fig. 4 illustrates the KDTree constructed from the speaker feature vectors of a concrete example of the present invention;
Fig. 5 is a flowchart of the processing of a speaker recognition method according to an embodiment of the present invention;
Fig. 6 is a flowchart of the universal background model training process of a concrete example of the present invention; and
Fig. 7 is a block diagram of an information processing device for implementing the speaker recognition method of the present invention.
Embodiment
Embodiments of the present invention are described below with reference to the accompanying drawings. For clarity, representations and descriptions of components and processing irrelevant to the invention and well known to those of ordinary skill in the art are omitted from the drawings and the description.
The general working process of a speaker recognition system according to an embodiment of the present invention is first described with reference to the drawings, particularly Figs. 1 to 4. As shown in Fig. 1, the speaker recognition system according to this embodiment comprises: a feature extraction unit 101 configured to extract feature vectors from a speaker's speech data; a background model generation unit 103 configured to perform internal clustering on the feature vectors of background speakers' speech data and to generate a universal background model for a general speaker from the clustering result; a registered-speaker model generation unit 105 configured to adapt the universal background model with the feature vectors of each registered speaker's speech data, generating a registered speaker model for each registered speaker; a metric calculation unit 107 configured to compute metric values of a test speaker's feature vectors on the universal background model generated by the background model generation unit 103 and on each registered speaker model generated by the registered-speaker model generation unit 105; and a recognition unit 109 configured to identify the test speaker from the metric values computed by the metric calculation unit 107.
Each module included in the speaker recognition system according to the present invention is described in detail below with reference to Figs. 2 to 4.
In the speaker recognition system of this embodiment, the feature extraction unit 101 first extracts feature vectors from the speakers' speech data. Depending on the situation, the feature extraction unit 101 extracts feature vectors from different speakers' speech data and sends them to different downstream units. When generating the universal background model, it extracts the feature vectors of a large number of speakers' speech data and sends them to the background model generation unit 103. For registered speaker models, it extracts the feature vectors of each speaker to be registered and sends them to the registered-speaker model generation unit 105. At recognition time, it extracts the feature vectors of the test speaker to be identified and sends them to the metric calculation unit 107.
The background model generation unit 103 performs internal clustering on the feature vectors of the background speakers' speech data provided by the feature extraction unit 101 and generates the universal background model for a general speaker from the clustering result. Fig. 2 shows a block diagram of the background model generation unit 103 according to an embodiment of the present invention.
As shown in Fig. 2, the background model generation unit 103 of this embodiment comprises: an internal clustering unit 201 configured to perform internal clustering on the feature vectors of the background speakers' speech data to generate a series of feature subclasses; a feature-subclass space division unit 203 configured to select cluster centers from the feature subclasses of all background speakers generated by the internal clustering unit 201, so as to partition all feature subclasses into feature spaces; and a feature-space characterization unit 205 configured to characterize all feature subclasses contained in each feature space, so as to generate the universal background model for a general speaker.
According to an example of the present invention, the internal clustering unit 201 constructs a KDTree from the feature vectors of each background speaker's speech data and clusters within each speaker according to a nearest-neighbor rule, obtaining a series of feature subclasses. Fig. 3 shows a block diagram of the internal clustering unit 201 according to this embodiment.
As shown in Fig. 3, the internal clustering unit 201 of this embodiment comprises: a voiced-segment extraction unit 301 configured to extract feature vectors from the voiced segments of a background speaker's speech data; a KDTree construction unit 303 configured to build the extracted feature vectors into a KDTree such that, at every layer, the value of the dimension associated with that layer is smaller than the root node's value of that dimension for all nodes in the root node's left subtree, and larger for all nodes in its right subtree; and a feature-subclass generation unit 305 configured to cluster each root node at some chosen layer of the constructed KDTree, together with its subtrees, into a feature subclass sharing common characteristics.
Here, according to an example of the present invention, the features extracted by the voiced-segment extraction unit 301 comprise 18-dimensional MFCC (Mel frequency cepstral coefficient) features, 18-dimensional delta MFCC features and 9-dimensional prosodic features. These features are, of course, given only as examples; in a specific implementation, a subset of them may be chosen, or other feature vectors that characterize a speaker's voice may be selected, according to the concrete situation and requirements.
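For illustration only, the following is a minimal sketch of such per-frame feature extraction, assuming the librosa toolkit (the patent names no toolkit); the 9 prosodic dimensions (e.g. pitch and energy statistics) are application-specific and omitted here.

```python
# A minimal sketch of the per-frame feature extraction described above:
# 18 MFCCs plus their 18 deltas, giving a 36-dimensional vector per frame.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=18)   # shape (18, frames)
    delta = librosa.feature.delta(mfcc)                  # shape (18, frames)
    return np.vstack([mfcc, delta]).T                    # shape (frames, 36)
```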
After receiving the feature vectors extracted by the voiced-segment extraction unit 301, the KDTree construction unit 303 sorts them and builds the KDTree for this speaker.
A KDTree (k-dimensional search tree) is a generalization of the binary search tree to multidimensional retrieval, where k is the dimension of the space. Unlike a binary search tree, each of its nodes represents a point in k-dimensional space, and each layer makes its branching decision according to that layer's discriminator.
In a KDTree, the top layer partitions on one dimension, the second layer on another dimension, and the remaining layers continue to cycle through the dimensions in this way, until the number of points in a node falls below a given maximum and the partitioning stops.
Specifically, when building the KDTree of the feature vectors of one speaker's speech data, the root node is selected first: the values of the first dimension of all feature vectors are compared and sorted, and the feature vector at the median of the sorted order is taken as the root. Then, starting from the first feature vector, each vector in turn is inserted at the position found for it in the KDTree. The rule assigning a feature vector to the left or right subtree is as follows, taking layer i as an example: if the left subtree is non-empty, the i-th dimension values of all nodes in the left subtree are smaller than the i-th dimension value of the root; if the right subtree is non-empty, the i-th dimension values of all nodes in the right subtree are greater than that of the root; and the left and right subtrees are themselves KDTrees.
Following this principle, let a feature vector be F = {f1, f2, f3, ..., fn}, where fi is the i-th dimension feature. At the first layer (the root node), f1 is compared with the first dimension of the root; if f1 is smaller, the vector descends into the root's left subtree and enters the second layer. At the second layer, if the left subtree is not empty, the second dimension f2 is compared with the second dimension of the left-subtree root to decide between its left and right subtrees, and so on through the third and subsequent layers, until an empty subtree is reached and the feature vector is inserted at that position.
Under this partitioning rule, the feature vectors of each speaker's speech data are split along the feature dimensions, and that speaker's KDTree is built. For instance, suppose the feature vector list is [(2,3,9), (5,4,2), (9,6,4), (4,7,0), (8,1,8), (7,2,3)]. The root is chosen first: since 5 and 7 are the medians of the first dimension, either (5,4,2) or (7,2,3) may serve as root; here (7,2,3) is selected. Then, starting from the first feature vector (2,3,9), its first dimension 2 is compared with the root's first dimension 7; since it is smaller, the vector belongs to the left subtree, and since that left subtree is empty, (2,3,9) is inserted as the root of the left subtree of (7,2,3).
For the second feature vector (5,4,2), its first dimension 5 is compared with the root's first dimension 7; since it is smaller, the vector belongs to the left subtree. Because the left subtree is not empty, the second dimension of (5,4,2) is then compared with the second dimension of the left-subtree root (2,3,9). Since 4 is greater than 3, the vector belongs to the right subtree, and since that right subtree is empty, (5,4,2) is inserted as the root of the right subtree of (2,3,9). Continuing in this way yields the final KDTree shown in Fig. 4.
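The insertion rule just described can be sketched as follows; the class and function names are hypothetical, and the root is chosen at the median of the first dimension exactly as in the worked example above.

```python
# A sketch of the KDTree insertion rule: at depth i the comparison uses
# dimension i mod d, smaller values go left, larger (or equal) go right.
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(root, point, depth=0):
    if root is None:
        return Node(point)
    axis = depth % len(point)            # the discriminator for this layer
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

def build_kdtree(points):
    points = list(points)
    # Root: the vector at the median of the first dimension, as in the example.
    root_pt = sorted(points, key=lambda p: p[0])[len(points) // 2]
    points.remove(root_pt)
    root = Node(root_pt)
    for p in points:                     # insert remaining vectors in order
        insert(root, p)
    return root

# build_kdtree([(2,3,9), (5,4,2), (9,6,4), (4,7,0), (8,1,8), (7,2,3)])
# reproduces the tree of Fig. 4, rooted at (7,2,3).
```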
After the KDTree construction unit 303 has built the speaker's KDTree, the feature-subclass generation unit 305 divides the speaker's feature vectors into subclasses sharing certain common characteristics, which in turn accelerates the subsequent subclass space division.
Specifically, suppose all root nodes i (i ≤ 2^N) at layer N of this speaker's KDTree are each taken as a separate root; each such root and the child nodes below it form a small KDTree, and all feature vectors of that small KDTree are considered to share common characteristics and are merged into one class, for which the mean, the variance and the number of feature vectors (leaf nodes) are computed. In the end, i feature subclasses are obtained for this speaker.
Preferably, the feature-subclass generation unit 305 screens the root nodes i (i ≤ 2^N) at layer N of the speaker's KDTree, retaining only the leading root nodes with the most child nodes; each retained root together with the child nodes of the layers below it is merged into one class, and the mean, variance and feature-vector (leaf-node) count of each class are computed.
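Building on the hypothetical Node class of the previous sketch, the subclass generation and root-node screening might look as follows; the layer choice and the number of retained roots are assumptions of this sketch.

```python
# A sketch of the per-speaker subclass generation: each root node at a chosen
# layer of the speaker's KDTree, together with its subtree, becomes one
# feature subclass summarized by (mean, variance, vector count).
import numpy as np

def collect(node, out):
    if node is None:
        return
    out.append(node.point)
    collect(node.left, out)
    collect(node.right, out)

def nodes_at_layer(root, layer):
    level = [root]                       # layer 0 is the tree root
    for _ in range(layer):
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return level

def feature_subclasses(root, layer, keep=None):
    stats = []
    for r in nodes_at_layer(root, layer):
        vecs = []
        collect(r, vecs)
        vecs = np.asarray(vecs, dtype=float)
        stats.append((vecs.mean(axis=0), vecs.var(axis=0), len(vecs)))
    # Optional screening: keep only the roots with the most descendants.
    stats.sort(key=lambda s: s[2], reverse=True)
    return stats[:keep] if keep else stats
```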
Returning now to Fig. 2, after the internal clustering unit 201 has generated the feature subclasses for every speaker, the feature-subclass space division unit 203 divides the feature subclasses of all background speakers spatially: it selects cluster centers among them and thereby partitions all feature subclasses into feature spaces, realizing clustering across the background speakers.
According to an example of the present invention, the feature-subclass space division unit 203 uses the maximum-distance sample method to cluster the categories of all background speakers, which ensures that samples lying far apart can serve as cluster centers.
Specifically, suppose there are M feature-subclass samples Zs = {Z1, Z2, ..., ZM}. First an arbitrary sample, say Z1, is taken as the first cluster center. Then the sample in Zs farthest from Z1 is taken as Z2. Next, for each remaining sample Zi in Zs, the distances to Z1 and Z2 are computed and the smaller of the two is recorded as D_Zi.
After D_Zi has been computed for all remaining samples Zi in Zs, the maximum of the D_Zi is found; if this value exceeds a certain computed value or given threshold, that Zi is taken as a new cluster center. Here the computed value may be taken as α times the distance between Z1 and Z2, with 0.5 ≤ α < 1.
The above processing is then repeated until no further qualifying new cluster center can be found. Finally, each remaining sample is assigned to the class of the center nearest to it.
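A minimal sketch of this maximum-distance sample method, assuming Euclidean distance over subclass representatives (for example, the subclass means); all names are hypothetical.

```python
# Z1 is an arbitrary first center, Z2 the sample farthest from it; a new
# center is added whenever the largest "distance to nearest existing center"
# exceeds alpha * |Z1 - Z2| (0.5 <= alpha < 1, an empirical choice).
import numpy as np

def max_distance_clustering(samples, alpha=0.5):
    samples = np.asarray(samples, dtype=float)
    centers = [samples[0]]                                   # Z1: arbitrary
    d0 = np.linalg.norm(samples - centers[0], axis=1)
    centers.append(samples[np.argmax(d0)])                   # Z2: farthest
    threshold = alpha * np.linalg.norm(centers[1] - centers[0])
    while True:
        # Distance from every sample to its nearest current center.
        dists = np.min(
            [np.linalg.norm(samples - c, axis=1) for c in centers], axis=0)
        i = int(np.argmax(dists))
        if dists[i] <= threshold:
            break                                            # no new center
        centers.append(samples[i])
    # Assign each sample to the class of its nearest center.
    labels = np.argmin(
        [np.linalg.norm(samples - c, axis=1) for c in centers], axis=0)
    return np.asarray(centers), labels
```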
Of course, the spatial division of the feature subclasses is not limited to the maximum-distance sample method described above; depending on the situation, other methods may be chosen to select cluster centers from the background speakers' feature subclasses, such as the K-means method, the single linkage method, the average linkage method or the centroid hierarchical method.
After the feature-subclass space division unit 203 has partitioned the background speakers' feature subclasses into feature spaces, the feature-space characterization unit 205 characterizes all feature subclasses contained in each feature space with a Gaussian function, thereby generating the universal background model for a general speaker. Specifically, the feature-space characterization unit 205 may compute the mean and variance of the feature vectors contained in all subclasses of each feature space to obtain the normal distribution function of that feature space, and thus generate the universal background model.
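A sketch of this characterization step, assuming diagonal covariances (the patent does not state whether full or diagonal covariances are used):

```python
# Pool the vectors of all subclasses assigned to each feature space and fit
# a Gaussian (mean, variance) per space; together these form the UBM.
import numpy as np

def characterize_spaces(vectors, labels, n_spaces):
    means, variances = [], []
    for k in range(n_spaces):
        pooled = vectors[labels == k]        # all vectors of space k
        means.append(pooled.mean(axis=0))
        variances.append(pooled.var(axis=0))
    return np.asarray(means), np.asarray(variances)
```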
Returning next to Fig. 1, the working principles of the registered-speaker model generation unit 105, the metric calculation unit 107 and the recognition unit 109 are described.
After the background model generation unit 103 has generated the universal background model from the background speakers' speech data, the registered-speaker model generation unit 105 adapts the universal background model with the feature vectors of each registered speaker's speech data, generating a registered speaker model for each speaker to be registered.
Specifically, according to a specific embodiment of the present invention, the registered-speaker model generation unit 105 first obtains the feature vectors F of the registered speaker's speech data from the feature extraction unit 101, and then, for each feature vector F, computes its posterior probability $p_k$ with respect to each feature space k according to

$$p_k = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}\,(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right\},\quad k = 1, 2, \ldots, N,$$

where $\mu_k$ is the mean of the feature vectors contained in all subclasses of feature space k, $\Sigma_k$ is their variance, N is the number of feature spaces in the partition (for example 512 or 1024), and d is the feature dimension.

The registered-speaker model generation unit 105 then computes the update factor $\alpha = \frac{1}{\gamma + p_k}$ and updates the mean $\mu_k$ of each feature space as $\mu'_k = \mu_k(1-\alpha) + \alpha F$, where $\gamma$ is an empirical value, for example 10 or 16. Finally, the registered-speaker model generation unit 105 adapts the universal background model with the updated feature-space means $\mu'_k$, thereby generating the registered speaker model for this registered speaker.
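A sketch of this mean-only adaptation, assuming diagonal covariances and a sequential update over the enrollment vectors (the patent does not specify the update order):

```python
# For each enrollment vector F, the posterior p_k under each feature-space
# Gaussian drives the update factor alpha = 1/(gamma + p_k), and the means
# are moved toward F; the adapted means define the registered speaker model.
import numpy as np

def adapt_means(feats, means, variances, gamma=16.0):
    mu = means.copy()
    d = feats.shape[1]
    for F in feats:
        diff = F - mu                                      # shape (N, d)
        log_p = (-0.5 * np.sum(diff**2 / variances, axis=1)
                 - 0.5 * np.sum(np.log(variances), axis=1)
                 - 0.5 * d * np.log(2 * np.pi))
        p = np.exp(log_p)                                  # p_k, k = 1..N
        alpha = 1.0 / (gamma + p)                          # update factor
        mu = mu * (1 - alpha[:, None]) + alpha[:, None] * F
    return mu
```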
In the recognition process, the metric calculation unit 107 first computes the metric values of the test speaker's feature vectors on the universal background model generated by the background model generation unit 103 and on each registered speaker model generated by the registered-speaker model generation unit 105. The recognition unit 109 then identifies the test speaker from the metric values computed by the metric calculation unit 107.
Specifically, according to a specific embodiment of the present invention, the metric calculation unit 107 first obtains the feature vectors of the test speaker's speech data from the feature extraction unit 101 and computes the posterior probabilities $P_B$ and $P_R$ of all obtained feature vectors with respect to the universal background model $M_B$ and a registered speaker model $M_R$ according to

$$P_B = \frac{1}{m}\sum_{i=1}^{m} p_B^i, \qquad P_R = \frac{1}{m}\sum_{i=1}^{m} p_R^i,$$

$$p_B^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^B\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^B|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^B)^T (\Sigma_k^B)^{-1} (x-\mu_k^B)\right),$$

$$p_R^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^R\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^R|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^R)^T (\Sigma_k^R)^{-1} (x-\mu_k^R)\right),$$

where m is the number of feature vectors obtained from the test speaker's speech data, $p_k^B$ and $p_k^R$ are the posterior probabilities of a feature vector on the k-th feature space of $M_B$ and $M_R$ respectively, and $w_k = \frac{1}{N}$ is the weight of each feature space.
Afterwards, the recognition unit 109 uses the posterior probabilities $P_B$ and $P_R$ computed by the metric calculation unit 107 to compute the score $P_R - P_B$ of the test speaker's speech data against each registered speaker model, takes the maximum $P_{max}$, and judges it against a preset threshold Th to identify the test speaker. If $P_{max}$ is greater than the threshold Th, the speaker corresponding to $P_{max}$ is the recognized speaker; otherwise the test speaker is rejected.
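A sketch of this scoring and decision rule under the same diagonal-covariance assumption; the model and function names are hypothetical.

```python
# Average the per-vector scores p^i over the UBM and over each registered
# model, then decide from the largest P_R - P_B against a preset threshold.
import numpy as np

def model_score(feats, means, variances):
    N, d = means.shape
    w = 1.0 / N                                            # w_k = 1/N
    scores = []
    for x in feats:
        diff = x - means                                   # shape (N, d)
        quad = -0.5 * np.sum(diff**2 / variances, axis=1)
        log_norm = 0.5 * (d * np.log(2 * np.pi)
                          + np.sum(np.log(variances), axis=1))
        scores.append(np.sum(np.log(w) - log_norm + quad)) # p^i
    return float(np.mean(scores))                          # P_B or P_R

def identify(feats, ubm, registered, threshold):
    p_b = model_score(feats, *ubm)                         # ubm = (means, vars)
    diffs = {name: model_score(feats, *m) - p_b
             for name, m in registered.items()}
    name, p_max = max(diffs.items(), key=lambda kv: kv[1])
    return name if p_max > threshold else None             # None = reject
```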
It should be noted here that in the embodiment described above, what the metric calculation unit 107 computes are the posterior probabilities of the test speaker's feature vectors on the universal background model generated by the background model generation unit 103 and on each registered speaker model generated by the registered-speaker model generation unit 105, and the recognition unit 109 identifies the test speaker from these posterior probabilities. The present invention is not limited to this, however: other quantities may be selected as needed, such as distance metrics or similarity metrics, with the metric calculation unit 107 computing the corresponding metric of the test speaker's feature vectors on the universal background model and on the registered speaker models, and the recognition unit 109 then identifying the speaker according to a suitably set threshold or according to experience. Such methods can likewise identify the test speaker quickly and conveniently, and those skilled in the art can readily realize them with suitable processing on the basis of the working principles of the metric calculation unit 107 and the recognition unit 109 described above.
For example, when a distance metric is adopted, the metric calculation unit 107 may first obtain the feature vectors of the test speaker's speech data from the feature extraction unit 101 and then compute the distance between each feature vector and each feature-space component of the universal background model, using for instance the common Euclidean distance, Manhattan distance, Minkowski distance, Gaussian divergence, Bhattacharyya (BHA) distance or Kullback-Leibler (KL) distance; the distances to the feature-space components, weighted by the corresponding weights, are accumulated as the distance between this feature vector and the universal background model, from which the distance between all feature vectors of the test speaker's speech data and the universal background model can be computed. On the same principle, the distance between the test speaker's feature vectors and a registered speaker model can be computed. The recognition unit 109 then computes the difference between these two distances, that is, between the distance to the universal background model and the distance to the registered speaker model. After this difference has been computed for all registered speaker models, the recognition unit 109 can identify the test speaker on the basis of the computed differences according to a preset threshold or according to experience.
Likewise, when a similarity metric is adopted, the metric calculation unit 107 may first obtain the feature vectors of the test speaker's speech data from the feature extraction unit 101 and then compute the similarity between each feature vector and each feature-space component of the universal background model, using for instance the cosine similarity, the Pearson coefficient or the adjusted cosine similarity; the similarities to the feature-space components, weighted by the corresponding weights, are accumulated as the similarity between this feature vector and the universal background model, from which the similarity between all feature vectors of the test speaker's speech data and the universal background model can be computed. On the same principle, the similarity between the test speaker's feature vectors and a registered speaker model can be computed. The recognition unit 109 again computes the difference between these two similarities, and after the difference has been computed for all registered speaker models, identifies the test speaker on the basis of the computed differences according to a preset threshold or according to experience.
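As one concrete instance of the similarity alternative, the following is a cosine-similarity sketch with hypothetical names and equal weights $w_k = 1/N$:

```python
# Weighted accumulation of the cosine similarity between each test vector
# and each feature-space mean, averaged over all test vectors.
import numpy as np

def cosine_metric(feats, means, weights=None):
    N = means.shape[0]
    w = weights if weights is not None else np.full(N, 1.0 / N)
    sims = feats @ means.T / (
        np.linalg.norm(feats, axis=1, keepdims=True)
        * np.linalg.norm(means, axis=1))                   # shape (m, N)
    return float(np.mean(sims @ w))
```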
The speaker recognition system according to the embodiment of the present invention has been described above in detail; a speaker recognition method according to an embodiment of the present invention is described below with reference to the drawings. Fig. 5 shows a flowchart of the processing of the speaker recognition method according to this embodiment.
As shown in Fig. 5, the speaker recognition method according to this embodiment of the invention comprises a feature extraction step S501, a universal background model generation step S503, a registered-speaker model generation step S505, a metric calculation step S507 and a test speaker recognition step S509. Since the concrete processing in each of these steps is similar to the processing in the corresponding modules of the speaker recognition system described with reference to Fig. 1 (the feature extraction unit 101, background model generation unit 103, registered-speaker model generation unit 105, metric calculation unit 107 and recognition unit 109), a further detailed description is omitted here.
In addition, Fig. 6 shows in detail a flowchart of the universal background model training process of a concrete example of the present invention. As shown in Fig. 6, the feature vectors of each background speaker's speech data are first extracted in step S601, and each background speaker's KDTree is built in step S603 according to the processing of the KDTree construction unit described above.
Then, in step S605, root nodes at a suitable layer of each background speaker's KDTree are selected for that speaker's internal feature clustering, generating a series of feature subclasses. Next, in step S607, the feature subclasses of all background speakers are clustered with the maximum-distance sample method, so that samples lying far apart can serve as cluster centers, and all feature subclasses are thereby partitioned spatially into feature spaces.
Finally, in step S609, all feature subclasses contained in each feature space are characterized, generating the universal background model for a general speaker.
As before, the clustering and spatial division of the feature subclasses are not limited to the maximum-distance sample method described above; other methods may be chosen to select cluster centers from the background speakers' feature subclasses depending on the situation, and are not detailed here.
As the detailed description of the speaker recognition system and method above shows, in order to ensure that each speaker's characteristics are concentrated in distribution before training, the speaker recognition system and method of the present invention first cluster each background speaker's features internally (a KDTree may be used) and then partition the feature subclasses of all speakers spatially; the resulting optimal spaces constitute the final universal background model. The background model is then adapted with a registered speaker's speech data to obtain the registered speaker's model. In the recognition test, the metric values of the test speaker's feature vectors on the universal background model and on the registered speaker models are computed and scored, and the speaker's identity is finally judged according to experience.
Therefore, compared with the traditional GMM-UBM system, the speaker recognition system and method according to the present invention not only have an explicit physical meaning but also run fast, and can thus achieve good recognition performance.
The basic principle of the present invention has been described above in connection with specific embodiments. It should be noted, however, that all or any of the steps or components of the method and apparatus of the present invention may be realized in hardware, firmware, software or a combination thereof, in any computing device (including processors, storage media, etc.) or network of computing devices, which persons of ordinary skill in the art can accomplish with their basic programming skills after reading the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
Where the embodiments of the present invention are realized by software and/or firmware, the program constituting the software is installed from a storage medium or a network onto a computer with a dedicated hardware structure, for example the general-purpose personal computer 700 shown in Fig. 7, which, with various programs installed, can perform various functions.
In Fig. 7, a central processing unit (CPU) 701 performs various processing according to programs stored in a read-only memory (ROM) 702 or loaded from a storage section 708 into a random access memory (RAM) 703. Data needed when the CPU 701 performs various processing are also stored in the RAM 703 as required. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704, to which an input/output interface 705 is also connected.
The following components are connected to the input/output interface 705: an input section 706, including a keyboard, a mouse and the like; an output section 707, including a display, such as a cathode-ray tube (CRT) or liquid crystal display (LCD), and a loudspeaker; a storage section 708, including a hard disk and the like; and a communication section 709, including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet.
A drive 710 is also connected to the input/output interface 705 as required. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as required, so that a computer program read from it is installed into the storage section 708 as needed.
Where the above series of processing is realized by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Fig. 7, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 711 include magnetic disks (including floppy disks (registered trademark)), optical disks (including compact disc read-only memories (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDiscs (MD) (registered trademark)) and semiconductor memories. Alternatively, the storage medium may be the ROM 702, a hard disk contained in the storage section 708 or the like, in which the program is stored and which is distributed to the user together with the device containing it.
It should also be pointed out that in the apparatus and method of the present invention, the components or steps may obviously be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processing may naturally be performed in the chronological order of the description, but need not necessarily be; some steps may be performed in parallel or independently of one another.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the terms "comprise", "comprising" and any other variants thereof in this application are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that comprises it.

Claims (20)

1. A speaker recognition system, comprising:
a feature extraction unit configured to extract feature vectors from a speaker's speech data;
a background model generation unit configured to perform internal clustering on the feature vectors of background speakers' speech data and to generate a universal background model for a general speaker from the result of the internal clustering;
a registered-speaker model generation unit configured to adapt the universal background model with the feature vectors of each registered speaker's speech data, generating a registered speaker model for each registered speaker;
a metric calculation unit configured to compute metric values of a test speaker's feature vectors on the universal background model generated by the background model generation unit and on each registered speaker model generated by the registered-speaker model generation unit; and
a recognition unit configured to identify the test speaker from the metric values computed by the metric calculation unit.
2. The speaker recognition system according to claim 1, wherein the background model generation unit comprises:
an internal clustering unit configured to perform internal clustering on the feature vectors of the background speakers' speech data to generate a series of feature subclasses;
a feature-subclass space division unit configured to select cluster centers from the feature subclasses of all background speakers generated by the internal clustering unit, so as to partition all feature subclasses into feature spaces; and
a feature-space characterization unit configured to characterize all feature subclasses contained in each feature space, so as to generate the universal background model for a general speaker.
3. The speaker recognition system according to claim 2, wherein the internal clustering unit constructs a KDTree from the feature vectors of each background speaker's speech data and performs the internal clustering according to a nearest-neighbor rule.
4. The speaker recognition system according to claim 3, wherein the internal clustering unit comprises:
a voiced-segment extraction unit configured to extract feature vectors from the voiced segments of a background speaker's speech data;
a KDTree construction unit configured to build the extracted feature vectors into a KDTree such that, at every layer, the value of the dimension associated with that layer is smaller than the root node's value of that dimension for all nodes in the root node's left subtree, and larger for all nodes in its right subtree; and
a feature-subclass generation unit configured to cluster each root node at some chosen layer of the constructed KDTree, together with its subtrees, into a feature subclass sharing common characteristics.
5. The speaker recognition system according to claim 4, wherein the feature-subclass generation unit screens the root nodes and retains those with the largest numbers of child nodes.
6. The speaker recognition system according to any one of claims 2 to 5, wherein the feature-subclass space division unit selects the cluster centers from the background speakers' feature subclasses generated by the internal clustering unit using the maximum-distance sample method, the K-means method, the single linkage method, the average linkage method or the centroid method.
7. The speaker recognition system according to any one of claims 2 to 5, wherein the feature-space characterization unit characterizes all feature subclasses contained in each feature space with a Gaussian function.
8. The speaker recognition system according to any one of claims 2 to 5, wherein the feature-space characterization unit computes the mean and variance of the feature vectors contained in all feature subclasses of each feature space, to obtain the normal distribution function of each feature space.
9. The speaker recognition system according to claim 8, wherein the registered-speaker model generation unit
obtains the feature vectors F of a registered speaker's speech data;
for each feature vector F, computes its posterior probability $p_k$ with respect to each feature space k,

$$p_k = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}\,(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right\},\quad k = 1, 2, \ldots, N,$$

where $\mu_k$ is the mean of the feature vectors contained in all subclasses of feature space k, $\Sigma_k$ is their variance, N is the number of feature spaces in the partition, and d is the feature dimension;
computes the update factor $\alpha = \frac{1}{\gamma + p_k}$, where $\gamma$ is an empirical value;
updates the mean of each feature space as $\mu'_k = \mu_k(1-\alpha) + \alpha F$; and
adapts the universal background model with the updated feature-space means, to generate the registered speaker model of this registered speaker.
10. The speaker recognition system according to claim 9, wherein
the metric calculation unit obtains the feature vectors of the test speaker's speech data and computes the posterior probabilities $P_B$ and $P_R$ of all obtained feature vectors with respect to the universal background model $M_B$ and a registered speaker model $M_R$,

$$P_B = \frac{1}{m}\sum_{i=1}^{m} p_B^i, \qquad P_R = \frac{1}{m}\sum_{i=1}^{m} p_R^i,$$

$$p_B^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^B\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^B|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^B)^T (\Sigma_k^B)^{-1} (x-\mu_k^B)\right),$$

$$p_R^i = \sum_{k=1}^{N} \log\left(w_k\, p_k^R\right) = N \log w_k - \sum_{k=1}^{N} \log\left((2\pi)^{d/2}\,|\Sigma_k^R|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\,(x-\mu_k^R)^T (\Sigma_k^R)^{-1} (x-\mu_k^R)\right),$$

where m is the number of feature vectors obtained from the test speaker's speech data, $p_k^B$ and $p_k^R$ are the posterior probabilities of a feature vector on the k-th feature space of $M_B$ and $M_R$ respectively, and $w_k = \frac{1}{N}$ is the weight of each feature space; and
the recognition unit computes the score $P_R - P_B$ of the test speaker's speech data against each registered speaker model, takes the maximum $P_{max}$, and identifies the test speaker according to a preset threshold.
11. A speaker recognition method, comprising:
extracting feature vectors from a speaker's speech data;
performing internal clustering on the feature vectors of background speakers' speech data and generating a universal background model for a general speaker from the result of the internal clustering;
adapting the universal background model with the feature vectors of each registered speaker's speech data to generate a registered speaker model for each registered speaker;
computing metric values of a test speaker's feature vectors on the universal background model and on each registered speaker model; and
identifying the test speaker from the computed metric values.
12. The speaker recognition method according to claim 11, wherein generating the universal background model comprises:
performing internal clustering on the feature vectors of the background speakers' speech data to generate a series of feature subclasses;
selecting cluster centers from the generated feature subclasses of all background speakers, so as to partition all feature subclasses into feature spaces; and
characterizing all feature subclasses contained in each feature space, so as to generate the universal background model for a general speaker.
13. The speaker recognition method according to claim 12, wherein in the internal clustering a KDTree is constructed from the feature vectors of each background speaker's speech data and clustered according to a nearest-neighbor rule.
14. The speaker recognition method according to claim 13, wherein the internal clustering comprises:
extracting feature vectors from the voiced segments of a background speaker's speech data;
building the extracted feature vectors into a KDTree such that, at every layer, the value of the dimension associated with that layer is smaller than the root node's value of that dimension for all nodes in the root node's left subtree, and larger for all nodes in its right subtree; and
clustering each root node at some chosen layer of the constructed KDTree, together with its subtrees, into a feature subclass sharing common characteristics.
15. The speaker recognition method according to claim 14, wherein the root nodes are screened and those with the largest numbers of child nodes are retained.
16. The speaker recognition method according to any one of claims 12 to 15, wherein the maximum-distance method, the K-means method, the minimum-distance method, the group-average-distance method, or the centroid method is used to select cluster centres from the generated feature subsets.
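Of the centre-selection options listed in claim 16, K-means is the most widely used; the following illustrative sketch runs K-means over the means of the feature subsets to obtain the cluster centres (all names and the iteration count are assumptions of this example, and a library routine would do equally well).

```python
# Illustrative K-means centre selection over feature-subset means.
import numpy as np

def select_centres(subsets, n_spaces, iters=20, seed=0):
    means = np.array([s.mean(axis=0) for s in subsets])    # one point per subset
    rng = np.random.default_rng(seed)
    centres = means[rng.choice(len(means), n_spaces, replace=False)]
    for _ in range(iters):
        # assign each subset mean to its nearest centre
        dist = np.linalg.norm(means[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each centre to the mean of its assigned points
        for k in range(n_spaces):
            if np.any(labels == k):
                centres[k] = means[labels == k].mean(axis=0)
    return centres, labels     # labels assign each subset to a feature space
```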
17. The speaker recognition method according to any one of claims 12 to 15, wherein all feature subsets contained in each feature space are characterized by Gaussian functions.
18. The speaker recognition method according to any one of claims 12 to 15, wherein the mean and variance of the feature vectors contained in all feature subsets of each feature space are calculated to obtain the normal distribution function of each feature space.
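Continuing the same illustrative assumptions, claim 18's characterization can be sketched as pooling the feature vectors of every subset assigned to a feature space and taking their mean and (diagonal) variance as that space's normal distribution.

```python
# Illustrative per-feature-space Gaussian parameters (mu_k, diagonal Sigma_k).
import numpy as np

def characterize_spaces(subsets, labels, n_spaces):
    """Assumes every feature space received at least one subset."""
    means, variances = [], []
    for k in range(n_spaces):
        pooled = np.vstack([s for s, lab in zip(subsets, labels) if lab == k])
        means.append(pooled.mean(axis=0))      # mu_k of the k-th feature space
        variances.append(pooled.var(axis=0))   # diagonal Sigma_k
    return np.array(means), np.array(variances)
```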
19. The speaker recognition method according to claim 18, wherein generating the registered speaker model comprises:
obtaining the feature vectors F of a registered speaker's speech data;
for each feature vector F, calculating its posterior probability $p_k$ with respect to each feature space $k$,
$$p_k = \frac{1}{(2\pi)^{d/2}\left|\Sigma_k\right|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k)\right\}, \quad k = 1, 2, \ldots, N,$$
where $\mu_k$ is the mean of the feature vectors contained in all feature subsets of each feature space, $\Sigma_k$ is the variance of those feature vectors, $N$ is the number of feature spaces, and $d$ is the feature dimension;
calculating the update factor $\alpha = \frac{1}{\gamma + p_k}$, where $\gamma$ is an empirical value;
updating the mean of each feature space: $\mu'_k = \mu_k(1-\alpha) + \alpha \cdot F$; and
adapting the universal background model with the updated feature-space means to generate the registered speaker model of this registered speaker.
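A minimal sketch of the mean-update loop of claim 19, with diagonal covariances and `gamma` standing in for the empirical value $\gamma$; it illustrates the update rule $\alpha = 1/(\gamma + p_k)$, $\mu'_k = \mu_k(1-\alpha) + \alpha F$ rather than reproducing the patented implementation.

```python
# Illustrative adaptation of the background-model means toward one
# registered speaker's feature vectors.
import numpy as np

def adapt_means(feats, means, variances, gamma=16.0):
    mu = means.copy()
    dim = mu.shape[1]
    for f in feats:
        diff = f - mu
        # posterior p_k of this feature vector on each feature space
        # (a real system would work in the log domain to avoid underflow)
        p_k = np.exp(-0.5 * np.sum(diff**2 / variances, axis=1)) / (
            (2 * np.pi) ** (dim / 2) * np.sqrt(np.prod(variances, axis=1)))
        alpha = 1.0 / (gamma + p_k)                # update factor per space
        mu = mu * (1 - alpha[:, None]) + alpha[:, None] * f
    return mu                                      # registered-speaker means
```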
20. The speaker recognition method according to claim 19, wherein
in the metric calculation, the feature vectors of the test speaker's speech data are obtained, and the posterior probabilities $P_B$ and $P_R$ of all obtained feature vectors with respect to the universal background model $M_B$ and the registered speaker model $M_R$ are calculated respectively:
$$P_B = \frac{1}{m}\sum_{i=1}^{m} p_B^i, \qquad P_R = \frac{1}{m}\sum_{i=1}^{m} p_R^i$$

$$p_B^i = \sum_{k=1}^{N}\log\left(w_k\,p_k^B\right) = N\log w_k - \sum_{k=1}^{N}\log\left((2\pi)^{d/2}\left|\Sigma_k^B\right|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\left(x-\mu_k^B\right)^T\left(\Sigma_k^B\right)^{-1}\left(x-\mu_k^B\right)\right)$$

$$p_R^i = \sum_{k=1}^{N}\log\left(w_k\,p_k^R\right) = N\log w_k - \sum_{k=1}^{N}\log\left((2\pi)^{d/2}\left|\Sigma_k^R\right|^{1/2}\right) + \sum_{k=1}^{N}\left(-\frac{1}{2}\left(x-\mu_k^R\right)^T\left(\Sigma_k^R\right)^{-1}\left(x-\mu_k^R\right)\right)$$
where $m$ is the number of feature vectors obtained from the test speaker's speech data, $p_k^B$ and $p_k^R$ are the posterior probabilities of a feature vector on the $k$-th feature space of the universal background model $M_B$ and of the registered speaker model $M_R$, respectively, and $w_k = \frac{1}{N}$ is the weight of each feature space; and
in the identification, the score $P_R - P_B$ of the test speaker's speech data is calculated against each registered speaker model, the maximum value $P_{max}$ is obtained, and the test speaker is identified according to a set threshold.
CN200910170552.6A 2009-09-10 2009-09-10 Speaker recognition system and method Expired - Fee Related CN102024455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910170552.6A CN102024455B (en) 2009-09-10 2009-09-10 Speaker recognition system and method


Publications (2)

Publication Number Publication Date
CN102024455A (en) 2011-04-20
CN102024455B (en) 2014-09-17

Family

ID=43865670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910170552.6A Expired - Fee Related CN102024455B (en) 2009-09-10 2009-09-10 Speaker recognition system and method

Country Status (1)

Country Link
CN (1) CN102024455B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238190B (en) * 2011-08-01 2013-12-11 安徽科大讯飞信息科技股份有限公司 Identity authentication method and system
CN102270451B (en) * 2011-08-18 2013-05-29 安徽科大讯飞信息科技股份有限公司 Method and system for identifying speaker
CN102737633B (en) * 2012-06-21 2013-12-25 北京华信恒达软件技术有限公司 Method and device for recognizing speaker based on tensor subspace analysis
CN102968990B (en) * 2012-11-15 2015-04-15 朱东来 Speaker identifying method and system
CN103106900B (en) * 2013-02-28 2016-05-04 用友网络科技股份有限公司 Speech recognition equipment and audio recognition method
CN103226951B (en) * 2013-04-19 2015-05-06 清华大学 Speaker verification system creation method based on model sequence adaptive technique
CN103219008B (en) * 2013-05-16 2016-04-20 清华大学 Based on the phrase sound method for distinguishing speek person of base state vector weighting
CN104464738B (en) * 2014-10-31 2018-01-02 北京航空航天大学 A kind of method for recognizing sound-groove towards Intelligent mobile equipment
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
CN106887231A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of identification model update method and system and intelligent terminal
CN105702263B (en) * 2016-01-06 2019-08-30 清华大学 Speech playback detection method and device
CN106981289A (en) * 2016-01-14 2017-07-25 芋头科技(杭州)有限公司 A kind of identification model training method and system and intelligent terminal
CN106971732A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method and system that the Application on Voiceprint Recognition degree of accuracy is lifted based on identification model
CN107274904A (en) * 2016-04-07 2017-10-20 富士通株式会社 Method for distinguishing speek person and Speaker Identification equipment
CN106169295B (en) 2016-07-15 2019-03-01 腾讯科技(深圳)有限公司 Identity vector generation method and device
CN106683664A (en) * 2016-11-22 2017-05-17 中南大学 Voice starting method and system for wireless charging
CN108268948B (en) * 2017-01-03 2022-02-18 富士通株式会社 Data processing apparatus and data processing method
CN107240396B (en) * 2017-06-16 2023-01-17 百度在线网络技术(北京)有限公司 Speaker self-adaptation method, device, equipment and storage medium
CN107680600B (en) * 2017-09-11 2019-03-19 平安科技(深圳)有限公司 Sound-groove model training method, audio recognition method, device, equipment and medium
CN108417226A (en) * 2018-01-09 2018-08-17 平安科技(深圳)有限公司 Speech comparison method, terminal and computer readable storage medium
CN108922515A (en) * 2018-05-31 2018-11-30 平安科技(深圳)有限公司 Speech model training method, audio recognition method, device, equipment and medium
CN109147798B (en) * 2018-07-27 2023-06-09 北京三快在线科技有限公司 Speech recognition method, device, electronic equipment and readable storage medium
CN109545229B (en) * 2019-01-11 2023-04-21 华南理工大学 Speaker recognition method based on voice sample characteristic space track
CN110211595B (en) * 2019-06-28 2021-08-06 四川长虹电器股份有限公司 Speaker clustering system based on deep learning
CN110544481B (en) * 2019-08-27 2022-09-20 华中师范大学 S-T classification method and device based on voiceprint recognition and equipment terminal
CN111341324B (en) * 2020-05-18 2020-08-25 浙江百应科技有限公司 Fasttext model-based recognition error correction and training method


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315771A (en) * 2008-06-04 2008-12-03 哈尔滨工业大学 Compensation method for different speech coding influence in speaker recognition

Also Published As

Publication number Publication date
CN102024455A (en) 2011-04-20

Similar Documents

Publication Publication Date Title
CN102024455B (en) Speaker recognition system and method
US11244689B2 (en) System and method for determining voice characteristics
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
US6253179B1 (en) Method and apparatus for multi-environment speaker verification
JP3532346B2 (en) Speaker Verification Method and Apparatus by Mixture Decomposition Identification
CN110310647B (en) Voice identity feature extractor, classifier training method and related equipment
CN107610707A (en) A kind of method for recognizing sound-groove and device
JP2014502375A (en) Passphrase modeling device and method for speaker verification, and speaker verification system
CN105656887A (en) Artificial intelligence-based voiceprint authentication method and device
CN106098068A (en) A kind of method for recognizing sound-groove and device
CN100363938C (en) Multi-model ID recognition method based on scoring difference weight compromised
CN105096955B (en) A kind of speaker's method for quickly identifying and system based on model growth cluster
US6684186B2 (en) Speaker recognition using a hierarchical speaker model tree
Apsingekar et al. Speaker model clustering for efficient speaker identification in large population applications
CN110110790B (en) Speaker confirmation method adopting unsupervised clustering score normalization
US11837236B2 (en) Speaker recognition based on signal segments weighted by quality
CN108091326A (en) A kind of method for recognizing sound-groove and system based on linear regression
Shivakumar et al. Simplified and supervised i-vector modeling for speaker age regression
CN114299920A (en) Method and device for training language model for speech recognition and speech recognition method and device
Michalevsky et al. Speaker identification using diffusion maps
WO2002029785A1 (en) Method, apparatus, and system for speaker verification based on orthogonal gaussian mixture model (gmm)
Jahangir et al. Automatic speaker identification through robust time domain features and hierarchical classification approach
Panda et al. Study of speaker recognition systems
CN115577357A (en) Android malicious software detection method based on stacking integration technology
JPWO2020003413A1 (en) Information processing equipment, control methods, and programs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140917

Termination date: 20150910

EXPY Termination of patent right or utility model