CN110853655A - Voiceprint model initialization method based on the K-means algorithm

Info

Publication number: CN110853655A
Application number: CN201810959778.3A
Authority: CN
Inventors: 杨瑞瑞; 李�浩
Original assignee: China Changfeng Science Technology Industry Group Corp
Current assignee: China Changfeng Science Technology Industry Group Corp
Priority date: 2018-08-20
Filing date: 2018-08-20
Publication date: 2020-02-28

Abstract

The invention provides a voiceprint model initialization method based on the K-means algorithm. The distance from each data point in the data set to its m nearest data points is first calculated, and the density and threshold of each data point are calculated from it. Then the data point with the maximum density in the data set is selected as the first initial cluster center nc₁. Next, from the remaining n-1 data points, the point whose distance from nc₁ is not less than the threshold of nc₁ and whose density is the largest is taken as the second initial cluster center nc₂. These steps are repeated until k initial cluster centers meeting the conditions have been selected.

Description

Voiceprint model initialization method based on the K-means algorithm

Technical Field

The invention belongs to the technical field of artificial intelligence and relates to a voiceprint model initialization method based on the K-means algorithm.

Background

In voiceprint recognition, the UBM-MAP-GMM model can, to a certain extent, compensate for the low accuracy of voiceprint models caused by insufficient training speech data. The UBM (universal background model) is a GMM (Gaussian mixture model) with a high mixture order. A GMM is a parametric model: as its size increases, its ability to describe the data improves, but the amount of training data required also increases. The UBM is typically trained with the expectation-maximization (EM) algorithm.

The expectation-maximization (EM) algorithm was proposed by Dempster, Laird and Rubin; it estimates parameters by maximum likelihood using an iterative scheme. A UBM can be written compactly with the parameters $\lambda = \{\omega_i, \mu_i, \Sigma_i\},\ i = 1, 2, \ldots, M$. Training a GMM of order M then means finding the M-th order parameter combination $\{\omega_i, \mu_i, \Sigma_i\}$ that best fits the distribution of the training speech features. The most common method for solving this is maximum likelihood estimation.

Let $X = \{x_t,\ t = 1, 2, \ldots, T\}$ be the voiceprint features of the training samples. The likelihood of X under the voiceprint model λ is expressed as follows:

$$P(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$$

where X represents the voiceprint features of the training speech samples and $P(X \mid \lambda)$ is the likelihood function of the model parameters $\lambda = \{\omega_i, \mu_i, \Sigma_i\}$. The objective of model training is to find the model parameters $\lambda^*$ that maximize $P(X \mid \lambda)$. The formula is as follows:

$$\lambda^* = \arg\max_{\lambda} P(X \mid \lambda)$$

Since $P(X \mid \lambda)$ is a nonlinear function of λ and is difficult to maximize directly, the solution is usually obtained with the EM algorithm. Each iteration of the EM algorithm consists of two steps: the E step (Expectation step) and the M step (Maximization step). The model parameters are estimated as follows: first, the model parameters are initialized to λ; then new model parameters $\hat{\lambda}$ are obtained with the iterative re-estimation formulas. $P(X \mid \hat{\lambda})$ is compared with $P(X \mid \lambda)$; if $P(X \mid \hat{\lambda})$ is greater than $P(X \mid \lambda)$, $\hat{\lambda}$ replaces λ and the iterative calculation is repeated until the algorithm converges or the maximum number of iterations is reached.
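The iteration described above can be illustrated with a minimal sketch for a one-dimensional GMM. It is not the patent's implementation: the function names, the variance floor `var_floor` and the stopping tolerance `tol` are assumptions introduced only for illustration, and the closed-form updates are the standard textbook re-estimation formulas for a GMM.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """log P(X | lambda) for independent samples under a 1-D GMM."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One E step followed by one M step; returns the re-estimated parameters."""
    n, k = len(data), len(weights)
    # E step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M step: closed-form updates of weights, means and variances.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)
        mean_i = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var_i = sum(r[i] * (x - mean_i) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / n)
        new_m.append(mean_i)
        new_v.append(max(var_i, var_floor))  # assumed floor, avoids zero variance
    return new_w, new_m, new_v

def train_gmm(data, weights, means, variances, max_iter=100, tol=1e-6):
    """Accept re-estimated parameters while they raise the log-likelihood."""
    params = (weights, means, variances)
    ll = log_likelihood(data, *params)
    history = [ll]
    for _ in range(max_iter):
        candidate = em_step(data, *params)
        new_ll = log_likelihood(data, *candidate)
        if new_ll - ll < tol:  # no meaningful gain: treat as converged
            break
        params, ll = candidate, new_ll
        history.append(ll)
    return params, history
```

Calling `train_gmm` with different starting means on the same data is a simple way to observe how the result depends on the initial values.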

The advantage of the EM algorithm is that the likelihood function converges within a finite number of iterations. Because the M step finds the maximum by differentiation, the EM algorithm finds an extreme point; that is, the solution finally obtained is a local optimum rather than necessarily the global optimum.

This locality makes the EM algorithm very sensitive to its initial values: initializing it with different values leads to different local optima. An improperly chosen initial value makes the EM algorithm converge to a poor local optimum and also slows its convergence.

The EM algorithm estimates model parameters iteratively, so the UBM must be initialized before its parameters are estimated. Because the algorithm is very sensitive to the initial values and easily falls into a local optimum, randomly selected initial values lead to slow convergence and unstable results. In practical applications, the partition-based K-means clustering algorithm is therefore generally used to cluster the data objects, and the cluster centers obtained are used as the initial values of the EM algorithm.

The traditional K-means algorithm selects its initial cluster centers at random. Because this selection is random and unstable, an improper choice degrades the clustering result and may yield poor cluster centers, which biases the estimate of the UBM parameters and in turn degrades the recognition performance of the voiceprint recognition system. In addition, the K-means algorithm measures the similarity between data objects by spatial distance, so it is suited only to spherical clusters and performs poorly on clusters of arbitrary shape.
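As an illustration of using a K-means result to initialize EM, the sketch below runs plain Lloyd iterations from given starting centers and then derives initial GMM parameters from the clusters. The text only states that the cluster centers serve as initial values; taking the weights from the cluster proportions and using per-cluster diagonal variances is a common convention assumed here, and the function names and `var_floor` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def gmm_init_from_clusters(points, centers, labels, var_floor=1e-3):
    """Initial GMM parameters: weights, means and diagonal variances."""
    n, dim = len(points), len(points[0])
    weights, variances = [], []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        weights.append(len(members) / n)
        if members:
            var = [
                max(sum((p[d] - c[d]) ** 2 for p in members) / len(members), var_floor)
                for d in range(dim)
            ]
        else:
            var = [var_floor] * dim
        variances.append(var)
    means = [list(c) for c in centers]
    return weights, means, variances
```

The returned weights, means and variances can be passed to an EM routine as its starting parameters; only the starting centers given to `kmeans` change between the traditional and the improved initialization.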

Disclosure of Invention

The invention addresses the above problems of initial center selection and provides a voiceprint model initialization method based on the K-means algorithm, in which the initial center points are selected by combining distance and density.

The technical scheme of the invention is as follows:

A voiceprint model initialization method based on the K-means algorithm, characterized by comprising the following steps:

(1) Input: the number of clusters k and the data set $X = \{x_i,\ i = 1, 2, \ldots, n\}$;

(2) Output: the set nc of initial cluster centers;

(3) Calculate the distance between the data objects in the data set and store the results in a matrix D;

(4) Calculate the density and the threshold of each sample point according to the density and threshold formulas;

(5) Select the data point with the maximum density in the data set as the first initial cluster center nc₁ and add it to the set of initial cluster centers, i.e., nc = nc ∪ {nc₁};

(6) Select the next initial cluster center nc₂ from the remaining n-1 data points and add it to the set nc, i.e., nc = nc ∪ {nc₂}, where nc₂ should satisfy the following two conditions:

a. dis(nc₁, nc₂) is not less than the threshold of nc₁;

b. nc₂ is the data point with the maximum density among the remaining n-1 data points that satisfy condition a;

(7) Repeat step (6) until k initial cluster centers have been selected.

The initial center points are selected by jointly considering the data distribution density and the distance, which effectively overcomes the influence of random center selection on the clustering result, so the improved K-means algorithm clusters the data better. Comparative experiments make the advantages of the invention more evident.

The experiments were carried out on the Matlab platform using part of the TIMIT speech corpus. The speech data consists of 580 utterances from 58 speakers, 10 sentences per speaker. All utterances of 46 of the speakers are used to train the UBM; the utterances of the remaining speakers are used to build the individual speaker GMMs and for testing, with 9 sentences used as GMM training data and the rest used for testing.

The clustering results of the traditional K-means algorithm and of the algorithm with improved initial cluster centers were simulated with 50 sample points and K = 4. The improved K-means algorithm takes into account the influence of the initial cluster centers on the clustering result: it selects the initial centers according to the distribution density of the sample points and the distance between high-density regions, and achieves a better clustering result.

In the UBM-MAP-GMM model, the GMM is obtained from the UBM by adaptive training with the MAP algorithm, so the UBM and the GMM have the same mixture order. In the experiment, to eliminate the influence of different numbers of Gaussian components on recognition performance, the Gaussian mixture order is fixed at 64. Several test-speech durations are used, and the recognition performance of the system is compared when the UBM parameters are initialized by the traditional K-means algorithm and by the improved K-means algorithm. The experimental results show that the model initialization algorithm has a certain influence on the recognition rate, and that the voiceprint recognition system using the improved K-means clustering algorithm achieves the higher recognition rate.

Detailed Description

It is generally considered that, in the sample space, sample points in low-density regions separate the sample points in high-density regions, and that data objects far apart from one another, chosen as initial center points, represent the true distribution of the data better than randomly selected sample points [8-9]. Therefore, in the improvement proposed here, the initial values are chosen under a dual constraint of local density and global distance: the greater the density of the sample points and the greater the Euclidean distance between them, the smaller the probability that they fall into the same cluster, and the better they serve as initial cluster centers. The specific process is as follows:

1. Basic definitions

Let the sample data set be $X = \{x_i,\ i = 1, 2, \ldots, n\}$, where each sample has dimension d. The distance $dis(x_i, x_j)$ between any two data points $x_i$ and $x_j$ in the data set is defined in formula (1).

The density of data point $x_i$ in the data set is defined in formula (2).

$\rho_i$ is calculated from the average distance between $x_i$ and the m points nearest to it, and thus reflects the distribution of the other data in the vicinity of $x_i$: the larger the value, the denser the distribution of data points near $x_i$; the smaller the value, the looser the distribution of data points near $x_i$. The value of m is generally slightly larger than the number of clusters k.

The threshold of data point $x_i$ in the data set is defined in formula (3).

$R_i$ is calculated from the distances between $x_i$ and the m points nearest to it. The threshold is set so that the initial cluster centers are not selected on density alone, which could cause different initial cluster centers to belong to the same cluster and thereby affect the final clustering result.
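Formulas (1) to (3) are not reproduced in this text. As a hedged reading consistent with the description (a Euclidean distance, a density that grows as the m nearest neighbors get closer, and a threshold derived from the same m distances), they might take the form below. The notation $N_m(x_i)$ for the set of the m points nearest to $x_i$ is introduced here, and the exact expressions, in particular that of $R_i$, are assumptions rather than the patent's own formulas.

```latex
% (1) Euclidean distance between two d-dimensional samples
dis(x_i, x_j) = \sqrt{\sum_{l=1}^{d} \left( x_{il} - x_{jl} \right)^2}

% (2) assumed density: reciprocal of the mean distance to the m nearest points
\rho_i = \frac{m}{\sum_{x_j \in N_m(x_i)} dis(x_i, x_j)}

% (3) assumed threshold: mean distance to the m nearest points
R_i = \frac{1}{m} \sum_{x_j \in N_m(x_i)} dis(x_i, x_j)
```

Under this reading a larger $\rho_i$ means a denser neighborhood, which matches the description of the density, and $R_i$ sets the minimum separation required between $x_i$ and any center selected after it.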

2. Improved initial cluster center algorithm

Let the k clusters be $C_j$, where j = 1, 2, …, k and k < n, and let the k initial cluster centers be $nc_j$, where j = 1, 2, …, k.

The basic idea of the improved initial cluster center algorithm based on density and distance is as follows. The distance from each data point in the data set to its m nearest data points is first calculated, and the density and threshold of each data point are calculated in turn. Then the data point with the maximum density in the data set is selected as the first initial cluster center nc₁. Next, from the remaining n-1 data points, the point whose distance from nc₁ is not less than the threshold of nc₁ and whose density is the largest is taken as the second initial cluster center nc₂. These steps are repeated until k initial cluster centers meeting the conditions have been selected.

The algorithm comprises the following specific steps:

(2) Output: the set nc of initial cluster centers;

① dis(nc₁, nc₂) is not less than the threshold of nc₁;

② nc₂ is the data point with the highest density among the remaining n-1 data points that satisfy condition ①;
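The selection procedure can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the density is taken as the reciprocal of the mean distance to the m nearest points and the threshold as that mean distance itself, and when the step is repeated for the third and later centers, a candidate is assumed to need a distance of at least the threshold from every center already selected. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def select_initial_centers(points, k, m):
    """Return the indices of up to k initial cluster centers."""
    n = len(points)
    if not (0 < k <= n and 0 < m < n):
        raise ValueError("need 0 < k <= n and 0 < m < n")
    # Pairwise distance matrix D.
    D = [[euclidean(p, q) for q in points] for p in points]
    # Density and threshold of every point from its m nearest neighbors
    # (assumed formulas: density = 1 / mean distance, threshold = mean distance).
    density, threshold = [], []
    for i in range(n):
        nearest = sorted(D[i][j] for j in range(n) if j != i)[:m]
        mean_dist = sum(nearest) / m
        density.append(1.0 / mean_dist if mean_dist > 0 else float("inf"))
        threshold.append(mean_dist)
    # The densest point becomes the first center.
    centers = [max(range(n), key=lambda i: density[i])]
    # Each further center is the densest remaining point that is far enough
    # from all centers chosen so far (assumed generalization of the condition).
    while len(centers) < k:
        candidates = [
            i for i in range(n)
            if i not in centers and all(D[c][i] >= threshold[c] for c in centers)
        ]
        if not candidates:  # fewer than k points meet the distance condition
            break
        centers.append(max(candidates, key=lambda i: density[i]))
    return centers
```

The returned indices identify the starting centers for K-means. If fewer than k points meet the distance condition, the sketch returns the centers found so far instead of raising an error, which is a design choice of this example.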

Claims

1. A voiceprint model initialization method based on the K-means algorithm, characterized by comprising the following steps:

(2) Output: the set nc of initial cluster centers;

(6) Select the next initial cluster center nc₂ from the remaining n-1 data points and add it to the set nc, i.e., nc = nc ∪ {nc₂}, where nc₂ should satisfy the following two conditions:

a. dis(nc₁, nc₂) is not less than the threshold of nc₁;