CN110853655A - Initial method of voiceprint model based on K-means algorithm - Google Patents

Info

Publication number
CN110853655A
Authority
CN
China
Prior art keywords
data
initial
algorithm
density
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810959778.3A
Other languages
Chinese (zh)
Inventor
杨瑞瑞
李�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Changfeng Science Technology Industry Group Corp
Original Assignee
China Changfeng Science Technology Industry Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Changfeng Science Technology Industry Group Corp
Priority to CN201810959778.3A
Publication of CN110853655A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a voiceprint model initialization method based on the K-means algorithm. The data point with the maximum density in the data set is selected as the first initial cluster center $nc_1$. Then, from the remaining $n-1$ data points, the point whose distance from $nc_1$ exceeds the threshold of $nc_1$ and whose density is the largest is taken as the second initial cluster center $nc_2$. These steps are repeated until $k$ initial cluster centers satisfying the conditions have been selected.

Description

Initial method of voiceprint model based on K-means algorithm
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to a voiceprint model initialization method based on the K-means algorithm.
Background
In voiceprint recognition, the UBM-MAP-GMM model can, to some extent, compensate for the low accuracy of voiceprint models caused by insufficient training speech data. The UBM is a GMM with a high mixture order. The GMM is a parametric model: as its size grows, its ability to describe the data improves, but the amount of training data required also grows. The UBM is typically trained with the expectation-maximization (EM) algorithm.
The expectation-maximization (EM) algorithm was proposed by Dempster, Laird and Rubin; it estimates the parameters of a maximum likelihood problem iteratively. A UBM can be described compactly by the parameter set $\lambda = \{\omega_i, \mu_i, \Sigma_i\}$, $i = 1, 2, \ldots, M$. Training a GMM of order $M$ then amounts to finding the $M$-component parameter combination $\{\omega_i, \mu_i, \Sigma_i\}$ that best fits the distribution of the training speech features. The most common method for solving this is maximum likelihood estimation.
Let $X = \{x_t, t = 1, 2, \ldots, T\}$ be the voiceprint features of the training samples. The likelihood of $X$ under the voiceprint model $\lambda$ is:
$$P(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$$
where $X$ represents the voiceprint features of the training speech samples and $P(X \mid \lambda)$ is the likelihood under the model parameters $\lambda = \{\omega_i, \mu_i, \Sigma_i\}$. The objective of model training is to find the model parameters $\lambda^{*}$ that maximize $P(X \mid \lambda)$:
$$\lambda^{*} = \arg\max_{\lambda} P(X \mid \lambda)$$
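The quantity being maximized can be evaluated numerically in the log domain. The following is a minimal sketch, not the patent's implementation: it assumes the feature frames are treated as independent, so that the log-likelihood is a sum over frames, and it assumes diagonal covariance matrices. The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density at vector x of a Gaussian with diagonal covariance (assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(X, weights, means, variances):
    """Return log P(X | lambda) as a sum over frames of log sum_i w_i N(x_t; mu_i, Sigma_i)."""
    total = 0.0
    for x in X:
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(terms)  # log-sum-exp trick keeps the sum numerically stable
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total
```

Working with logarithms avoids the underflow that the raw product over many frames would cause. All mixture weights are assumed to be strictly positive.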
Since $P(X \mid \lambda)$ is a nonlinear function of $\lambda$ and is difficult to maximize directly, the solution is usually obtained with the EM algorithm. Each EM iteration consists of two steps: the E step (Expectation Step) and the M step (Maximization Step). The model parameters are estimated as follows: first initialize the model parameters to $\lambda$, then obtain new model parameters $\hat{\lambda}$ with the iterative formula. Compare $P(X \mid \lambda)$ with $P(X \mid \hat{\lambda})$; if $P(X \mid \hat{\lambda})$ is greater than $P(X \mid \lambda)$, replace $\lambda$ with $\hat{\lambda}$ and repeat the iterative calculation until the algorithm converges or the maximum number of iterations is reached.
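The iteration described above can be sketched for a one-dimensional GMM. This is an illustrative sketch under stated assumptions, not the patent's code: the data are scalar for brevity, the update formulas are the standard EM re-estimation formulas for a Gaussian mixture, and the names `em_step`, `fit_gmm`, `var_floor` and `tol` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    """log P(X | lambda) for scalar samples xs."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )


def em_step(xs, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns the updated (weights, means, variances)."""
    # E step: responsibility of each component for each sample.
    resp = []
    for x in xs:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M step: re-estimate the parameters from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        n_i = sum(r[i] for r in resp)
        mean_i = sum(r[i] * x for r, x in zip(resp, xs)) / n_i
        var_i = sum(r[i] * (x - mean_i) ** 2 for r, x in zip(resp, xs)) / n_i
        new_w.append(n_i / len(xs))
        new_m.append(mean_i)
        new_v.append(max(var_i, var_floor))  # floor guards against collapsed components
    return new_w, new_m, new_v


def fit_gmm(xs, weights, means, variances, max_iter=100, tol=1e-8):
    """Repeat EM until the log-likelihood gain drops below tol or max_iter is reached."""
    history = [log_likelihood(xs, weights, means, variances)]
    for _ in range(max_iter):
        weights, means, variances = em_step(xs, weights, means, variances)
        history.append(log_likelihood(xs, weights, means, variances))
        if history[-1] - history[-2] < tol:
            break
    return (weights, means, variances), history
```

The returned `history` makes the comparison between successive likelihood values explicit. The sketch does not guard against samples so far from every component that all densities underflow to zero.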
The advantage of the EM algorithm is that the likelihood function converges within a finite number of iterations. However, because the M step obtains its update by differentiation, the EM algorithm finds an extreme point; that is, the solution finally obtained is a local optimum rather than a global optimum.
This local character makes the EM algorithm very sensitive to its initial value: initializing it with different values leads to different local optima. A poorly chosen initial value causes the EM algorithm to converge to a poor local optimum and also slows its convergence.
The EM algorithm estimates a parametric model iteratively, so the UBM must be initialized before its parameters can be estimated. Because of this iterative nature, the algorithm is very sensitive to the initial value and easily falls into a locally optimal result determined by it, while randomly selected initial values lead to slow convergence and unstable results. In practical applications, the partition-based clustering algorithm K-means is therefore generally used to cluster the data objects, and the cluster center points obtained from the clustering are used as the initial values of the EM algorithm.
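How a clustering result might be turned into EM initial values can be sketched as follows. The source states only that the cluster center points serve as initial values; deriving the initial weights and diagonal variances from the cluster memberships as well is an assumption made here, and the names `init_gmm_from_clusters` and `var_floor` are hypothetical.

```python
def init_gmm_from_clusters(X, labels, k, var_floor=1e-3):
    """Derive initial GMM weights, means and diagonal variances from cluster labels.

    X is a list of feature vectors and labels[t] is the cluster index of X[t].
    Means are the cluster centers; weights and variances are an assumed extension.
    """
    dim = len(X[0])
    weights, means, variances = [], [], []
    for j in range(k):
        members = [x for x, lab in zip(X, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} is empty")
        mean = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        var = [
            max(sum((x[d] - mean[d]) ** 2 for x in members) / len(members), var_floor)
            for d in range(dim)
        ]
        weights.append(len(members) / len(X))  # fraction of samples in the cluster
        means.append(mean)
        variances.append(var)
    return weights, means, variances
```

The variance floor keeps a cluster with identical or single members from producing a zero variance, which would make the Gaussian density undefined.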
The traditional K-means algorithm selects its initial cluster centers at random. Because this selection is random and unstable, an improper choice degrades the clustering and can yield poor cluster center points, which in turn biases the estimation of the UBM parameters and harms the recognition performance of the voiceprint recognition system. In addition, the K-means algorithm measures the similarity between data objects by spatial distance, so it is suited only to spherical clusters and performs poorly on clusters of arbitrary shape.
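For reference, the traditional procedure can be sketched as standard Lloyd-style K-means with the initial centers passed in explicitly, which makes the dependence on initialization visible. This is a generic sketch rather than code from the source; `kmeans` and `random_init` are hypothetical names.

```python
import math
import random


def kmeans(X, init_centers, max_iter=100):
    """Lloyd's K-means starting from the given initial centers."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in X
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def random_init(X, k, seed=None):
    """Traditional initialization: k distinct data points chosen at random."""
    return random.Random(seed).sample(X, k)
```

Calling `kmeans(X, random_init(X, k))` with different seeds shows how the result can vary from run to run; any other initializer can be substituted for `random_init`.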
Disclosure of Invention
In view of the above problems with initial center selection, the invention provides a voiceprint model initialization method based on the K-means algorithm in which the initial center points are selected by combining distance and density.
The technical scheme of the invention is as follows:
A voiceprint model initialization method based on the K-means algorithm, characterized by comprising the following steps:
(1) Input: the number of clusters $k$ and the data set $X = \{X_i, i = 1, 2, \ldots, n\}$;
(2) Output: the set $nc$ of initial cluster centers;
(3) Calculate the distances between the data objects in the data set and store the results in a matrix $D$;
(4) Calculate the density and the threshold of each sample point according to the density and threshold formulas;
(5) Select the data point with the maximum density in the data set as the first initial cluster center $nc_1$ and add it to the initial cluster center set, i.e., $nc = nc \cup \{nc_1\}$;
(6) Select the next initial cluster center $nc_2$ from the remaining $n-1$ data points and add it to the set, i.e., $nc = nc \cup \{nc_2\}$, where $nc_2$ must satisfy the following two conditions:
a. $dis(nc_1, nc_2)$ is not less than the threshold of $nc_1$;
b. $nc_2$ is the data point with the maximum density among the remaining $n-1$ data points that satisfy condition a;
(7) Repeat step (6) until $k$ initial cluster centers have been selected.
The initial center points are selected by jointly considering two factors, data distribution density and distance. This effectively overcomes the influence of random center selection on the clustering result, so the improved K-means algorithm clusters the data better. Comparative tests make the advantages of the invention more apparent.
The experiments were carried out on the Matlab platform using part of the TIMIT speech corpus. The speech data consist of 580 utterances from 58 speakers, 10 utterances per speaker. All utterances of 46 speakers are used to train the UBM. The utterances of the remaining speakers are used to build the individual GMMs and for testing: 9 utterances per speaker serve as GMM training data and the rest are used for testing.
The clustering behavior of the traditional K-means algorithm and of the algorithm with the improved initial cluster centers was simulated with 50 sample points and $k = 4$. The improved K-means algorithm takes the influence of the initial cluster centers into account and selects them on the basis of the distribution density of the sample points and the distance between high-density regions, and it gives the better clustering result.
In the UBM-MAP-GMM model, the GMM is obtained from the UBM by adaptive training with the MAP algorithm, so the UBM and the GMM have the same mixture order. To eliminate the effect of differing numbers of Gaussian components on recognition performance, the mixture order was fixed at 64 in the experiments. Several test speech durations were used, and the recognition performance of the system was compared with the UBM parameters initialized by the traditional K-means algorithm and by the improved K-means algorithm. The experimental results show that the model initialization algorithm affects the recognition rate of the system, and that the voiceprint recognition system using the improved K-means clustering algorithm achieves the higher recognition rate.
Detailed Description
It is generally held that, in the sample space, low-density regions separate the sample points of high-density regions, and that data objects lying far apart, when chosen as initial center points, represent the true distribution of the data better than randomly selected sample points [8-9]. In the improvement proposed here, the initial values are therefore chosen under a dual constraint of local density and global distance: the higher the density of two sample points and the larger the Euclidean distance between them, the less likely they are to belong to the same cluster, and the better they serve as initial cluster centers. The specific process is as follows:
1. Basic definitions
Let the sample data set be $X = \{X_i, i = 1, 2, \ldots, n\}$, where each sample has dimension $d$. The distance $dis(X_i, X_j)$ between any two data points $X_i$ and $X_j$ in the data set is defined in formula (1).
$$dis(X_i, X_j) = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2} \quad (1)$$
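The distance matrix $D$ used by the algorithm can be built directly from this definition. The sketch below assumes that $dis$ is the Euclidean distance, as the description's reference to Euclidean distance suggests; the function name `distance_matrix` is hypothetical.

```python
import math


def distance_matrix(X):
    """Return the symmetric n x n matrix D with D[i][j] = dis(X_i, X_j)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance over the d components of the two samples.
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
            D[i][j] = D[j][i] = d
    return D
```

Only the upper triangle is computed, since the distance is symmetric and the diagonal is zero.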
The density $\rho_i$ of data point $X_i$ in the data set is defined in formula (2).
$\rho_i$ is calculated from the average distance between $X_i$ and its $m$ nearest points, and thus reflects how the other data are distributed around $X_i$: the larger the value, the denser the data points near $X_i$; the smaller the value, the sparser the data points near $X_i$. The value of $m$ is generally chosen slightly larger than the number of clusters $k$.
The threshold $R_i$ of data point $X_i$ in the data set is defined in formula (3).
[Formula (3): equation image not reproduced]
$R_i$ is computed from the distances between $X_i$ and its $m$ nearest points. The threshold is introduced so that the initial cluster centers are not selected on density alone, which could place several initial cluster centers in the same cluster and thereby degrade the final clustering result.
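Formulas (2) and (3) are not reproduced in this text, so the sketch below rests on assumed forms that are merely consistent with the prose: the threshold $R_i$ is taken to be the mean distance from $X_i$ to its $m$ nearest neighbors, and the density $\rho_i$ is taken to be its reciprocal, so that a larger value means a denser neighborhood. Both forms, and the name `density_and_threshold`, are assumptions rather than the patent's definitions.

```python
def density_and_threshold(D, m):
    """Return (rho, R) computed from distance matrix D and neighbor count m.

    Assumed forms (not confirmed by the source): R_i is the mean distance from
    X_i to its m nearest neighbors, and rho_i = 1 / R_i.
    """
    n = len(D)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")
    rho, R = [], []
    for i in range(n):
        # Distances from X_i to every other point, smallest m kept.
        nearest = sorted(D[i][j] for j in range(n) if j != i)[:m]
        mean_dist = sum(nearest) / m
        R.append(mean_dist)
        # Duplicate points give a zero mean distance; treat them as infinitely dense.
        rho.append(1.0 / mean_dist if mean_dist > 0 else float("inf"))
    return rho, R
```

Any other density or threshold definition with the same inputs can be substituted without changing the selection procedure that consumes these values.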
2. Improved initial cluster center algorithm
Let the $k$ clusters be $C_j$, where $j = 1, 2, \ldots, k$ and $k < n$, and let the $k$ initial cluster centers be $nc_j$, where $j = 1, 2, \ldots, k$.
The basic idea of the improved initial cluster center algorithm based on density and distance is as follows. First, the distances from each data point in the data set to its $m$ nearest data points are calculated, and from them the density and the threshold of each data point. The data point with the maximum density in the data set is then selected as the first initial cluster center $nc_1$. Next, from the remaining $n-1$ data points, the point whose distance from $nc_1$ exceeds the threshold of $nc_1$ and whose density is the largest is taken as the second initial cluster center $nc_2$. These steps are repeated until $k$ initial cluster centers satisfying the conditions have been selected.
The specific steps of the algorithm are as follows:
(1) Input: the number of clusters $k$ and the data set $X = \{X_i, i = 1, 2, \ldots, n\}$;
(2) Output: the set $nc$ of initial cluster centers;
(3) Calculate the distances between the data objects in the data set and store the results in a matrix $D$;
(4) Calculate the density and the threshold of each sample point according to the density and threshold formulas;
(5) Select the data point with the maximum density in the data set as the first initial cluster center $nc_1$ and add it to the initial cluster center set, i.e., $nc = nc \cup \{nc_1\}$;
(6) Select the next initial cluster center $nc_2$ from the remaining $n-1$ data points and add it to the set, i.e., $nc = nc \cup \{nc_2\}$, where $nc_2$ must satisfy the following two conditions:
① $dis(nc_1, nc_2)$ is not less than the threshold of $nc_1$;
② $nc_2$ is the data point with the maximum density among the remaining $n-1$ data points that satisfy condition ①;
(7) Repeat step (6) until $k$ initial cluster centers have been selected.
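Steps (5) to (7) can be sketched as a function that takes the distance matrix, the densities and the thresholds as precomputed inputs, so that it does not depend on any particular density or threshold formula. One point is an assumption: the source states the distance condition only for $nc_1$ and $nc_2$, and this sketch extends it by requiring each new center to lie at least the threshold away from every center chosen so far. The name `select_initial_centers` is hypothetical.

```python
def select_initial_centers(D, rho, R, k):
    """Return the indices of up to k initial cluster centers.

    D[i][j] is the distance between points i and j, rho[i] the density of
    point i and R[i] its threshold. The distance condition is applied to all
    previously chosen centers, which is an assumed generalization.
    """
    n = len(D)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Step (5): the densest point becomes the first initial center.
    chosen = [max(range(n), key=lambda i: rho[i])]
    # Steps (6)-(7): add the densest remaining point that is far enough away.
    while len(chosen) < k:
        candidates = [
            i for i in range(n)
            if i not in chosen and all(D[c][i] >= R[c] for c in chosen)
        ]
        if not candidates:
            break  # fewer than k points satisfy the distance condition
        chosen.append(max(candidates, key=lambda i: rho[i]))
    return chosen
```

The returned indices can be mapped back to data points and passed to K-means as its initial centers. When no remaining point satisfies the distance condition, the sketch returns fewer than k centers; how the method handles that case is not stated in the source.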

Claims (1)

1. A voiceprint model initialization method based on the K-means algorithm, characterized by comprising the following steps:
(1) Input: the number of clusters $k$ and the data set $X = \{X_i, i = 1, 2, \ldots, n\}$;
(2) Output: the set $nc$ of initial cluster centers;
(3) Calculate the distances between the data objects in the data set and store the results in a matrix $D$;
(4) Calculate the density and the threshold of each sample point according to the density and threshold formulas;
(5) Select the data point with the maximum density in the data set as the first initial cluster center $nc_1$ and add it to the initial cluster center set, i.e., $nc = nc \cup \{nc_1\}$;
(6) Select the next initial cluster center $nc_2$ from the remaining $n-1$ data points and add it to the set, i.e., $nc = nc \cup \{nc_2\}$, where $nc_2$ must satisfy the following two conditions:
a. $dis(nc_1, nc_2)$ is not less than the threshold of $nc_1$;
b. $nc_2$ is the data point with the maximum density among the remaining $n-1$ data points that satisfy condition a;
(7) Repeat step (6) until $k$ initial cluster centers have been selected.
CN201810959778.3A 2018-08-20 2018-08-20 Initial method of voiceprint model based on K-means algorithm Pending CN110853655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810959778.3A CN110853655A (en) 2018-08-20 2018-08-20 Initial method of voiceprint model based on K-means algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810959778.3A CN110853655A (en) 2018-08-20 2018-08-20 Initial method of voiceprint model based on K-means algorithm

Publications (1)

Publication Number Publication Date
CN110853655A true CN110853655A (en) 2020-02-28

Family

ID=69595105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810959778.3A Pending CN110853655A (en) 2018-08-20 2018-08-20 Initial method of voiceprint model based on K-means algorithm

Country Status (1)

Country Link
CN (1) CN110853655A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115221980A (en) * 2022-09-16 2022-10-21 之江实验室 Load clustering method based on feature extraction and improved K-means algorithm

Similar Documents

Publication Publication Date Title
Li et al. Vlad3: Encoding dynamics of deep features for action recognition
Engel et al. Incremental learning of multivariate gaussian mixture models
WO2014029099A1 (en) I-vector based clustering training data in speech recognition
Najar et al. A fixed-point estimation algorithm for learning the multivariate ggmm: application to human action recognition
Channoufi et al. Color image segmentation with bounded generalized gaussian mixture model and feature selection
Ding et al. Variational nonparametric Bayesian hidden Markov model
CN110853655A (en) Initial method of voiceprint model based on K-means algorithm
US6789063B1 (en) Acoustic modeling using a two-level decision tree in a speech recognition system
CN115455670A (en) non-Gaussian noise model establishment method based on Gaussian mixture model
Haider et al. Sequence training of DNN acoustic models with natural gradient
Głomb et al. Unsupervised parameter selection for gesture recognition with vector quantization and hidden markov models
Farsi et al. Implementation and optimization of a speech recognition system based on hidden Markov model using genetic algorithm
Yang et al. Unsupervised dimensionality reduction for gaussian mixture model
Wang et al. Adaptive density estimation based on self-organizing incremental neural network using Gaussian process
Hashimoto et al. Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition.
Papageorgiou et al. Context-tree weighting for real-valued time series: Bayesian inference with hierarchical mixture models
Nemati et al. Particle swarm optimization for feature selection in speaker verification
Park et al. Clustering of Gaussian probability density functions using centroid neural network
Memon et al. Information theoretic expectation maximization based Gaussian mixture modeling for speaker verification
Zhao et al. A hybrid method for incomplete data imputation
CN113033495B (en) Weak supervision behavior identification method based on k-means algorithm
CN115222945B (en) Deep semantic segmentation network training method based on multi-scale self-adaptive course learning
Samira et al. A novel speech recognition approach based on multiple modeling by hidden Markov models
CN111276188B (en) Short-time-sequence gene expression data clustering method based on angle characteristics
CN114550697B (en) Voice sample equalization method combining mixed sampling and random forest

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200228

WD01 Invention patent application deemed withdrawn after publication