CN103474061A

CN103474061A - Automatic distinguishing method based on integration of classifier for Chinese dialects

Info

Publication number: CN103474061A
Application number: CN2013104161737A
Authority: CN
Inventors: 朱贺; 高红民; 王慧斌
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2013-12-25

Abstract

The invention discloses an automatic identification method for Chinese dialects based on classifier fusion. The method comprises four steps: extraction of dialect speech features, scoring against dialect models, extraction of classification vectors, and back-end classification. Feature extraction is performed in two stages, with the GMM serving as a high-level feature extractor. Speech features are scored by GMMs that encode prior knowledge of each dialect; the resulting scores are normalized and differenced to form classification vectors with high between-class separation and within-class cohesion. The classification vectors are then passed to a back-end SVM classifier. The method combines the GMM's strength in fitting data distributions with the SVM's strength in modeling classification surfaces, and ultimately determines which dialect region a Chinese speech sample belongs to. It can be used to identify dialects in, for example, Chinese telephone speech, stably, reliably and with high accuracy.

Description

Chinese dialect identification method based on Multiple Classifier Fusion

Technical field

The present invention relates to speech recognition using multiple classifier fusion, and in particular to a Chinese dialect identification method. It belongs to the field of speech signal processing.

Background technology

Chinese dialect identification is a speech processing technology in which a computer analyzes a segment of input speech and determines the dialect region to which the speaker belongs. In a multi-ethnic, multi-dialect country such as China, research on automatic dialect identification lays a foundation for barrier-free communication among China's ethnic groups and, with the rapid development of science and technology in China, holds considerable application value and broad prospects. Dialect identification is a branch of speech recognition research. Early Chinese dialect identification systems usually adopted a single-classifier, single-feature design and ignored information fusion, so that the system depended entirely on one classifier and one feature, which limited improvements in system performance.

Multi-source information fusion is a focus of current information processing research. It not only describes objective phenomena more comprehensively and in more detail, but also allows deeper information to be mined. In speech processing, information fusion mainly takes two forms: (1) multi-feature fusion and (2) multiple classifier fusion. The former adopts a multi-feature, single-classifier design: the scores of different features are combined by weighted summation so that one system uses several features at once, yielding more accurate decisions. The latter adopts a multi-classifier design: complementary classifiers are fused into one system, and the differences in their classification strategies are exploited to combine multiple classification results. Existing research on classifier fusion mostly targets text-dependent speech recognition; fusion mechanisms suited to text-independent speech recognition are rare.

Summary of the invention

Object of the invention: to address the problems in the prior art, the present invention takes a two-level classifier as its framework and proposes a new classifier fusion mechanism, specifically a Chinese dialect identification method based on multiple classifier fusion. The invention extracts between-class difference information from Chinese dialect speech features more effectively, is better suited to text-independent tasks such as dialect and language identification, and significantly improves classification ability and robustness.

In multiple classifier fusion, the performance of the fused system depends mainly on two points: (1) the choice of classifiers and (2) the design of the fusion mechanism. The classifiers are usually required to be complementary in classification strategy so that the fused decision has higher confidence. The present invention therefore selects the generative classifier Gaussian mixture model (GMM) and the discriminative classifier support vector machine (SVM) as the objects of fusion. As a generative classifier, the GMM fits data well and describes the overall data distribution well. However, because its parameters must be learned from complete data, it demands a large training set and a long training period. By comparison, the SVM fits the data distribution less well but describes the classification surface clearly. GMM and SVM are therefore complementary in principle, and fusing them can exploit the advantages of both.

The fusion mechanism can take one of two forms: back-end score fusion or multi-level fusion. The former assigns a confidence score to the SVM decision and combines it with the GMM score by weighted summation to make the classification decision. The latter uses the GMM as a generator of classification vectors: it produces classification vectors containing global information and passes them to the SVM for classification. In dialect identification the data distribution is too complex and the data volume too large for the SVM to classify and score the raw speech features directly, and choosing the weights in score fusion is also difficult. A multi-level classifier fusion system is therefore better suited to Chinese dialect identification.

Traditional two-level fusion of GMM and SVM usually uses the Fisher kernel as the fusion mechanism. The extracted features contain not only the acoustic information of the dialect speech but also global information about the dialect, and thus form a high-level classification vector. This approach nevertheless has several limitations. First, the mapping space of the Fisher kernel is prone to the curse of dimensionality, which makes it hard to handle large data volumes and text-independent speech recognition. Second, for the same speech primitive, the scores of the different dialect models are correlated to some degree, as shown in Table 1, and this correlation weakens how well the classification vector represents its class. Finally, for dialect identification we want the classification features to reflect the between-class differences of the dialects, that is, the differences among the scores that different dialect models assign to the same speech segment.

Table 1. Scores of different dialect models for a speech primitive

Technical solution: a Chinese dialect identification method based on multiple classifier fusion, in which the generative classifier Gaussian mixture model (GMM) and the discriminative classifier support vector machine (SVM) are selected as the objects of fusion. The GMM is a generative probabilistic model whose probability density is computed as:

P(x | W_{n}) = \sum_{i=1}^{k} w_{ni} \frac{1}{(2π)^{N} |Σ_{ni}|^{1/2}} \exp\left(-\frac{1}{2} (x - μ_{ni})^{T} Σ_{ni}^{-1} (x - μ_{ni})\right) \qquad (1)

where x is the acoustic feature of a speech primitive; w_{ni}, μ_{ni} and Σ_{ni} are the weight, mean and covariance matrix of the i-th Gaussian mixture component in the GMM of dialect n; and k is the number of mixture components. Speech features are extracted from the input Chinese dialect signal. To extract the new classification features, a GMM is first trained for each dialect on a known training sample set. The speech data are then fed into each of the trained dialect GMMs, and each speech primitive is scored by likelihood to form the score vector [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})], which maps the raw speech feature space to the score space. Finally, this vector is normalized and differenced. The calculation steps are as follows:

Step 1: normalize the speech scores:

SV_{i} = \frac{1}{C_{i}} [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})] \qquad (2)

where C_{i} is the normalization factor, taken here as:

C_{i} = \max_{n} P(x_{i} | μ_{n}, Σ_{n}), \quad n = 1, …, N

Step 2: compute the score differences:

φ′(x_{i}) = [(SV_{i1} - SV_{i2}), (SV_{i1} - SV_{i3}), …, (SV_{i1} - SV_{iN}), (SV_{i2} - SV_{i3}), (SV_{i2} - SV_{i4}), …, (SV_{i2} - SV_{iN}), …, (SV_{i,N-1} - SV_{iN})] \qquad (3)

The SVM classifier is then trained on the training classification vectors.
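As an illustration of the normalization and difference operations of Eqs. (2) and (3), the following minimal Python sketch turns one score vector into a classification vector. The function name `classification_vector` and the three example likelihoods are illustrative assumptions, not part of the disclosed method; the sketch also assumes raw (non-logarithmic) likelihoods with at least one positive value.

```python
from itertools import combinations


def classification_vector(scores):
    """Map raw GMM likelihood scores to a difference vector (Eqs. 2 and 3)."""
    c = max(scores)  # normalization factor C_i: the largest score
    if c <= 0:
        raise ValueError("at least one likelihood must be positive")
    sv = [p / c for p in scores]  # Eq. (2): normalized score vector SV_i
    # Eq. (3): SV_ia - SV_ib for every pair a < b, in the order
    # (1,2), (1,3), ..., (1,N), (2,3), ..., (N-1,N).
    return [sv[a] - sv[b] for a, b in combinations(range(len(sv)), 2)]


# Hypothetical likelihoods of one speech primitive under three dialect GMMs.
phi = classification_vector([0.2, 0.4, 0.1])
print(phi)
```

For N dialect models the resulting vector has N(N-1)/2 entries, so three models give three differences and four models give six.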

By adopting the above technical solution, the present invention has the following beneficial effects: it resolves the problems of the Fisher kernel in classification vector design while capturing between-class difference information, and is better suited to speech recognition tasks such as dialect and language identification.

Brief description of the drawings

Fig. 1 is a flowchart of the method according to an embodiment of the present invention;

Fig. 2 shows the distributions of the original feature vectors and of the classification vectors in an embodiment of the present invention.

Embodiment

The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims.

As shown in Fig. 1, the Chinese dialect identification method based on multiple classifier fusion selects the generative classifier Gaussian mixture model (GMM) and the discriminative classifier support vector machine (SVM) as the objects of fusion. The GMM is a generative probabilistic model that can roughly describe the global information in the data space. Its probability density is computed as:

P(x | W_{n}) = \sum_{i=1}^{k} w_{ni} \frac{1}{(2π)^{N} |Σ_{ni}|^{1/2}} \exp\left(-\frac{1}{2} (x - μ_{ni})^{T} Σ_{ni}^{-1} (x - μ_{ni})\right) \qquad (1)

where x is the acoustic feature of a speech primitive; w_{ni}, μ_{ni} and Σ_{ni} are the weight, mean and covariance matrix of the i-th Gaussian mixture component in the GMM of dialect n; and k is the number of mixture components. To extract the new classification features, a GMM is first trained for each dialect on a known training sample set. The speech data are then fed into each of the trained dialect GMMs, and each speech primitive is scored by likelihood to form the score vector [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})], which maps the raw speech feature space to the score space. Finally, this vector is normalized and differenced. The calculation steps are as follows:

Step 1: normalize the speech scores:

SV_{i} = \frac{1}{C_{i}} [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})] \qquad (2)

where C_{i} is the normalization factor, taken here as:

C_{i} = \max_{n} P(x_{i} | μ_{n}, Σ_{n}), \quad n = 1, …, N

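Step 2: compute the score differences:

φ′(x_{i}) = [(SV_{i1} - SV_{i2}), (SV_{i1} - SV_{i3}), …, (SV_{i1} - SV_{iN}), (SV_{i2} - SV_{i3}), (SV_{i2} - SV_{i4}), …, (SV_{i2} - SV_{iN}), …, (SV_{i,N-1} - SV_{iN})] \qquad (3)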
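The likelihood scoring that produces the score vector can be sketched in Python as follows. This is only a sketch under stated assumptions: it uses diagonal covariance matrices, which the text does not specify; it applies the standard multivariate normal constant (2π)^{d/2}, with d the feature dimension, which may differ from the exponent as printed in Eq. (1); and the helper names and toy parameters are invented for illustration.

```python
import math


def gaussian_pdf_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance at feature vector x."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)  # log |Sigma| for diagonal Sigma
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(-0.5 * (d * math.log(2 * math.pi) + log_det + maha))


def gmm_likelihood(x, weights, means, variances):
    """Eq. (1): weighted sum of the k component densities of one dialect GMM."""
    return sum(w * gaussian_pdf_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))


def score_vector(x, dialect_gmms):
    """Likelihood of x under each dialect model, in model order."""
    return [gmm_likelihood(x, *gmm) for gmm in dialect_gmms]


# Two toy 2-component GMMs over 2-dimensional features (made-up parameters).
gmm_a = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
gmm_b = ([0.3, 0.7], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [2.0, 2.0]])
scores = score_vector([0.5, 0.5], [gmm_a, gmm_b])
print(scores)
```

The feature vector in this toy example lies between the two component means of the first model and far from those of the second, so the first entry of the score vector is the larger one.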

In this fusion process, normalization reduces the effect on the recognition rate of the correlation between scores noted above, and the difference operation on the scores that the different dialect GMMs assign to a given primitive successfully extracts the between-class difference information of the dialects, so that φ′(x_{i}) contains not only acoustic and global information but also between-class difference information. The distributions of the classification vectors of three dialects, Wu, Yue (Cantonese) and Min (Fujian), are shown in Fig. 2. Compared with the original features, the classification vectors of the same dialect under the new fusion mechanism cluster more tightly and separate more clearly between classes, which makes them better suited to dialect identification.

Because the SVM classifier has good classification ability and outstanding generalization, it is usually chosen as the back-end classifier in multi-level classifier fusion. Chinese dialect identification is in essence a multi-class classification problem, which at present is mainly handled with strategies such as decision trees, one-versus-one and one-versus-rest. However, because of the complexity of multi-class sample distributions, a large number of experiments show that identification systems based on these strategies perform unsatisfactorily on multi-class problems.

The present invention adopts the error-correcting output codes (ECOC) algorithm, which assigns each class a binary code and uses this code as the class label. During coding, the algorithm requires the code words in the rows and in the columns of the code matrix to remain independent and separable. Accordingly, for 3 ≤ k ≤ 7 the ECOC algorithm requires the maximum code length to be 2^{k-1} - 1, where k is the number of classes. The coding rule is as follows: the first row consists entirely of ones; the second row consists of 2^{k-2} zeros and 2^{k-2} ones in alternation; and so on, so that the i-th row consists of alternating runs of 2^{k-i} zeros and 2^{k-i} ones. For a 4-class problem, for example, a code of length 7 is needed, as shown in Table 2, where each row vector is the ECOC code of one class. Classifiers f_{1}, f_{2}, …, f_{n}, with n ≤ 2^{k-1} - 1, are designed using the column vectors of the code matrix as class labels. At test time, the algorithm first classifies the input speech with the rules f_{1}, f_{2}, …, f_{n}, then encodes the unknown speech from the classification results to form its code word, and finally matches this code word against the known class code words to reach a decision. ECOC matches code words with a nearest-neighbor rule based on Hamming distance and therefore has a degree of fault tolerance, a property that is particularly important in multi-class classification. The present invention uses the ECOC algorithm to identify multiple dialects, as shown in Table 2.

Table 2. Class coding
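The coding rule and the Hamming-distance matching described above can be sketched in Python as follows. The sketch assumes that the first row is all ones and that every other row starts with a run of zeros; the function names `exhaustive_codebook`, `hamming` and `decode` are illustrative, and the binary classifier outputs are simulated rather than produced by trained SVMs.

```python
def exhaustive_codebook(k):
    """Build the k x (2**(k-1) - 1) code matrix from the run-length rule."""
    length = 2 ** (k - 1) - 1
    rows = [[1] * length]  # first row: all ones (assumed reading of the rule)
    for i in range(2, k + 1):
        run = 2 ** (k - i)  # row i alternates runs of `run` zeros and ones
        rows.append([(col // run) % 2 for col in range(length)])
    return rows


def hamming(a, b):
    """Number of positions at which two code words differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(word, codebook):
    """Index of the class whose code word is nearest in Hamming distance."""
    return min(range(len(codebook)), key=lambda c: hamming(word, codebook[c]))


codebook = exhaustive_codebook(4)
for row in codebook:
    print(row)

# Simulated outputs of the 7 binary classifiers for an unknown utterance:
# the code word of class index 2 with its first bit flipped.
noisy = list(codebook[2])
noisy[0] ^= 1
print(decode(noisy, codebook))
```

For k = 4 every pair of rows generated this way differs in four positions, so a single wrong binary decision still decodes to the correct class.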

For training, the training speech data are used to train a GMM with 128 mixture components for each dialect, and the likelihood score of each training utterance (15 s) is output. The training classification vectors are then generated by normalization and difference calculation, and the SVM classifier is trained on them.

For testing, the input Chinese dialect speech data follow the same procedure: they are scored by the GMMs, the classification vector is extracted, and the vector is classified.

This completes the identification of the Chinese dialect.

Claims

1. A Chinese dialect identification method based on multiple classifier fusion, characterized in that: GMM and SVM are selected as the objects of fusion; speech features are extracted from the input Chinese dialect signal; to extract the new classification features, a GMM is first trained for each dialect on a known training sample set; the speech data are then fed into each of the trained dialect GMMs, and each speech primitive is scored by likelihood to form the score vector [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})], which maps the raw speech feature space to the score space; this score vector is next normalized and differenced; and the SVM classifier is then trained on the training classification vectors.

2. The Chinese dialect identification method based on multiple classifier fusion according to claim 1, characterized in that the GMM is a generative probabilistic model whose probability density is computed as:

P(x | W_{n}) = \sum_{i=1}^{k} w_{ni} \frac{1}{(2π)^{N} |Σ_{ni}|^{1/2}} \exp\left(-\frac{1}{2} (x - μ_{ni})^{T} Σ_{ni}^{-1} (x - μ_{ni})\right) \qquad (1)

where x is the acoustic feature of a speech primitive; w_{ni}, μ_{ni} and Σ_{ni} are the weight, mean and covariance matrix of the i-th Gaussian mixture component in the GMM of dialect n; and k is the number of mixture components.

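3. The Chinese dialect identification method based on multiple classifier fusion according to claim 1, characterized in that the normalization and difference operations on the score vector are computed as follows:

Step 1: normalize the speech scores:

SV_{i} = \frac{1}{C_{i}} [P(x_{i} | μ_{1}, Σ_{1}), P(x_{i} | μ_{2}, Σ_{2}), …, P(x_{i} | μ_{N}, Σ_{N})] \qquad (2)

where C_{i} is the normalization factor, taken as:

C_{i} = \max_{n} P(x_{i} | μ_{n}, Σ_{n}), \quad n = 1, …, N;

Step 2: compute the score differences:

φ′(x_{i}) = [(SV_{i1} - SV_{i2}), (SV_{i1} - SV_{i3}), …, (SV_{i1} - SV_{iN}), (SV_{i2} - SV_{i3}), (SV_{i2} - SV_{i4}), …, (SV_{i2} - SV_{iN}), …, (SV_{i,N-1} - SV_{iN})] \qquad (3).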

4. The Chinese dialect identification method based on multiple classifier fusion according to claim 1, characterized in that: when the SVM classifier is trained on the training classification vectors, the ECOC algorithm is used to assign each class a binary code, which serves as the class label; during coding, the code words in the rows and in the columns of the code matrix are required to remain independent and separable; for 3 ≤ k ≤ 7, the maximum code length is 2^{k-1} - 1, where k is the number of classes; the coding rule is: the first row consists entirely of ones, the second row consists of 2^{k-2} zeros and 2^{k-2} ones in alternation, and so on, so that the i-th row consists of alternating runs of 2^{k-i} zeros and 2^{k-i} ones; for a 4-class problem, a code of length 7 is needed, each row vector being the ECOC code of one class; classifiers f_{1}, f_{2}, …, f_{n}, with n ≤ 2^{k-1} - 1, are designed using the column vectors of the code matrix as class labels; at test time, the algorithm first classifies the input speech with the rules f_{1}, f_{2}, …, f_{n}, then encodes the unknown speech from the classification results to form its code word, and finally matches this code word against the known class code words.