CN102543075A - Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology - Google Patents
- Publication number: CN102543075A
- Application number: CN201210008213XA (CN201210008213A)
- Authority: CN (China)
- Prior art keywords: speaker, svm, virtual instrument, identification, recognition
- Prior art date: 2012-01-12
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Complex Calculations (AREA)
Abstract
The invention relates to a speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology. The system comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The system overcomes the drawback of existing mixed VQ-SVM speaker identification, in which the two methods must run serially and therefore waste time: it places the VQ and SVM methods on the same platform and runs them in parallel, saving identification time while improving the identification performance of the overall system.
Description
1. Technical field
The present invention relates to signal processing and pattern recognition, and specifically to a speaker VQ-SVM parallel identification system based on virtual instrument technology.
2. Background
Speaker identification determines a speaker's identity by analyzing the features of his or her voice. Speaker identification methods mainly include vector quantization, probabilistic methods, and discriminative classifier methods. As shown in Figure 1, a speaker identification system comprises two stages, training and identification. The raw speech signal is first acquired and pre-processed to obtain a clean signal; speech feature parameters are then extracted, and a specific method is applied for speaker training and identification. The speaker model typically stores a large number of processed speech feature samples in a database using a dedicated algorithm; speech to be identified is pre-processed, its features are extracted, and it is matched against the sample sets in the database to reach a decision.
Any single method has both strengths and limitations, so much current research combines two or more methods into a mixed identification approach. VQ is a data compression and coding technique; SVM is a machine learning method based on statistical learning theory. The two methods are complementary. The advantages of vector quantization (VQ) are good large-sample classification, a small number of models, short training time, and fast identification response; its drawbacks are difficulty with nonlinear problems and poor noise robustness. The advantages of the support vector machine (SVM) are good small-sample classification and distinctive strength on nonlinear and high-dimensional pattern recognition problems; its drawbacks are a complex, slow training algorithm and difficulty handling large sample sets. Although VQ and SVM have previously been combined for speaker identification, all the computations are usually carried out on the MATLAB platform, where multiple distinct methods can only run serially. Existing mixed VQ-SVM speaker identification is therefore so-called serial identification: a first pass with one method is followed by a second pass with the other. It is easy to see that the greatest weakness of serial identification is that it both occupies machine resources and wastes identification time.
3. Summary of the invention
The purpose of this invention is to provide a speaker VQ-SVM parallel identification system based on virtual instrument technology, which solves the problem that existing mixed VQ-SVM speaker identification both occupies machine resources and wastes identification time.
The technical solution adopted by the present invention is as follows. The speaker VQ-SVM parallel identification system based on virtual instrument technology comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs;
The VQ algorithm is adopted to build the VQ model. The initial codebook is obtained by a splitting method, taking the centroid of the feature vectors as the initial codeword; the speaker model is built and stored in LabVIEW by calling the MATLAB script node. The core formula is the average quantization distortion minimized by the iteration:

D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖²,

where the x_n are the N training feature vectors and the y_m are the M codewords;
The SVM algorithm is adopted to build the SVM model, using a radial basis function (RBF) kernel for the speaker model. Its formulas are:

K(x, x_i) = exp(−γ‖x − x_i‖²),

f(x) = sgn( Σ_i α_i y_i K(x, x_i) + b );
The identification result is output by the decision section on the speaker identification front panel in the identification unit. When the results of the VQ and SVM identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results of the two methods agree, the result is output on the speaker identification front panel, with a green lamp indicating correct identification and a red lamp indicating non-recognition.
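The decision rule just described can be sketched in Python (a minimal illustration; the function and label names are hypothetical, and the patent implements the rule in LabVIEW, not Python):

```python
def fuse_decisions(vq_result, svm_result, reject_label=None):
    """Combine the VQ and SVM speaker decisions per the parallel scheme:
    if the two methods agree, output the common result; if only one
    method produces a recognition, output that method's result; if both
    produce different positive identifications, no confident decision
    is made (hypothetical handling of a case the text leaves open)."""
    if vq_result == svm_result:
        return vq_result            # agreement -> green lamp
    if vq_result is reject_label:
        return svm_result           # only SVM recognized a speaker
    if svm_result is reject_label:
        return vq_result            # only VQ recognized a speaker
    return reject_label             # conflicting identifications
```

A conflicting pair of positive identifications is mapped to the reject label here; the source text only specifies the agree and one-sided cases.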
In the above scheme, the feature extraction unit uses the Mel-frequency cepstral coefficients (MFCC) and their first-order differences as identification feature parameters. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
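The stated parameters determine the feature dimensionality and usable frame count as sketched below (the helper function is hypothetical; the 24 dimensions are the 12 static MFCCs plus their 12 first-order differences):

```python
def frame_count(num_samples, frame_len=512, frame_shift=256, drop_edges=2):
    """Number of usable analysis frames for the parameters in the text:
    frame length 512, frame shift 256, with the first two and last two
    frames removed (their first-order differences are zero)."""
    full = 1 + (num_samples - frame_len) // frame_shift
    return max(full - 2 * drop_edges, 0)

# 12 static MFCCs plus 12 first-order differences -> 24-dim feature vector
N_MFCC = 12
FEATURE_DIM = 2 * N_MFCC
```

For one second of audio at the stated 44100 Hz sampling frequency this gives 167 usable frames, each carrying a 24-dimensional feature vector.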
Beneficial effect:
1. The present invention overcomes the wasted time of serial identification in existing mixed VQ-SVM speaker identification by running the VQ and SVM methods in parallel on the same platform, saving identification time while improving the identification performance of the overall system.
2. The present invention combines the two identification methods on the virtual instrument platform for parallel speaker identification. Under small-sample conditions the SVM method outperforms the VQ method; as the number of samples grows, SVM identification performance declines while VQ identification performance rises. Parallel combination thus fully exploits the complementarity of the two methods with respect to sample size and improves the overall performance of the system.
4. Description of the drawings
Fig. 1 is a block diagram of the speaker identification system;
Fig. 2 is a structural diagram of the present invention;
Fig. 3 is a schematic diagram of the speaker identification front panel of the present invention;
Fig. 4 is a flow chart of the LBG algorithm.
Reference numerals: 1, speaker identification front panel; 2, indicator lamps.
5. Embodiments
The present invention is further explained below with reference to the drawings.
As shown in Figs. 2 and 3, the speaker VQ-SVM parallel identification system based on virtual instrument technology comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. To achieve parallel VQ and SVM identification, and given that LabVIEW supports multitasking and multithreading, the system fuses speaker recognition technology with virtual instrument technology: LabVIEW manages the system and calls MATLAB for the processing. In the decision section of the present invention, when the results of the two identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results agree, the result is output on speaker front panel 1, with green lamp 2 indicating correct identification and red lamp 2 indicating non-recognition. The system performance comparison is shown in Table 4.
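LabVIEW's dataflow model runs the two recognizers concurrently. A rough Python analogue of that multithreaded structure, with placeholder scoring callables standing in for the actual VQ and SVM recognizers, is:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(features, vq_score, svm_score):
    """Run the VQ and SVM recognizers concurrently on the same feature
    set, mirroring LabVIEW's dataflow parallelism. vq_score and
    svm_score are placeholder callables returning a speaker label."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vq_future = pool.submit(vq_score, features)
        svm_future = pool.submit(svm_score, features)
        return vq_future.result(), svm_future.result()

# Toy stand-ins for the two recognizers (illustrative only)
vq = lambda feats: "speaker_3"
svm = lambda feats: "speaker_3"
```

Both recognizers read the same extracted features, so neither blocks the other; the pair of results then feeds the decision section.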
The key issues in a speaker identification system are extracting the speech feature parameters and building the speaker model. The Mel-frequency cepstral coefficients (MFCC) and their first-order differences are chosen as identification features. MFCC parameters reflect only the static characteristics of speech, while the human ear is more sensitive to its dynamic characteristics; the differential cepstrum is the parameter that captures that dynamic variation. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
The first algorithm adopted by the present invention is VQ. There are several ways to build the VQ model (codebook); the most common is the LBG algorithm, which generates an optimal codebook from the training vector set by an iterative procedure, with the flow shown in Fig. 4. The initial codebook is obtained by a splitting method, taking the centroid (center of mass) of the feature vectors as the initial codeword. Through experiment, a codebook capacity of M = 16, a distortion threshold of ε = 0.01, and a maximum number of splitting iterations of L = log₂ M were chosen, giving good recognition results. This work is implemented by programming in the MATLAB 7.0 environment; the model is then built and stored in LabVIEW by calling the MATLAB script node. The core formula is the average quantization distortion minimized by the iteration: D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖².
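The LBG procedure (splitting initialization followed by Lloyd-style refinement against a distortion threshold) can be sketched in pure Python. The toy data and tolerance handling below are illustrative, not the patent's MATLAB implementation; the multiplicative split assumes the codebook capacity is a power of two, as the M = 16 in the text is:

```python
import random

def lbg(vectors, codebook_size=16, eps=0.01, max_iter=20):
    """LBG codebook training: start from the centroid of all training
    vectors, split every codeword by a small perturbation, then refine
    with nearest-neighbour partition / centroid update (Lloyd) steps
    until the relative distortion improvement drops below eps."""
    dim = len(vectors[0])

    def centroid(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    codebook = [centroid(vectors)]
    while len(codebook) < codebook_size:
        # Splitting step: perturb each codeword into a +/- pair.
        codebook = [[x * (1 + s) for x in c]
                    for c in codebook for s in (0.01, -0.01)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                i = min(range(len(codebook)),
                        key=lambda k: sqdist(v, codebook[k]))
                cells[i].append(v)
                distortion += sqdist(v, codebook[i])
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(c) if c else codebook[i]
                        for i, c in enumerate(cells)]
            if prev - distortion <= eps * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook

# Toy training set: four well-separated 2-D clusters (illustrative only).
random.seed(1)
training = [(random.gauss(cx, 0.05), random.gauss(cy, 0.05))
            for cx, cy in ((0.2, 0.2), (1.0, 0.2), (0.2, 1.0), (1.0, 1.0))
            for _ in range(25)]
codebook = lbg(training, codebook_size=4)
```

The distortion accumulated in the inner loop is the quantity D minimized by the LBG iteration in the text.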
The other algorithm adopted by the present invention is SVM; its key issues are kernel selection and parameter optimization.
To select a suitable kernel, 10 male and 10 female students were arbitrarily chosen as samples, and a comparison experiment was run with the RBF and polynomial (Poly) kernels; see Tables 1 and 2. Under identical conditions, the RBF kernel performs better. Given the prior experimental basis and the fact that the RBF kernel has fewer parameters than the Poly kernel, the radial basis kernel was chosen for the speaker model. The kernel parameters are a key factor in support vector machine classifier performance, so kernel parameter optimization is essential. The common method is grid search: let C and γ range over fixed grids, use cross-validation on the training set to obtain the validation classification accuracy for each parameter pair, and take the pair with the highest accuracy as optimal. The grid.py program in the python subdirectory of the libsvm toolbox performs this search for C and γ; a screenshot of the parameter optimization for the 20-speaker, 5-frame data is shown in Fig. 4.
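The grid-search idea behind libsvm's grid.py can be sketched as an exhaustive search over (C, γ) with a user-supplied cross-validation scorer. The objective below is a toy stand-in for a real cross-validation accuracy, chosen only to make the example runnable:

```python
from itertools import product

def grid_search(evaluate, c_grid, gamma_grid):
    """Exhaustive (C, gamma) search as done with libsvm's grid.py:
    evaluate(C, gamma) should return the cross-validation accuracy on
    the training set; the pair with the highest accuracy wins."""
    return max(product(c_grid, gamma_grid), key=lambda p: evaluate(*p))

# Toy objective peaking at C = 4.0, gamma = 1.0 (illustrative only;
# a real scorer would run k-fold cross-validation with libsvm).
score = lambda C, g: -((C - 4.0) ** 2 + (g - 1.0) ** 2)
c_grid = [2 ** k for k in range(-2, 4)]   # 0.25 ... 8
g_grid = [2 ** k for k in range(-2, 2)]   # 0.25 ... 2
```

Powers of two are the conventional grid spacing for C and γ, which matches the values such as (0.25, 0.25) and (4.00, 1.00) appearing in Tables 1 and 2.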
Table 1: Experimental results for 10 female students

Frames | (C, γ) | (Degree, Coeff) | Recognition rate % (RBF) | Recognition rate % (Poly) | RBF recognition time (s) | Poly recognition time (s)
---|---|---|---|---|---|---
1 | (0.25,0.25) | (3,1) | 100 | 100 | 0.01 | 0.02 |
3 | (0.25,0.25) | (3,1) | 96.67 | 96.67 | 0.02 | 0.01 |
5 | (0.25,0.25) | (3,1) | 100 | 98.00 | 0.02 | 0.02 |
7 | (0.25,0.25) | (3,1) | 100 | 98.57 | 0.02 | 0.01 |
10 | (0.25,0.25) | (3,1) | 97.00 | 95.00 | 0.02 | 0.02 |
15 | (4.00,1.00) | (3,1) | 96.00 | 96.00 | 0.03 | 0.02 |
20 | (4.00,1.00) | (3,1) | 96.00 | 97.00 | 0.03 | 0.03 |
30 | (4.00,1.00) | (3,1) | 93.00 | 93.33 | 0.08 | 0.08 |
Table 2: Experimental results for 10 male students

Frames | (C, γ) | (Degree, Coeff) | Recognition rate % (RBF) | Recognition rate % (Poly) | RBF recognition time (s) | Poly recognition time (s)
---|---|---|---|---|---|---
1 | (1.32,0.76) | (3,1) | 90.00 | 90.00 | 0.01 | 0.01 |
3 | (1.00,1.00) | (3,1) | 93.33 | 93.33 | 0.02 | 0.02 |
5 | (4.00,1.00) | (3,1) | 96.00 | 94.00 | 0.02 | 0.01 |
7 | (4.00,1.00) | (3,1) | 95.71 | 90.00 | 0.01 | 0.02 |
10 | (4.00,1.00) | (3,1) | 92.00 | 84.00 | 0.02 | 0.02 |
15 | (4.00,1.00) | (3,1) | 86.67 | 84.00 | 0.05 | 0.03 |
20 | (4.00,1.00) | (3,1) | 87.00 | 85.00 | 0.05 | 0.05 |
30 | (4.00,1.00) | (3,1) | 84.67 | 85.33 | 0.11 | 0.09 |
Following the above analysis and experiments, the present invention first builds single-method speaker identification systems on the virtual instrument platform and debugs them to good recognition performance, then merges the two methods to realize, in parallel, the speaker identification system based on virtual instrument technology. The system is built in a modular fashion: a large program is divided into small modules, which both simplifies the program and improves its readability. That is, on the virtual instrument platform the program is realized through LabVIEW sub-VIs; every part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The front panel of the system is shown in Fig. 3.
Experimental verification and analysis of results
The speakers' voices were recorded in an ordinary laboratory environment with the computer's built-in recorder. The corpus consists of mono speech signals sampled at 22050 Hz with 16-bit quantization, stored as .wav files. Thirty speakers were chosen, each recording 20 speech segments (2~4 s each) as the sample library, all text-independent; the first 10 segments are used to build the speaker model and the last 10 for testing. Tests on the 20-speaker and 30-speaker sets, with voices drawn at random 5 and 10 times, gave the experimental results shown in Table 3. The analysis shows that as the sample size grows, the VQ method overtakes the SVM method, confirming that SVM has a classification advantage on small samples and VQ on large samples; it can further be inferred that as the sample size grows into the thousands, the VQ method will outperform the SVM method. In the decision section of the present invention, when the results of the two identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results agree, the result is output on speaker front panel 1, with green lamp 2 indicating correct identification and red lamp 2 indicating non-recognition. The system performance comparison is shown in Table 4: implementing speaker identification with the two methods in parallel improves system performance, and although the identification time increases, the increase is small.
Table 3: Comparison of experimental results
Table 4: System performance comparison

Identification method | Recognition rate (%) | Misrecognition rate (%) | Recognition time (s)
---|---|---|---
VQ | 94.88 | 10.09 | 0.06 |
SVM | 97.38 | 9.71 | 0.08 |
VQ-SVM | 98.54 | 5.28 | 0.15 |
Conclusion
The present invention combines the two identification methods on the virtual instrument platform for parallel speaker identification. Under small-sample conditions the SVM method outperforms the VQ method; as the number of samples grows, SVM identification performance declines while VQ identification performance rises. Parallel combination thus fully exploits the complementarity of the two methods with respect to sample size and improves the overall performance of the system.
Claims (2)
1. A speaker VQ-SVM parallel identification system based on virtual instrument technology, characterized in that: the system comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform; on the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs;
the VQ algorithm is adopted to build the VQ model; the initial codebook is obtained by a splitting method, taking the centroid of the feature vectors as the initial codeword, and the speaker model is built and stored in LabVIEW by calling the MATLAB script node; the core formula is the average quantization distortion minimized by the iteration:

D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖²,

where the x_n are the N training feature vectors and the y_m are the M codewords;
the SVM algorithm is adopted to build the SVM model, using a radial basis function (RBF) kernel for the speaker model, with formulas:

K(x, x_i) = exp(−γ‖x − x_i‖²),

f(x) = sgn( Σ_i α_i y_i K(x, x_i) + b );
the identification result is output by the decision section on the speaker identification front panel in the identification unit; when the results of the VQ and SVM identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results of the two methods agree, the result is output on the speaker identification front panel, with a green lamp indicating correct identification and a red lamp indicating non-recognition.
2. The speaker VQ-SVM parallel identification system based on virtual instrument technology according to claim 1, characterized in that: the feature extraction unit uses the Mel-frequency cepstral coefficients (MFCC) and their first-order differences as identification feature parameters; feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210008213XA CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102543075A true CN102543075A (en) | 2012-07-04 |
Family
ID=46349815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210008213XA Pending CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102543075A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107945787A (en) * | 2017-11-21 | 2018-04-20 | 上海电机学院 (Shanghai Dianji University) | Voice-controlled login management system and method based on virtual instrument technology |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030036905A1 (en) * | 2001-07-25 | 2003-02-20 | Yasuhiro Toguri | Information detection apparatus and method, and information search apparatus and method |
CN1588535A (en) * | 2004-09-29 | 2005-03-02 | 上海交通大学 | Automatic sound identifying treating method for embedded sound identifying system |
JP2005345683A (en) * | 2004-06-02 | 2005-12-15 | Toshiba Tec Corp | Speaker-recognizing device, program, and speaker-recognizing method |
CN101640043A (en) * | 2009-09-01 | 2010-02-03 | 清华大学 | Speaker recognition method based on multi-coordinate sequence kernel and system thereof |
2012-01-12: application CN201210008213XA filed in China (CN); publication CN102543075A, status Pending.
Non-Patent Citations (2)
Title |
---|
余洋 (Yu Yang): "Development of a Speaker Recognition System Based on LabVIEW" (基于LabVIEW的说话人识别系统开发), China Master's Theses Full-text Database * |
刘祥楼等 (Liu Xianglou et al.): "Parameter Optimization of Support Vector Machine Kernel Functions in Speaker Recognition" (说话人识别中支持向量机核函数参数优化研究), Science Technology and Engineering * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C12 | Rejection of a patent application after its publication | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2012-07-04 |