CN102543075A - Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology - Google Patents

Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology Download PDF

Info

Publication number
CN102543075A
CN102543075A CN201210008213XA
Authority
CN
China
Prior art keywords
speaker
svm
virtual instrument
identification
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210008213XA
Other languages
Chinese (zh)
Inventor
刘祥楼
吴香艳
张明
姜继玉
刘昭廷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Petroleum University
Original Assignee
Northeast Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Petroleum University filed Critical Northeast Petroleum University
Priority to CN201210008213XA priority Critical patent/CN102543075A/en
Publication of CN102543075A publication Critical patent/CN102543075A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to a speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology. The system comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit and a LabVIEW virtual instrument platform. On the virtual instrument platform a large program is divided into small modules through LabVIEW sub-VIs; every part of the program that calls a MATLAB (Matrix Laboratory) node is written as a sub-VI, and the system is constructed by calling these sub-VIs. The system overcomes the defect of the existing VQ-SVM hybrid approach to speaker identification, in which the two methods must run serially and recognition time is wasted; the invention proposes to integrate the VQ and SVM methods on the same platform and perform parallel identification, so that recognition time is saved while the identification performance of the whole system is improved.

Description

Speaker VQ-SVM parallel recognition system based on virtual instrument technology
  
One, technical field:
The present invention relates to the fields of signal processing and pattern recognition, and in particular to a speaker VQ-SVM parallel recognition system based on virtual instrument technology.
Two, background technology:
Speaker recognition identifies a speaker's identity by analyzing the speaker's speech features. Speaker recognition methods mainly include vector quantization methods, probabilistic methods, and discriminative classifier methods. As shown in Fig. 1, a speaker recognition system mainly comprises a training stage and a recognition stage. The raw speech signal is first acquired, a clean speech signal is then obtained by pre-processing, the speech feature parameters are extracted, and finally a specific method is used to train the speaker model and perform recognition. The speaker model usually stores, in a database and according to a special algorithm, a large number of processed speech feature samples; the speech to be identified, after pre-processing and feature extraction, is matched against the sample sets in the database and the decision is computed from the matching results.
Any single method has both advantages and limitations, and much current research focuses on hybrid recognition methods that combine two or more techniques. VQ is a data compression and coding technique; SVM is a machine learning method based on statistical learning theory. The two methods are complementary. The advantages of the vector quantization (VQ) method are that it classifies large sample sets well, needs few models, trains quickly, and responds quickly during recognition; its shortcomings are that it handles nonlinear problems poorly and has weak noise robustness. The advantage of the support vector machine (SVM) method is that it classifies small samples well and shows particular strength in nonlinear and high-dimensional pattern recognition problems; its shortcomings are that the training algorithm is complex, training is slow, and it has difficulty handling large sample sets. Although the two methods VQ and SVM have previously been combined for speaker recognition, all the computation is usually carried out on the MATLAB platform, so multiple different methods can only be run serially. Likewise, the existing VQ-SVM hybrid for speaker recognition is so-called serial recognition: a first recognition pass with one method followed by a second pass with the other. It is not hard to see that the greatest weakness of this serial recognition approach is that it both occupies machine resources and wastes recognition time.
Three, summary of the invention:
The purpose of the present invention is to provide a speaker VQ-SVM parallel recognition system based on virtual instrument technology, which solves the problem that the existing VQ-SVM hybrid method for speaker recognition both occupies machine resources and wastes recognition time.
The technical solution adopted by the present invention to solve the technical problem is as follows: the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech pre-processing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform; on the virtual instrument platform a large program is divided into small modules by means of LabVIEW sub-VIs; every part of the program that calls a MATLAB node is written as a sub-VI, and the system is constructed by calling these sub-VIs;
The VQ algorithm is adopted to build the VQ model; the initial codebook is obtained by the splitting method, taking the centroid of the feature vectors as the initial codebook; the speaker model is built and stored in LabVIEW by calling a MATLAB node; the algorithm formulas are as follows:
Total distortion:
D = \frac{1}{N} \sum_{n=1}^{N} \min_{1 \le l \le L} d(X_n, Y_l)
Calculation of the new codeword:
Y_l = \frac{1}{N_l} \sum_{X \in S_l} X
where N_l is the number of vectors in the cell S_l, and Y_l is the centroid of all the vectors in S_l;
Relative distortion improvement:
\Delta D^{(m)} = \frac{\left| D^{(m-1)} - D^{(m)} \right|}{D^{(m)}}
The SVM algorithm is adopted to build the SVM model; a radial basis function (RBF) kernel is selected for building the speaker model; its formula is as follows:
K(x, x_i) = \exp\left(-\gamma \left\| x - x_i \right\|^2\right), \quad \gamma > 0;
The recognition unit outputs the recognition result through the speaker recognition front panel in the result decision part. When the results of the two recognition methods, VQ and SVM, are inconsistent, as long as one method recognizes a speaker, the result of that method is output as the correct result; when the results of the two methods are the same, the recognition result is output on the speaker recognition front panel, with correct recognition indicated by a green light and non-recognition by a red light.
In the above scheme, the feature extraction unit uses Mel-frequency cepstral coefficients (MFCC) and their first-order differences as the recognition feature parameters; feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first and last two frames are removed because their first-order differences are zero, giving a 24-dimensional speech feature vector.
Beneficial effect:
1. The present invention overcomes the shortcoming of the existing VQ-SVM hybrid method for speaker recognition that serial recognition wastes time; it proposes to integrate the VQ and SVM methods on the same platform for parallel recognition, thereby saving recognition time while improving the recognition performance of the whole system.
2. The present invention combines the two recognition methods on the virtual instrument platform to perform parallel speaker recognition. Under small-sample conditions the SVM method is superior to the VQ method; as the number of samples increases, the recognition performance of SVM tends to decline while that of the VQ method tends to rise. The complementarity of the two methods with respect to sample size is thus fully exploited, which improves the overall performance of the system.
Four, description of drawings:
Fig. 1 is a schematic diagram of the composition of a speaker recognition system;
Fig. 2 is a structural schematic diagram of the present invention;
Fig. 3 is a schematic diagram of the speaker recognition front panel in the present invention;
Fig. 4 is the flow chart of the LBG algorithm.
In the figures: 1, speaker recognition front panel; 2, indicator lamps.
Five, embodiment:
The present invention is further described below with reference to the accompanying drawings:
With reference to Fig. 2 and Fig. 3, the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech pre-processing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform a large program is divided into small modules by means of LabVIEW sub-VIs; every part of the program that calls a MATLAB node is written as a sub-VI, and the system is constructed by calling these sub-VIs. To achieve parallel recognition with VQ and SVM, and given that LabVIEW supports multitasking and multithreading, the system is built on virtual instrument technology combined with speaker recognition technology, with LabVIEW managing the system and calling MATLAB to perform the processing. In the result decision part of the present invention, when the results of the two recognition methods are inconsistent, as long as one method recognizes a speaker, its result is output as the correct result; when the results of the two methods are the same, the recognition result is output on speaker front panel 1, with correct recognition indicated by green light 2 and non-recognition by red light 2. The performance of the system is compared in Table 4.
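Although the parallel recognition is realized in the patent through LabVIEW's multithreaded dataflow and MATLAB script nodes, the control flow can be illustrated with a short Python sketch that is not part of the patent. The helpers vq_identify and svm_identify are hypothetical and are assumed to return a speaker label, or None when no speaker is recognized; how two conflicting positive answers should be handled is not specified in the text and is treated here as no recognition.

    from concurrent.futures import ThreadPoolExecutor

    def fuse_decisions(vq_result, svm_result):
        # Decision rule from the description: if the two methods agree, output the
        # common result; if they disagree and only one of them recognizes a speaker,
        # output that one; two different positive answers are treated as no
        # recognition (an assumption, the patent does not cover this case).
        if vq_result == svm_result:
            return vq_result
        if vq_result is None:
            return svm_result
        if svm_result is None:
            return vq_result
        return None

    def parallel_recognize(features, vq_identify, svm_identify):
        # Evaluate both recognizers concurrently and fuse their answers.
        with ThreadPoolExecutor(max_workers=2) as pool:
            vq_future = pool.submit(vq_identify, features)
            svm_future = pool.submit(svm_identify, features)
            return fuse_decisions(vq_future.result(), svm_future.result())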
The key issues in a speaker recognition system are the extraction of the speech feature parameters and the construction of the speaker model. Mel-frequency cepstral coefficients (MFCC) and their first-order differences are chosen as the recognition feature parameters. The MFCC parameters reflect only the static characteristics of speech, whereas the human ear is more sensitive to the dynamic characteristics of speech; the parameters that reflect the dynamic changes of speech are the differential cepstra. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first and last two frames are removed because their first-order differences are zero, giving a 24-dimensional speech feature vector.
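For illustration only (the patent performs this step in MATLAB 7.0), the following NumPy sketch appends first-order differences to a matrix of 12 static MFCCs and trims the first and last two frames, yielding the 24-dimensional vectors described above. The centred-difference formula used for the deltas is an assumption, and the static MFCCs are taken as already computed.

    import numpy as np

    def add_first_order_delta(mfcc, trim=2):
        # mfcc: array of shape (num_frames, 12) with the static MFCCs, one row per
        # frame (frame length 512, frame shift 256, fs 44100 Hz in the patent).
        # Returns (num_frames - 2*trim, 24): 12 static + 12 delta coefficients.
        delta = np.zeros_like(mfcc)
        delta[1:-1] = (mfcc[2:] - mfcc[:-2]) / 2.0   # centred first-order difference
        features = np.hstack([mfcc, delta])
        return features[trim:-trim]                  # drop head and tail frames

    # Example with stand-in data: 100 frames of 12 MFCCs -> (96, 24) feature matrix.
    feats = add_first_order_delta(np.random.default_rng(0).normal(size=(100, 12)))
    print(feats.shape)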
One of the algorithms adopted by the present invention is VQ. There are several methods for building the VQ model (codebook); the most frequently used is the LBG algorithm, which generates an optimal codebook from the training vector set by an iterative procedure, and its flow is shown in Fig. 4. The initial codebook is obtained by the splitting method, taking the centroid (barycenter) of the feature vectors as the initial codebook. Through experiment, the codebook capacity was set to 16 and the distortion threshold to 0.01, with an upper limit placed on the number of iterations, which gave good recognition results. This work is implemented by programming in the MATLAB 7.0 environment, and the model is then built and stored in LabVIEW by calling a MATLAB node. The core algorithm formulas are as follows:
Total distortion:
D = \frac{1}{N} \sum_{n=1}^{N} \min_{1 \le l \le L} d(X_n, Y_l)
Calculation of the new codeword:
Y_l = \frac{1}{N_l} \sum_{X \in S_l} X
(where N_l is the number of vectors in the cell S_l, and Y_l is the centroid of all the vectors in S_l)
Relative distortion improvement:
\Delta D^{(m)} = \frac{\left| D^{(m-1)} - D^{(m)} \right|}{D^{(m)}}
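For concreteness, a minimal NumPy sketch of the LBG training loop and of VQ scoring is given below. It is an illustration only: the patent implements these steps in MATLAB called through a LabVIEW MATLAB node, and the splitting perturbation, the squared-Euclidean distance measure and the iteration limit used here are assumptions.

    import numpy as np

    def train_lbg_codebook(X, size=16, eps=0.01, max_iter=20, seed=0):
        # X: (N, d) training feature vectors; size: codebook capacity (16 in the
        # patent); eps: relative-distortion-improvement threshold (0.01 in the
        # patent); max_iter: iteration limit per codebook size (assumed value).
        rng = np.random.default_rng(seed)
        codebook = X.mean(axis=0, keepdims=True)          # global centroid as start
        while codebook.shape[0] < size:
            # Splitting step: perturb every codeword into two.
            noise = 1e-3 * rng.standard_normal(codebook.shape)
            codebook = np.vstack([codebook + noise, codebook - noise])
            prev = np.inf
            for _ in range(max_iter):
                d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
                labels = d2.argmin(axis=1)                # nearest-codeword assignment
                distortion = d2[np.arange(len(X)), labels].mean()  # total distortion D
                for l in range(codebook.shape[0]):        # centroid update
                    cell = X[labels == l]
                    if len(cell):
                        codebook[l] = cell.mean(axis=0)
                if abs(prev - distortion) / distortion < eps:      # relative improvement
                    break
                prev = distortion
        return codebook

    def vq_score(X, codebook):
        # Average quantization distortion of the test vectors against one speaker's
        # codebook; identification picks the speaker with the smallest score.
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()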
The other algorithm adopted by the present invention is SVM; its key issues are the choice of kernel function and parameter optimization.
Radial basis function (RBF) kernel:
K(x, x_i) = \exp\left(-\gamma \left\| x - x_i \right\|^2\right), \quad \gamma > 0
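As a small numerical illustration of this kernel (not taken from the patent), the kernel matrix between two sets of feature vectors can be computed as follows; the default gamma = 0.25 merely echoes one of the (C, γ) settings in Tables 1 and 2.

    import numpy as np

    def rbf_kernel_matrix(X, Y, gamma=0.25):
        # K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
        sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives from round-off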
To select a suitable kernel function, 10 male students and 10 female students were chosen at random as samples and a comparative experiment was carried out with the RBF and Poly (polynomial) kernels; see Table 1 and Table 2. Under identical conditions the RBF kernel gives the better results. Given the previous experimental experience and the fact that the RBF kernel has fewer parameters than the Poly kernel, the radial basis function kernel is selected to build the speaker model. The kernel parameters are a key factor affecting the performance of a support vector machine classifier, so kernel parameter optimization is essential. The common approach is to let C and γ vary over certain ranges, use cross-validation on the training set to obtain the validation classification accuracy for each fixed parameter pair, and finally take the pair with the highest training-set validation accuracy as the optimal parameters; this is exactly the idea of the grid search method (an illustrative sketch of this grid search is given after Table 2). The grid.py script in the python subdirectory of the LIBSVM toolbox performs this optimization of C and γ; the parameter optimization screenshot for the 20-speaker, 5-frame data is shown in Figure 4.
  
Table 1 Experimental results for the 10 female speakers
Frames (C, γ) (Degree, Coeff) Recognition rate % (RBF) Recognition rate % (Poly) RBF recognition time (s) Poly recognition time (s)
1 (0.25,0.25) (3,1) 100 100 0.01 0.02
3 (0.25,0.25) (3,1) 96.67 96.67 0.02 0.01
5 (0.25,0.25) (3,1) 100 98.00 0.02 0.02
7 (0.25,0.25) (3,1) 100 98.57 0.02 0.01
10 (0.25,0.25) (3,1) 97.00 95.00 0.02 0.02
15 (4.00,1.00) (3,1) 96.00 96.00 0.03 0.02
20 (4.00,1.00) (3,1) 96.00 97.00 0.03 0.03
30 (4.00,1.00) (3,1) 93.00 93.33 0.08 0.08
Table 2 Experimental results for the 10 male speakers
Frames (C, γ) (Degree, Coeff) Recognition rate % (RBF) Recognition rate % (Poly) RBF recognition time (s) Poly recognition time (s)
1 (1.32,0.76) (3,1) 90.00 90.00 0.01 0.01
3 (1.00,1.00) (3,1) 93.33 93.33 0.02 0.02
5 (4.00,1.00) (3,1) 96.00 94.00 0.02 0.01
7 (4.00,1.00) (3,1) 95.71 90.00 0.01 0.02
10 (4.00,1.00) (3,1) 92.00 84.00 0.02 0.02
15 (4.00,1.00) (3,1) 86.67 84.00 0.05 0.03
20 (4.00,1.00) (3,1) 87.00 85.00 0.05 0.05
30 (4.00,1.00) (3,1) 84.67 85.33 0.11 0.09
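The grid search described before Table 1 can be sketched as follows: try a grid of (C, γ) values and keep the pair with the best cross-validation accuracy on the training set. The patent uses LIBSVM's grid.py for this; the scikit-learn code, the stand-in data and the grid ranges below are assumptions made only to illustrate the idea.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 24))      # stand-in: 20 speakers x 50 frames
    y_train = np.repeat(np.arange(20), 50)     # speaker labels 0..19

    param_grid = {
        "C": 2.0 ** np.arange(-2, 5),          # 0.25 ... 16
        "gamma": 2.0 ** np.arange(-4, 3),      # 0.0625 ... 4
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)
    print("best (C, gamma):", search.best_params_)
    print("cross-validation accuracy:", search.best_score_)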
Based on the above analysis and experiments, the present invention first builds a speaker recognition system for each single method on the virtual instrument platform and debugs it until a good recognition effect is reached, and then merges the two methods to realize in parallel a speaker recognition system based on virtual instrument technology. In building the system, full use is made of modular design: the large program is divided into small modules, which both simplifies the program and improves its readability. That is, the system is realized on the virtual instrument platform through LabVIEW sub-VIs; every part of the program that calls a MATLAB node is written as a sub-VI, and the system is constructed by calling these sub-VIs. The front panel of the system is shown in Fig. 3.
Experimental verification and analysis of results
The speakers' speech was recorded in an ordinary laboratory environment using the recorder built into the computer. The corpus consists of monaural speech signals sampled at 22050 Hz with 16-bit quantization, stored as .wav files. Thirty people were chosen as speakers, and each recorded 20 speech segments (2~4 s each) as the sample library, with text content unrelated between segments; the first 10 segments were used to build the speaker model and the last 10 for testing. Tests were carried out on 20 speakers and on 30 speakers, with utterances drawn at random 5 times and 10 times; the experimental results are shown in Table 3. The analysis shows that as the number of samples increases the VQ method becomes superior to the SVM method, which confirms that the SVM method has a classification advantage on small samples while the VQ method has the advantage on large samples; it can further be inferred that as the sample size grows into the thousands the VQ method will outperform the SVM method. In the result decision part of the present invention, when the results of the two recognition methods are inconsistent, as long as one method recognizes a speaker, its result is output as the correct result; when the results of the two methods are the same, the recognition result is output on speaker front panel 1, with correct recognition indicated by green light 2 and non-recognition by red light 2. The performance of the system is compared in Table 4, from which it can be seen that implementing speaker recognition with the two methods in parallel improves the performance of the system, and although the recognition time increases somewhat, the increase is small.
  
Table 3 Comparison of experimental results
[Table 3 is reproduced only as an image in the original document.]
Table 4 Comparison of system performance
Recognition method Recognition rate (%) Misclassification rate (%) Recognition time (s)
VQ 94.88 10.09 0.06
SVM 97.38 9.71 0.08
VQ-SVM 98.54 5.28 0.15
Conclusion
The present invention combines the two recognition methods on the virtual instrument platform to perform parallel speaker recognition. Under small-sample conditions the SVM method is superior to the VQ method; as the number of samples increases, the recognition performance of SVM tends to decline while that of the VQ method tends to rise. The complementarity of the two methods with respect to sample size is thus fully exploited, which improves the overall performance of the system.

Claims (2)

1. A speaker VQ-SVM parallel recognition system based on virtual instrument technology, characterized in that: the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech pre-processing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform; on the virtual instrument platform a large program is divided into small modules by means of LabVIEW sub-VIs; every part of the program that calls a MATLAB node is written as a sub-VI, and the system is constructed by calling these sub-VIs;
The VQ algorithm is adopted to build the VQ model; the initial codebook is obtained by the splitting method, taking the centroid of the feature vectors as the initial codebook; the speaker model is built and stored in LabVIEW by calling a MATLAB node; the algorithm formulas are as follows:
Total distortion:
D = \frac{1}{N} \sum_{n=1}^{N} \min_{1 \le l \le L} d(X_n, Y_l)
Calculation of the new codeword:
Y_l = \frac{1}{N_l} \sum_{X \in S_l} X
in the formulas, N_l is the number of vectors in the cell S_l, and Y_l is the centroid of all the vectors in S_l;
Relative distortion improvement:
\Delta D^{(m)} = \frac{\left| D^{(m-1)} - D^{(m)} \right|}{D^{(m)}}
The SVM algorithm is adopted to build the SVM model; a radial basis function (RBF) kernel is selected for building the speaker model; its formula is as follows:
K(x, x_i) = \exp\left(-\gamma \left\| x - x_i \right\|^2\right), \quad \gamma > 0;
The recognition unit outputs the recognition result through the speaker recognition front panel in the result decision part; when the results of the two recognition methods, VQ and SVM, are inconsistent, as long as one method recognizes a speaker, the result of that method is output as the correct result; when the results of the two methods are the same, the recognition result is output on the speaker recognition front panel, with correct recognition indicated by a green light and non-recognition by a red light.
2. The speaker VQ-SVM parallel recognition system based on virtual instrument technology according to claim 1, characterized in that: said feature extraction unit uses Mel-frequency cepstral coefficients (MFCC) and their first-order differences as the recognition feature parameters; feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first and last two frames are removed because their first-order differences are zero, giving a 24-dimensional speech feature vector.
CN201210008213XA 2012-01-12 2012-01-12 Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology Pending CN102543075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210008213XA CN102543075A (en) 2012-01-12 2012-01-12 Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210008213XA CN102543075A (en) 2012-01-12 2012-01-12 Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology

Publications (1)

Publication Number Publication Date
CN102543075A true CN102543075A (en) 2012-07-04

Family

ID=46349815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210008213XA Pending CN102543075A (en) 2012-01-12 2012-01-12 Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology

Country Status (1)

Country Link
CN (1) CN102543075A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107945787A (en) * 2017-11-21 2018-04-20 上海电机学院 A kind of acoustic control login management system and method based on virtual instrument technique

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030036905A1 (en) * 2001-07-25 2003-02-20 Yasuhiro Toguri Information detection apparatus and method, and information search apparatus and method
CN1588535A (en) * 2004-09-29 2005-03-02 上海交通大学 Automatic sound identifying treating method for embedded sound identifying system
JP2005345683A (en) * 2004-06-02 2005-12-15 Toshiba Tec Corp Speaker-recognizing device, program, and speaker-recognizing method
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030036905A1 (en) * 2001-07-25 2003-02-20 Yasuhiro Toguri Information detection apparatus and method, and information search apparatus and method
JP2005345683A (en) * 2004-06-02 2005-12-15 Toshiba Tec Corp Speaker-recognizing device, program, and speaker-recognizing method
CN1588535A (en) * 2004-09-29 2005-03-02 上海交通大学 Automatic sound identifying treating method for embedded sound identifying system
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
余洋: "基于LabVIEW的说话人识别系统开发" [Yu Yang, "Development of a Speaker Recognition System Based on LabVIEW"], 《中国优秀硕士学位论文全文数据库》 [China Master's Theses Full-text Database] *
刘祥楼等: "说话人识别中支持向量机核函数参数优化研究" [Liu Xianglou et al., "Research on Kernel Function Parameter Optimization of Support Vector Machines in Speaker Recognition"], 《科学技术与工程》 [Science Technology and Engineering] *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107945787A (en) * 2017-11-21 2018-04-20 上海电机学院 A kind of acoustic control login management system and method based on virtual instrument technique

Similar Documents

Publication Publication Date Title
CN107092596B (en) Text emotion analysis method based on attention CNNs and CCR
Soutner et al. Application of LSTM neural networks in language modelling
CN102722713B (en) Handwritten numeral recognition method based on lie group structure data and system thereof
CN101710490B (en) Method and device for compensating noise for voice assessment
CN106776574B (en) User comment text mining method and device
CN103177733B (en) Standard Chinese suffixation of a nonsyllabic "r" sound voice quality evaluating method and system
CN106156083A (en) A kind of domain knowledge processing method and processing device
CN107844559A (en) A kind of file classifying method, device and electronic equipment
CN104216876B (en) Information text filter method and system
CN107818164A (en) A kind of intelligent answer method and its system
CN104166643A (en) Dialogue act analyzing method in intelligent question-answering system
US10831993B2 (en) Method and apparatus for constructing binary feature dictionary
CN103474061A (en) Automatic distinguishing method based on integration of classifier for Chinese dialects
CN103077720B (en) Speaker identification method and system
CN103678278A (en) Chinese text emotion recognition method
CN104167208A (en) Speaker recognition method and device
CN101488150A (en) Real-time multi-view network focus event analysis apparatus and analysis method
Fang et al. Channel adversarial training for cross-channel text-independent speaker recognition
CN113066499B (en) Method and device for identifying identity of land-air conversation speaker
CN103514170A (en) Speech-recognition text classification method and device
CN106897290B (en) Method and device for establishing keyword model
CN104538035A (en) Speaker recognition method and system based on Fisher supervectors
CN109063478A (en) Method for detecting virus, device, equipment and the medium of transplantable executable file
Van Leeuwen Speaker linking in large data sets
CN109800309A (en) Classroom Discourse genre classification methods and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120704