CN102543075A - Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology - Google Patents
- Publication number: CN102543075A
- Application number: CN201210008213XA (CN201210008213A)
- Authority: CN (China)
- Prior art keywords: speaker, svm, virtual instrument, identification, recognition
- Prior art date: 2012-01-12
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Complex Calculations (AREA)
Abstract
The invention relates to a speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology. The system comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The system overcomes the drawback of existing mixed VQ-SVM speaker identification, in which the two methods must run serially and therefore waste time: it places the VQ and SVM methods on the same platform and runs them in parallel, saving identification time while improving the identification performance of the overall system.
Description
1. Technical field
The present invention relates to signal processing and pattern recognition, and specifically to a speaker VQ-SVM parallel identification system based on virtual instrument technology.
2. Background
Speaker identification determines a speaker's identity by analyzing the features of his or her voice. Speaker identification methods mainly include vector quantization, probabilistic methods, and discriminative classifier methods. As shown in Figure 1, a speaker identification system comprises two stages, training and identification. The raw speech signal is first acquired and pre-processed to obtain a clean signal; speech feature parameters are then extracted, and a specific method is applied for speaker training and identification. The speaker model typically stores a large number of processed speech feature samples in a database using a dedicated algorithm; speech to be identified is pre-processed, its features are extracted, and it is matched against the sample sets in the database to reach a decision.
Any single method has both strengths and limitations, so much current research combines two or more methods into a mixed identification approach. VQ is a data compression and coding technique; SVM is a machine learning method based on statistical learning theory. The two methods are complementary. The advantages of vector quantization (VQ) are good large-sample classification, a small number of models, short training time, and fast identification response; its drawbacks are difficulty with nonlinear problems and poor noise robustness. The advantages of the support vector machine (SVM) are good small-sample classification and distinctive strength on nonlinear and high-dimensional pattern recognition problems; its drawbacks are a complex, slow training algorithm and difficulty handling large sample sets. Although VQ and SVM have previously been combined for speaker identification, all the computations are usually carried out on the MATLAB platform, where multiple distinct methods can only run serially. Existing mixed VQ-SVM speaker identification is therefore so-called serial identification: a first pass with one method is followed by a second pass with the other. It is easy to see that the greatest weakness of serial identification is that it both occupies machine resources and wastes identification time.
3. Summary of the invention
The purpose of this invention is to provide a speaker VQ-SVM parallel identification system based on virtual instrument technology, which solves the problem that existing mixed VQ-SVM speaker identification both occupies machine resources and wastes identification time.
The technical solution adopted by the present invention is as follows. The speaker VQ-SVM parallel identification system based on virtual instrument technology comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs;
The VQ algorithm is adopted to build the VQ model. The initial codebook is obtained by a splitting method, taking the centroid of the feature vectors as the initial codeword; the speaker model is built and stored in LabVIEW by calling the MATLAB script node. The core formula is the average quantization distortion minimized by the iteration:

D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖²,

where the x_n are the N training feature vectors and the y_m are the M codewords;
The SVM algorithm is adopted to build the SVM model, using a radial basis function (RBF) kernel for the speaker model. Its formulas are:

K(x, x_i) = exp(−γ‖x − x_i‖²),

f(x) = sgn( Σ_i α_i y_i K(x, x_i) + b );
The identification result is output by the decision section on the speaker identification front panel in the identification unit. When the results of the VQ and SVM identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results of the two methods agree, the result is output on the speaker identification front panel, with a green lamp indicating correct identification and a red lamp indicating non-recognition.
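The decision rule just described can be sketched in Python (a minimal illustration; the function and label names are hypothetical, and the patent implements the rule in LabVIEW, not Python):

```python
def fuse_decisions(vq_result, svm_result, reject_label=None):
    """Combine the VQ and SVM speaker decisions per the parallel scheme:
    if the two methods agree, output the common result; if only one
    method produces a recognition, output that method's result; if both
    produce different positive identifications, no confident decision
    is made (hypothetical handling of a case the text leaves open)."""
    if vq_result == svm_result:
        return vq_result            # agreement -> green lamp
    if vq_result is reject_label:
        return svm_result           # only SVM recognized a speaker
    if svm_result is reject_label:
        return vq_result            # only VQ recognized a speaker
    return reject_label             # conflicting identifications
```

A conflicting pair of positive identifications is mapped to the reject label here; the source text only specifies the agree and one-sided cases.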
In the above scheme, the feature extraction unit uses the Mel-frequency cepstral coefficients (MFCC) and their first-order differences as identification feature parameters. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
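The stated parameters determine the feature dimensionality and usable frame count as sketched below (the helper function is hypothetical; the 24 dimensions are the 12 static MFCCs plus their 12 first-order differences):

```python
def frame_count(num_samples, frame_len=512, frame_shift=256, drop_edges=2):
    """Number of usable analysis frames for the parameters in the text:
    frame length 512, frame shift 256, with the first two and last two
    frames removed (their first-order differences are zero)."""
    full = 1 + (num_samples - frame_len) // frame_shift
    return max(full - 2 * drop_edges, 0)

# 12 static MFCCs plus 12 first-order differences -> 24-dim feature vector
N_MFCC = 12
FEATURE_DIM = 2 * N_MFCC
```

For one second of audio at the stated 44100 Hz sampling frequency this gives 167 usable frames, each carrying a 24-dimensional feature vector.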
Beneficial effect:
1. The present invention overcomes the wasted time of serial identification in existing mixed VQ-SVM speaker identification by running the VQ and SVM methods in parallel on the same platform, saving identification time while improving the identification performance of the overall system.
2. The present invention combines the two identification methods on the virtual instrument platform for parallel speaker identification. Under small-sample conditions the SVM method outperforms the VQ method; as the number of samples grows, SVM identification performance declines while VQ identification performance rises. Parallel combination thus fully exploits the complementarity of the two methods with respect to sample size and improves the overall performance of the system.
4. Description of the drawings
Fig. 1 is a block diagram of the speaker identification system;
Fig. 2 is a structural diagram of the present invention;
Fig. 3 is a schematic diagram of the speaker identification front panel of the present invention;
Fig. 4 is a flow chart of the LBG algorithm.
Reference numerals: 1, speaker identification front panel; 2, indicator lamps.
5. Embodiments
The present invention is further explained below with reference to the drawings.
As shown in Figs. 2 and 3, the speaker VQ-SVM parallel identification system based on virtual instrument technology comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. To achieve parallel VQ and SVM identification, and given that LabVIEW supports multitasking and multithreading, the system fuses speaker recognition technology with virtual instrument technology: LabVIEW manages the system and calls MATLAB for the processing. In the decision section of the present invention, when the results of the two identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results agree, the result is output on speaker front panel 1, with green lamp 2 indicating correct identification and red lamp 2 indicating non-recognition. The system performance comparison is shown in Table 4.
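LabVIEW's dataflow model runs the two recognizers concurrently. A rough Python analogue of that multithreaded structure, with placeholder scoring callables standing in for the actual VQ and SVM recognizers, is:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(features, vq_score, svm_score):
    """Run the VQ and SVM recognizers concurrently on the same feature
    set, mirroring LabVIEW's dataflow parallelism. vq_score and
    svm_score are placeholder callables returning a speaker label."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vq_future = pool.submit(vq_score, features)
        svm_future = pool.submit(svm_score, features)
        return vq_future.result(), svm_future.result()

# Toy stand-ins for the two recognizers (illustrative only)
vq = lambda feats: "speaker_3"
svm = lambda feats: "speaker_3"
```

Both recognizers read the same extracted features, so neither blocks the other; the pair of results then feeds the decision section.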
The key issues in a speaker identification system are extracting the speech feature parameters and building the speaker model. The Mel-frequency cepstral coefficients (MFCC) and their first-order differences are chosen as identification features. MFCC parameters reflect only the static characteristics of speech, while the human ear is more sensitive to its dynamic characteristics; the differential cepstrum is the parameter that captures that dynamic variation. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
The first algorithm adopted by the present invention is VQ. There are several ways to build the VQ model (codebook); the most common is the LBG algorithm, which generates an optimal codebook from the training vector set by an iterative procedure, with the flow shown in Fig. 4. The initial codebook is obtained by a splitting method, taking the centroid (center of mass) of the feature vectors as the initial codeword. Through experiment, a codebook capacity of M = 16, a distortion threshold of ε = 0.01, and a maximum number of splitting iterations of L = log₂ M were chosen, giving good recognition results. This work is implemented by programming in the MATLAB 7.0 environment; the model is then built and stored in LabVIEW by calling the MATLAB script node. The core formula is the average quantization distortion minimized by the iteration: D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖².
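The LBG procedure (splitting initialization followed by Lloyd-style refinement against a distortion threshold) can be sketched in pure Python. The toy data and tolerance handling below are illustrative, not the patent's MATLAB implementation; the multiplicative split assumes the codebook capacity is a power of two, as the M = 16 in the text is:

```python
import random

def lbg(vectors, codebook_size=16, eps=0.01, max_iter=20):
    """LBG codebook training: start from the centroid of all training
    vectors, split every codeword by a small perturbation, then refine
    with nearest-neighbour partition / centroid update (Lloyd) steps
    until the relative distortion improvement drops below eps."""
    dim = len(vectors[0])

    def centroid(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    codebook = [centroid(vectors)]
    while len(codebook) < codebook_size:
        # Splitting step: perturb each codeword into a +/- pair.
        codebook = [[x * (1 + s) for x in c]
                    for c in codebook for s in (0.01, -0.01)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                i = min(range(len(codebook)),
                        key=lambda k: sqdist(v, codebook[k]))
                cells[i].append(v)
                distortion += sqdist(v, codebook[i])
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(c) if c else codebook[i]
                        for i, c in enumerate(cells)]
            if prev - distortion <= eps * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook

# Toy training set: four well-separated 2-D clusters (illustrative only).
random.seed(1)
training = [(random.gauss(cx, 0.05), random.gauss(cy, 0.05))
            for cx, cy in ((0.2, 0.2), (1.0, 0.2), (0.2, 1.0), (1.0, 1.0))
            for _ in range(25)]
codebook = lbg(training, codebook_size=4)
```

The distortion accumulated in the inner loop is the quantity D minimized by the LBG iteration in the text.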
The other algorithm adopted by the present invention is SVM; its key issues are kernel selection and parameter optimization.
To select a suitable kernel, 10 male and 10 female students were arbitrarily chosen as samples, and a comparison experiment was run with the RBF and polynomial (Poly) kernels; see Tables 1 and 2. Under identical conditions, the RBF kernel performs better. Given the prior experimental basis and the fact that the RBF kernel has fewer parameters than the Poly kernel, the radial basis kernel was chosen for the speaker model. The kernel parameters are a key factor in support vector machine classifier performance, so kernel parameter optimization is essential. The common method is grid search: let C and γ range over fixed grids, use cross-validation on the training set to obtain the validation classification accuracy for each parameter pair, and take the pair with the highest accuracy as optimal. The grid.py program in the python subdirectory of the libsvm toolbox performs this search for C and γ; a screenshot of the parameter optimization for the 20-speaker, 5-frame data is shown in Fig. 4.
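The grid-search idea behind libsvm's grid.py can be sketched as an exhaustive search over (C, γ) with a user-supplied cross-validation scorer. The objective below is a toy stand-in for a real cross-validation accuracy, chosen only to make the example runnable:

```python
from itertools import product

def grid_search(evaluate, c_grid, gamma_grid):
    """Exhaustive (C, gamma) search as done with libsvm's grid.py:
    evaluate(C, gamma) should return the cross-validation accuracy on
    the training set; the pair with the highest accuracy wins."""
    return max(product(c_grid, gamma_grid), key=lambda p: evaluate(*p))

# Toy objective peaking at C = 4.0, gamma = 1.0 (illustrative only;
# a real scorer would run k-fold cross-validation with libsvm).
score = lambda C, g: -((C - 4.0) ** 2 + (g - 1.0) ** 2)
c_grid = [2 ** k for k in range(-2, 4)]   # 0.25 ... 8
g_grid = [2 ** k for k in range(-2, 2)]   # 0.25 ... 2
```

Powers of two are the conventional grid spacing for C and γ, which matches the values such as (0.25, 0.25) and (4.00, 1.00) appearing in Tables 1 and 2.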
Table 1: Experimental results for 10 female students

Frames | (C, γ) | (Degree, Coeff) | Recognition rate % (RBF) | Recognition rate % (Poly) | RBF recognition time (s) | Poly recognition time (s)
---|---|---|---|---|---|---
1 | (0.25,0.25) | (3,1) | 100 | 100 | 0.01 | 0.02 |
3 | (0.25,0.25) | (3,1) | 96.67 | 96.67 | 0.02 | 0.01 |
5 | (0.25,0.25) | (3,1) | 100 | 98.00 | 0.02 | 0.02 |
7 | (0.25,0.25) | (3,1) | 100 | 98.57 | 0.02 | 0.01 |
10 | (0.25,0.25) | (3,1) | 97.00 | 95.00 | 0.02 | 0.02 |
15 | (4.00,1.00) | (3,1) | 96.00 | 96.00 | 0.03 | 0.02 |
20 | (4.00,1.00) | (3,1) | 96.00 | 97.00 | 0.03 | 0.03 |
30 | (4.00,1.00) | (3,1) | 93.00 | 93.33 | 0.08 | 0.08 |
Table 2: Experimental results for 10 male students

Frames | (C, γ) | (Degree, Coeff) | Recognition rate % (RBF) | Recognition rate % (Poly) | RBF recognition time (s) | Poly recognition time (s)
---|---|---|---|---|---|---
1 | (1.32,0.76) | (3,1) | 90.00 | 90.00 | 0.01 | 0.01 |
3 | (1.00,1.00) | (3,1) | 93.33 | 93.33 | 0.02 | 0.02 |
5 | (4.00,1.00) | (3,1) | 96.00 | 94.00 | 0.02 | 0.01 |
7 | (4.00,1.00) | (3,1) | 95.71 | 90.00 | 0.01 | 0.02 |
10 | (4.00,1.00) | (3,1) | 92.00 | 84.00 | 0.02 | 0.02 |
15 | (4.00,1.00) | (3,1) | 86.67 | 84.00 | 0.05 | 0.03 |
20 | (4.00,1.00) | (3,1) | 87.00 | 85.00 | 0.05 | 0.05 |
30 | (4.00,1.00) | (3,1) | 84.67 | 85.33 | 0.11 | 0.09 |
Following the above analysis and experiments, the present invention first builds single-method speaker identification systems on the virtual instrument platform and debugs them to good recognition performance, then merges the two methods to realize, in parallel, the speaker identification system based on virtual instrument technology. The system is built in a modular fashion: a large program is divided into small modules, which both simplifies the program and improves its readability. That is, on the virtual instrument platform the program is realized through LabVIEW sub-VIs; every part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The front panel of the system is shown in Fig. 3.
Experimental verification and analysis of results
The speakers' voices were recorded in an ordinary laboratory environment with the computer's built-in recorder. The corpus consists of mono speech signals sampled at 22050 Hz with 16-bit quantization, stored as .wav files. Thirty speakers were chosen, each recording 20 speech segments (2~4 s each) as the sample library, all text-independent; the first 10 segments are used to build the speaker model and the last 10 for testing. Tests on the 20-speaker and 30-speaker sets, with voices drawn at random 5 and 10 times, gave the experimental results shown in Table 3. The analysis shows that as the sample size grows, the VQ method overtakes the SVM method, confirming that SVM has a classification advantage on small samples and VQ on large samples; it can further be inferred that as the sample size grows into the thousands, the VQ method will outperform the SVM method. In the decision section of the present invention, when the results of the two identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results agree, the result is output on speaker front panel 1, with green lamp 2 indicating correct identification and red lamp 2 indicating non-recognition. The system performance comparison is shown in Table 4: implementing speaker identification with the two methods in parallel improves system performance, and although the identification time increases, the increase is small.
Table 3: Comparison of experimental results
Table 4: System performance comparison

Identification method | Recognition rate (%) | Misrecognition rate (%) | Recognition time (s)
---|---|---|---
VQ | 94.88 | 10.09 | 0.06 |
SVM | 97.38 | 9.71 | 0.08 |
VQ-SVM | 98.54 | 5.28 | 0.15 |
Conclusion
The present invention combines the two identification methods on the virtual instrument platform for parallel speaker identification. Under small-sample conditions the SVM method outperforms the VQ method; as the number of samples grows, SVM identification performance declines while VQ identification performance rises. Parallel combination thus fully exploits the complementarity of the two methods with respect to sample size and improves the overall performance of the system.
Claims (2)
1. A speaker VQ-SVM parallel identification system based on virtual instrument technology, characterized in that: the system comprises a voice pre-processing unit, a feature extraction unit, a speaker model unit, an identification unit, and a LabVIEW virtual instrument platform; on the virtual instrument platform, a large program is divided into small modules as LabVIEW sub-VIs; every part of the program that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs;
the VQ algorithm is adopted to build the VQ model; the initial codebook is obtained by a splitting method, taking the centroid of the feature vectors as the initial codeword, and the speaker model is built and stored in LabVIEW by calling the MATLAB script node; the core formula is the average quantization distortion minimized by the iteration:

D = (1/N) Σ_{n=1..N} min_{1≤m≤M} ‖x_n − y_m‖²,

where the x_n are the N training feature vectors and the y_m are the M codewords;
the SVM algorithm is adopted to build the SVM model, using a radial basis function (RBF) kernel for the speaker model, with formulas:

K(x, x_i) = exp(−γ‖x − x_i‖²),

f(x) = sgn( Σ_i α_i y_i K(x, x_i) + b );
the identification result is output by the decision section on the speaker identification front panel in the identification unit; when the results of the VQ and SVM identification methods disagree, the result of whichever method produces a recognition is output as the correct result; when the results of the two methods agree, the result is output on the speaker identification front panel, with a green lamp indicating correct identification and a red lamp indicating non-recognition.
2. The speaker VQ-SVM parallel identification system based on virtual instrument technology according to claim 1, characterized in that: the feature extraction unit uses the Mel-frequency cepstral coefficients (MFCC) and their first-order differences as identification feature parameters; feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding 24-dimensional speech feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210008213XA CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102543075A true CN102543075A (en) | 2012-07-04 |
Family
ID=46349815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210008213XA Pending CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102543075A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107945787A (en) * | 2017-11-21 | 2018-04-20 | 上海电机学院 (Shanghai Dianji University) | Voice-controlled login management system and method based on virtual instrument technology |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030036905A1 (en) * | 2001-07-25 | 2003-02-20 | Yasuhiro Toguri | Information detection apparatus and method, and information search apparatus and method |
CN1588535A (en) * | 2004-09-29 | 2005-03-02 | 上海交通大学 | Automatic sound identifying treating method for embedded sound identifying system |
JP2005345683A (en) * | 2004-06-02 | 2005-12-15 | Toshiba Tec Corp | Speaker-recognizing device, program, and speaker-recognizing method |
CN101640043A (en) * | 2009-09-01 | 2010-02-03 | 清华大学 | Speaker recognition method based on multi-coordinate sequence kernel and system thereof |
2012-01-12: application CN201210008213XA filed in China (CN); publication CN102543075A, status Pending.
Non-Patent Citations (2)
Title |
---|
余洋 (Yu Yang): "Development of a Speaker Recognition System Based on LabVIEW" (基于LabVIEW的说话人识别系统开发), China Master's Theses Full-text Database * |
刘祥楼等 (Liu Xianglou et al.): "Parameter Optimization of Support Vector Machine Kernel Functions in Speaker Recognition" (说话人识别中支持向量机核函数参数优化研究), Science Technology and Engineering * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C12 | Rejection of a patent application after its publication | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2012-07-04 |