CN107507611A - Voice classification recognition method and device - Google Patents

Voice classification recognition method and device

Info

Publication number
CN107507611A
Authority
CN
China
Prior art keywords
sample set
training sample
laplace
parameter
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710774048.1A
Other languages
Chinese (zh)
Other versions
CN107507611B (en)
Inventor
张莉
徐志强
王邦军
张召
李凡长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201710774048.1A priority Critical patent/CN107507611B/en
Publication of CN107507611A publication Critical patent/CN107507611A/en
Application granted granted Critical
Publication of CN107507611B publication Critical patent/CN107507611B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 2015/0631: Creating reference templates; Clustering
    • G10L 15/08: Speech classification or search
    • G10L 15/083: Recognition networks
    • G10L 2015/085: Methods for reducing search complexity, pruning

Abstract

The invention discloses a speech classification recognition method, in which a speech data sample to be classified is input into a pre-created classifier model and the classification result of the speech data sample is obtained from the output value of the classifier model. The classifier model is obtained by deriving a support vector sample set under constraint conditions determined by an L1-norm regularization parameter, a Laplacian regularization parameter and the support vector machine, so that the resulting classifier model is sparse and interpretable and has a strong ability to filter noise; it is therefore robust to noise and yields accurate speech classification results. The invention also provides a speech classification recognition device, which has the above beneficial effects.

Description

Voice classification recognition method and device
Technical field
The present invention relates to the field of artificial intelligence applications, and more particularly to a speech classification recognition method and device.
Background art
With the development of artificial intelligence, computer technology has been widely applied in many fields. Speech recognition is one of the directions with the greatest application value, but conventional speech and language processing techniques are all rather complex and place a considerable computational burden on the computer.
A relatively simple current speech processing approach is to create a generative model with a semi-supervised algorithm and to process speech with the created generative model. However, the creation process of currently available generative models for processing speech is complex, their ability to filter noise is not very strong, and they lack robustness.
Summary of the invention
An object of the present invention is to provide a speech classification recognition method that solves the problem of the poor noise-filtering capability of generative speech-processing models and improves the accuracy of speech classification recognition.
A further object of the present invention is to provide a speech classification recognition device.
In order to solve the above technical problems, the present invention provides a speech classification recognition method, which includes:
inputting a speech data sample to be classified into a pre-created classifier model, and obtaining the classification result of the speech data sample from the output value of the classifier model; wherein the classifier model is created as follows:
inputting the training sample set of speech data into a Laplacian support vector machine; obtaining the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function; obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine; and obtaining the classifier model according to the support vector sample set and the offset.
Inputting the training sample set of speech data into the Laplacian support vector machine includes:
inputting the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, where xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; D is the dimension of the original space.
Obtaining the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function includes:
dividing the training sample set into several parts, and training and testing the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation, to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI;
mapping the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, where Kij = k(xi, xj).
Obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine includes:
solving, under the given constraint conditions, the optimization problem to obtain the discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, where ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij;
obtaining, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
Obtaining the classifier model according to the support vector sample set and the offset includes:
determining the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
The present invention also provides a speech classification recognition device, which includes:
a classifier module, configured to input a speech data sample to be classified into a pre-created classifier model and obtain the classification result of the speech data sample from the output value of the classifier model; wherein the classifier model is created by a classifier creation module, and the classifier creation module is configured to:
input the training sample set of speech data into a Laplacian support vector machine; obtain the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function; obtain the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine; and obtain the classifier model according to the support vector sample set and the offset.
The classifier creation module includes:
an input unit, configured to input the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, where xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; D is the dimension of the original space.
The classifier creation module includes:
a parameter processing unit, configured to divide the training sample set into several parts, train and test the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI, and map the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, where Kij = k(xi, xj).
The classifier creation module includes:
an operation unit, configured to solve, under the given constraint conditions, the optimization problem to obtain the discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, where ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij; and to obtain, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
The classifier creation module includes:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
In the speech classification recognition method provided by the present invention, a speech data sample to be classified is input into a pre-created classifier model to obtain the classification result. The classifier model is obtained by deriving a support vector sample set under the constraint conditions of the support vector machine determined together with an L1-norm regularization parameter and a Laplacian regularization parameter. The L1-norm and Laplacian regularization parameters make the classifier model sparse and interpretable, so that in practice the required model can be obtained from very few sample points. This further enhances the classifier's ability to filter noise and thus yields good robustness. Compared with the prior art, the classifier model of the present invention has low complexity and is strongly interpretable and sparse; while the recognition rate of the classifier model is improved, it is also more robust to noise, so that the speech classification results are more accurate.
The present invention also provides a speech classification recognition device, which has the above beneficial effects.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of the speech classification recognition method provided by the present invention;
Fig. 2 is a structural block diagram of the speech classification recognition device provided in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of an embodiment of the speech classification recognition method provided by the present invention. The method may include:
Step S101: Input the training sample set of speech data into the Laplacian support vector machine.
Step S102: According to the training sample set, the positive definite parameters and the kernel function, obtain the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix.
Step S103: According to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine, obtain the support vector sample set and the offset.
Step S104: According to the support vector sample set and the offset, obtain the classifier model.
Step S105: Input the speech data sample to be classified into the classifier model, and obtain the classification result of the speech data sample from the output value of the classifier model.
It should be noted that in this embodiment, steps S101 to S104 constitute the process of creating the classifier model. The classifier of the present invention can be created in advance according to steps S101 to S104; afterwards, it is only necessary to input the speech data sample to be classified according to step S105 to obtain the classification result.
In addition, generative models in the prior art need to recognize the specific content of the speech, including speech content that contains noise, which leads to poor robustness. In contrast, the classifier model used in the present invention only needs to identify whether the content of the speech is what we want, without caring about what the specific content of the speech is.
To give a simple example of creating the classifier model, recordings of 150 speakers were collected; each speaker read the alphabet twice, yielding 52 samples per speaker, each sample having 617 dimensions. The data set is divided for speech content discrimination: the pronunciations of the letters a and b of the first 120 speakers are extracted, giving a 480*617 data set; the letter-a speech data is taken as the positive class and the letter-b speech data as the negative class; 80% is marked as the training set and the remainder as the test set. Within the training set, 10 samples are marked as labeled data and the rest are treated as unlabeled data. The classifier model is trained with the training set, and the test set is input to obtain the accuracy of the classifier model trained on these training samples.
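The data split described above can be sketched as follows. This is a minimal, non-authoritative illustration in Python that uses randomly generated features as a stand-in for the 480*617 isolated-letter recordings; the random data, variable names and random seed are assumptions and not part of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 480 x 617 data set built from letters a and b
# (240 recordings per letter, 617-dimensional features per sample).
X = rng.standard_normal((480, 617))
y = np.concatenate([np.ones(240), -np.ones(240)])  # a -> +1 (positive class), b -> -1 (negative class)

# Shuffle, then take 80% as the training set and the rest as the test set.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
n_train = int(0.8 * len(X))                         # 384 training samples, 96 test samples
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Within the training set, keep only 10 labeled samples; the remaining
# training samples are treated as unlabeled (label 0), as in the embodiment.
labeled_idx = rng.choice(n_train, size=10, replace=False)
y_semi = np.zeros(n_train)
y_semi[labeled_idx] = y_train[labeled_idx]

print(X_train.shape, X_test.shape, int((y_semi != 0).sum()), "labeled training samples")
```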
In practical applications, we choose the voices of a and b as training samples to obtain a classifier model that can distinguish voice a from voice b. When a new voice is input, it is only necessary to judge whether the voice is a or b, without concern for the specific content of the voice.
In scenarios with such a demand, the classifier model used by the present invention for speech classification recognition greatly reduces the technical difficulty and also saves cost. The simplest application is oral question answering: for questions with fixed answers, the speech data can be pre-processed and then judged by the model to obtain a result. For example, for true-or-false questions, a learner is trained in advance with the two kinds of voices "right" and "wrong" to obtain a corresponding model; the collected speech data is pre-processed, fed into the model to obtain a result, and the result is then output and compared with the answer key of the original question.
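As a sketch of the oral-quiz use case just described: assuming a trained binary classifier whose +1/−1 output corresponds to the spoken words "right"/"wrong", the graded answers can be compared with the answer key. The `classify` callable, the label mapping and the function name are illustrative assumptions, not part of the patent.

```python
from typing import Callable, Sequence

def grade_true_false(samples: Sequence, answer_key: Sequence,
                     classify: Callable) -> list:
    """Compare classified 'right' (+1) / 'wrong' (-1) answers with the answer key."""
    results = []
    for sample, expected in zip(samples, answer_key):
        said_right = classify(sample) == 1   # +1 is assumed to mean the word "right"
        results.append(said_right == expected)
    return results

# Toy usage with a dummy classifier that always predicts +1 ("right").
print(grade_true_false([None, None], [True, False], lambda s: 1))  # [True, False]
```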
In summary, the classifier model used in the present invention employs L1-norm regularization, so the objective function used to create the classifier model is sparse and interpretable. This means that the required classifier model can be obtained from very few training sample points, which allows noisy data to be excluded effectively, so the classifier model has good robustness.
Based on the above embodiment, another specific embodiment of the present invention may include the following.
Inputting the training sample set of speech data into the Laplacian support vector machine is specifically:
inputting the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, where xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; D is the dimension of the original space.
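The labeling convention above (−1/+1 for labeled samples, 0 for unlabeled ones) can be represented directly as arrays. A minimal sketch; the function and variable names are assumptions for illustration.

```python
import numpy as np

def make_training_set(X_labeled, y_labeled, X_unlabeled):
    """Stack l labeled and u unlabeled samples; unlabeled samples get label 0."""
    X = np.vstack([X_labeled, X_unlabeled])           # shape (l + u, D)
    y = np.concatenate([y_labeled,                    # entries in {-1, +1}
                        np.zeros(len(X_unlabeled))])  # entries equal to 0
    return X, y

# Example: l = 3 labeled and u = 2 unlabeled samples in D = 4 dimensions.
Xl = np.ones((3, 4)); yl = np.array([1.0, -1.0, 1.0]); Xu = np.zeros((2, 4))
X, y = make_training_set(Xl, yl, Xu)
print(X.shape, y)  # (5, 4) [ 1. -1.  1.  0.  0.]
```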
Based on the above embodiment, another specific embodiment of the present invention may include the following.
Obtaining the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function is specifically:
dividing the training sample set into several parts, and training and testing the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI; and mapping the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, where Kij = k(xi, xj).
Specifically, the cross validation used in the present invention is k-fold cross validation; both the L1-norm regularization parameter and the Laplacian regularization parameter are obtained by cross validation.
For example, the training set is divided into five equal parts; one part is used for testing and the others for training, and the five resulting accuracies are averaged. The average accuracy obtained in this way is the accuracy corresponding to a given pair of L1-norm regularization parameter and Laplacian regularization parameter. Finally, the L1-norm regularization parameter and Laplacian regularization parameter with the highest accuracy are chosen as the final parameters.
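A minimal sketch of the five-fold selection just described, together with a kernel matrix K with Kij = k(xi, xj). The Gaussian (RBF) kernel, the candidate grids and the placeholder scoring function are assumptions for illustration; the patent does not fix a particular kernel or grid, and a real `train_and_score` would train the Laplacian support vector machine and return its accuracy on the held-out fold.

```python
import numpy as np
from itertools import product

def rbf_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def select_regularization(X, y, train_and_score, gammas_A, gammas_I, k=5, seed=0):
    """Return the (gamma_A, gamma_I) pair with the highest mean k-fold accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    best, best_acc = None, -1.0
    for gA, gI in product(gammas_A, gammas_I):
        accs = []
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            accs.append(train_and_score(X[train_idx], y[train_idx],
                                        X[test_idx], y[test_idx], gA, gI))
        if np.mean(accs) > best_acc:
            best, best_acc = (gA, gI), float(np.mean(accs))
    return best, best_acc

# Toy usage with a placeholder scoring function (always returns 0.5).
X_toy = np.random.default_rng(1).standard_normal((20, 5))
y_toy = np.sign(np.random.default_rng(2).standard_normal(20))
dummy_score = lambda Xtr, ytr, Xte, yte, gA, gI: 0.5
print(select_regularization(X_toy, y_toy, dummy_score, [0.01, 0.1, 1.0], [0.01, 0.1, 1.0]))
print(rbf_kernel_matrix(X_toy).shape)  # (20, 20)
```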
Based on the above embodiment, another specific embodiment of the present invention may include the following.
Obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine is specifically:
solving, under the given constraint conditions, the optimization problem to obtain the discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, where ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij;
obtaining, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
Specifically, δ is a constant coefficient that takes a very small positive value to ensure that a unique solution is obtained.
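Two of the quantities used above can be sketched as follows: the graph Laplacian L = D − W with Dii = ΣjWij (the patent does not specify how the similarity matrix W is built; an RBF similarity is assumed here purely for illustration), and the support vector set SVs = {xi | αi ≠ 0} extracted from solved coefficients a = a+ − a− and offset b = β+ − β−. The optimization solver itself is not reproduced here.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """L = D - W with D_ii = sum_j W_ij; W is an assumed RBF similarity matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.diag(W.sum(axis=1)) - W

def support_vectors(X, a_plus, a_minus, beta_plus, beta_minus, tol=1e-8):
    """Form a = a+ - a- and b = beta+ - beta-; keep samples whose alpha_i is non-zero."""
    a = np.asarray(a_plus) - np.asarray(a_minus)
    b = beta_plus - beta_minus
    sv_idx = np.flatnonzero(np.abs(a) > tol)  # alpha_i != 0 up to a numerical tolerance
    return X[sv_idx], a[sv_idx], b

# Toy usage with made-up solver output for 4 samples in 3 dimensions.
X_toy = np.arange(12.0).reshape(4, 3)
X_sv, a_sv, b = support_vectors(X_toy, [0.5, 0.0, 0.0, 0.2], [0.0, 0.0, 0.1, 0.2], 1.0, 0.3)
print(graph_laplacian(X_toy).shape, X_sv.shape, a_sv, b)  # (4, 4) (2, 3) [ 0.5 -0.1] 0.7
```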
Based on the above embodiment, another specific embodiment of the present invention may include the following.
Obtaining the classifier model according to the support vector sample set and the offset is specifically:
determining the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
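A minimal sketch of the decision rule just stated, y = sign(Σ asv·k(x, xsv) + b), assuming a Gaussian kernel; the kernel choice and variable names are assumptions for illustration.

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def classify(x, X_sv, a_sv, b, kernel=rbf):
    """Return +1 or -1 for a sample x, given support vectors, coefficients and offset."""
    score = sum(a * kernel(x, x_sv) for a, x_sv in zip(a_sv, X_sv)) + b
    return 1 if score >= 0 else -1

# Toy usage: two support vectors with opposite coefficients.
X_sv = np.array([[0.0, 0.0], [3.0, 3.0]])
a_sv = np.array([1.0, -1.0])
print(classify(np.array([0.1, 0.1]), X_sv, a_sv, b=0.0))  # +1 (closer to the positive support vector)
print(classify(np.array([2.9, 3.1]), X_sv, a_sv, b=0.0))  # -1
```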
The speech classification recognition device provided in an embodiment of the present invention is introduced below. The speech classification recognition device described below and the speech classification recognition method described above may be referred to correspondingly.
Fig. 2 is a structural block diagram of the speech classification recognition device provided in an embodiment of the present invention. Referring to Fig. 2, the speech classification recognition device may include:
a classifier module 100, configured to input a speech data sample to be classified into a pre-created classifier model and obtain the classification result of the speech data sample from the output value of the classifier model; wherein the classifier model is created by a classifier creation module 200, and the classifier creation module 200 is configured to:
input the training sample set of speech data into a Laplacian support vector machine; obtain the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function; obtain the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine; and obtain the classifier model according to the support vector sample set and the offset.
Optionally, the classifier creation module 200 includes:
an input unit, configured to input the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, where xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; D is the dimension of the original space.
Optionally, the classifier creation module 200 includes:
a parameter processing unit, configured to divide the training sample set into several parts, train and test the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI, and map the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, where Kij = k(xi, xj).
Optionally, the classifier creation module 200 includes:
an operation unit, configured to solve, under the given constraint conditions, the optimization problem to obtain the discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, where ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij; and to obtain, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
Optionally, the classifier creation module 200 includes:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
The speech classification recognition device of this embodiment is used to implement the foregoing speech classification recognition method. Therefore, for the embodiments of the device, reference may be made to the embodiments of the speech classification recognition method described above. For example, the classifier module 100 is used to implement step S105 of the speech classification recognition method, and the classifier creation module 200 is used to implement steps S101, S102, S103 and S104. Accordingly, for its specific embodiments, reference may be made to the descriptions of the corresponding embodiments, which are not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and for related parts reference may be made to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The speech classification recognition method and device provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein with specific examples, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. It should be pointed out that those skilled in the art may make several improvements and modifications to the present invention without departing from the principles of the present invention, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (10)

1. A speech classification recognition method, characterized by comprising:
inputting a speech data sample to be classified into a pre-created classifier model, and obtaining a classification result of the speech data sample from an output value of the classifier model;
wherein the classifier model is created by:
inputting a training sample set of speech data into a Laplacian support vector machine;
obtaining an optimal L1-norm regularization parameter γA, an optimal Laplacian regularization parameter γI and a kernel matrix according to the training sample set, positive definite parameters and a kernel function;
obtaining a support vector sample set and an offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and constraint conditions of the support vector machine; and
obtaining the classifier model according to the support vector sample set and the offset.
2. The method according to claim 1, characterized in that inputting the training sample set of speech data into the Laplacian support vector machine comprises:
inputting the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, wherein xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; and D is the dimension of the original space.
3. The method according to claim 2, characterized in that obtaining the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI and the kernel matrix according to the training sample set, the positive definite parameters and the kernel function comprises:
dividing the training sample set into several parts, and training and testing the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation, to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI; and
mapping the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, wherein Kij = k(xi, xj).
4. The method according to claim 3, characterized in that obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and the constraint conditions of the support vector machine comprises:
solving, under the given constraint conditions, the optimization problem to obtain discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, wherein ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij; and
obtaining, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
5. The method according to claim 1, characterized in that obtaining the classifier model according to the support vector sample set and the offset comprises:
determining the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, wherein x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
6. A speech classification recognition device, characterized by comprising:
a classifier module, configured to input a speech data sample to be classified into a pre-created classifier model and obtain a classification result of the speech data sample from an output value of the classifier model;
wherein the classifier model is created by a classifier creation module, and the classifier creation module is configured to:
input a training sample set of speech data into a Laplacian support vector machine;
obtain an optimal L1-norm regularization parameter γA, an optimal Laplacian regularization parameter γI and a kernel matrix according to the training sample set, positive definite parameters and a kernel function;
obtain a support vector sample set and an offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function and constraint conditions of the support vector machine; and
obtain the classifier model according to the support vector sample set and the offset.
7. The device according to claim 6, characterized in that the classifier creation module comprises:
an input unit, configured to input the training sample set {(xi, yi)}, i = 1, …, l + u, into the Laplacian support vector machine, wherein xi ∈ RD and yi is the label of xi, indicating the class of xi; when yi ∈ {−1, +1}, i = 1, …, l, and l is the number of labeled training samples; when yi = 0, i = l + 1, …, l + u, and u is the number of unlabeled training samples; and D is the dimension of the original space.
8. The device according to claim 7, characterized in that the classifier creation module comprises:
a parameter processing unit, configured to divide the training sample set into several parts, train and test the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI, and map the training sample set into a kernel Hilbert space through the kernel function to obtain the kernel matrix K, wherein Kij = k(xi, xj).
9. The device according to claim 8, characterized in that the classifier creation module comprises:
an operation unit, configured to solve, under the given constraint conditions, the optimization problem to obtain discriminant model coefficients a = a+ − a− = [α1, α2, …, αl+u]T and the offset b = β+ − β−, wherein ξi ≥ 0, i = 1, …, l, j = 1, …, l + u, δ is a constant coefficient, ξi is a slack variable, L = D − W is the Laplacian matrix, T > 0 is a preset parameter, and Dii = ΣjWij; and to obtain, according to the coefficients a of the discriminant model, the support vector sample set in the training sample set: SVs = {xi | αi ≠ 0, i = 1, …, N}.
10. The device according to claim 9, characterized in that the classifier creation module comprises:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset: y = sign(Σ asv·k(x, xsv) + b), where the sum runs over the support vectors xsv in SVs, wherein x is the speech data sample to be classified, x ∈ RD, xsv is a support vector, asv is the model coefficient of the support vector, and the value of y is the classification result of the speech data sample x.
CN201710774048.1A 2017-08-31 2017-08-31 Voice classification recognition method and device Active CN107507611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710774048.1A CN107507611B (en) 2017-08-31 2017-08-31 Voice classification recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710774048.1A CN107507611B (en) 2017-08-31 2017-08-31 Voice classification recognition method and device

Publications (2)

Publication Number Publication Date
CN107507611A true CN107507611A (en) 2017-12-22
CN107507611B CN107507611B (en) 2021-08-24

Family

ID=60693417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710774048.1A Active CN107507611B (en) 2017-08-31 2017-08-31 Voice classification recognition method and device

Country Status (1)

Country Link
CN (1) CN107507611B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1787075A (en) * 2005-12-13 2006-06-14 浙江大学 Method for distinguishing a speaker by a support vector machine model based on an embedded GMM kernel
CN1975856A (en) * 2006-10-30 2007-06-06 邹采荣 Speech emotion identifying method based on supporting vector machine
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103605711A (en) * 2013-11-12 2014-02-26 中国石油大学(北京) Construction method and device, classification method and device of support vector machine
US20160071010A1 (en) * 2014-05-31 2016-03-10 Huawei Technologies Co., Ltd. Data Category Identification Method and Apparatus Based on Deep Neural Network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065027A (en) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 Speech differentiation model training method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN107507611B (en) 2021-08-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant