CN110070894A

CN110070894A - An improved method for recognizing multiple pathological single vowels

Info

Publication number: CN110070894A
Application number: CN201910233952.0A
Authority: CN
Inventors: 张涛; 武雅琴
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-03-26
Filing date: 2019-03-26
Publication date: 2019-07-30
Anticipated expiration: 2039-03-26
Also published as: CN110070894B

Abstract

An improved method for recognizing multiple pathological single vowels, comprising: computing the line spectrum pair parameters of an input speech signal; computing the adjacent-difference line spectrum pair parameters of the input speech signal; applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain its Bark line spectrum pair parameters; applying feature enhancement to the Bark line spectrum pair parameters of the input speech signal to obtain enhanced Bark line spectrum pair parameters; and feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels. The invention achieves a higher recognition rate, providing a research basis for the subsequent repair of single vowels and of more complex words and sentences.

Description

An improved method for recognizing multiple pathological single vowels

Technical field

The present invention relates to a method for recognizing pathological single vowels, and more particularly to an improved method for recognizing multiple pathological single vowels.

Background art

Speech is the most direct means of conveying language, so voice quality directly affects the effectiveness of people's daily communication. Statistics show that about 7.5 million people in the United States suffer from voice disorders; the prevalence is 57.7% among teachers and 28.8% among non-teachers. In addition, about 2,200 people are diagnosed with laryngeal cancer each year in the United Kingdom. Voice disorders can greatly reduce quality of life, so recognizing and then repairing pathological voices is especially important.

Voice disorders can be treated with medication or physical therapy, but incomplete treatment can still affect how the patient speaks. Recognizing and repairing pathological voices by non-invasive means has therefore become a key research topic. Recognition and repair of single-vowel speech is the basis for handling complex words and sentences. Existing research on recognizing multiple single vowels is based entirely on normal voices, and commonly uses features such as linear prediction cepstral coefficients (Linear Prediction Cepstrum Coefficient, LPCC), Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) and formants. Work on pathological voices, by contrast, mostly addresses the binary classification of pathological versus normal voices. Because most acoustic features discriminate the vowel /a/ better than the other vowels, researchers in China and abroad usually choose the pathological single vowel /a/ as the experimental sample, extract feature parameters from the voice samples, and feed them into various classification networks to recognize pathological voices. Common features include time-domain features such as fundamental-frequency perturbation and amplitude perturbation, as well as MPEG-7 features and regression features such as multidirectional regression (Multidirectional regression, MDR). However, the features used to recognize multiple normal single vowels (LPCC, MFCC) perform poorly at recognizing multiple pathological single vowels.

Summary of the invention

The technical problem to be solved by the invention is to provide an improved method for recognizing multiple pathological single vowels that further increases the recognition rate of pathological voices.

The technical solution adopted by the invention is an improved method for recognizing multiple pathological single vowels, comprising the following steps:

1) computing the line spectrum pair parameters of the input speech signal;

2) computing the adjacent-difference line spectrum pair parameters of the input speech signal;

3) applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain the Bark line spectrum pair parameters of the input speech signal;

4) applying feature enhancement to the Bark line spectrum pair parameters of the input speech signal to obtain enhanced Bark line spectrum pair parameters;

5) feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.

Step 1) includes:

(1.1) Preprocess the signal, including DC removal and framing;

(1.2) For each speech frame, compute the 12th-order linear prediction coefficients a_i with the Levinson-Durbin autocorrelation algorithm, using the preset model order p = 12;

(1.3) From the linear prediction coefficients a_i obtained in (1.2), compute the linear prediction inverse filter system function:

$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \qquad (1)$$

where A(z) is the linear prediction inverse filter system function, p is the model order, and a_i are the linear prediction coefficients;

(1.4) Compute the (p+1)th-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):

$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$

where P(z) is the (p+1)th-order symmetric polynomial of A(z), A(z) is the linear prediction inverse filter system function, and p is the model order;

$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$

where Q(z) is the (p+1)th-order antisymmetric polynomial of A(z), A(z) is the linear prediction inverse filter system function, and p is the model order;

(1.5) Compute the 12 line spectrum pair parameters of the input speech signal from P(z) and Q(z):

$$|H(e^{j\omega})|^{-2} = |A(e^{j\omega})|^{2} = \frac{1}{4}\left[|P(e^{j\omega})|^{2} + |Q(e^{j\omega})|^{2}\right] = 2^{p}\left[\cos^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\omega_i)^{2} + \sin^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\theta_i)^{2}\right] \qquad (4)$$

where H(e^{jω}) is the linear prediction spectrum, e^{jω} is the frequency-domain form of z, P(e^{jω}) and Q(e^{jω}) are the (p+1)th-order symmetric and antisymmetric polynomials of A(e^{jω}), cos θ_i and cos ω_i are the LSP coefficients expressed in the cosine domain, θ_i and ω_i are the line spectral frequencies corresponding to the line spectrum pair coefficients of the input speech signal, and Π denotes the product.
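Steps (1.2) to (1.5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the sign convention A(z) = 1 - Σ a_i z^{-i}, uses an unnormalised autocorrelation with no window, and locates the line spectral frequencies by a grid search plus bisection on the unit circle (the grid size `n_grid` is an arbitrary choice). All function names are hypothetical.

```python
import math


def autocorr(x, p):
    """Unnormalised autocorrelation r[0..p] of one frame (no window applied)."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Step (1.2): predictor coefficients a_1..a_p for A(z) = 1 - sum a_i z^-i."""
    a = [0.0] * (p + 1)                     # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k                        # reflection coefficient
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error power
    return a[1:]


def pq_polynomials(a):
    """Steps (1.3)-(1.4): P(z), Q(z) coefficients in powers of z^-1, eqs. (2)-(3)."""
    A = [1.0] + [-c for c in a] + [0.0]     # A(z), padded to degree p+1
    rev = A[::-1]                           # z^-(p+1) A(z^-1)
    P = [x + y for x, y in zip(A, rev)]     # symmetric
    Q = [x - y for x, y in zip(A, rev)]     # antisymmetric
    return P, Q


def _roots_in_open_interval(f, n_grid):
    """Zeros of f on (0, pi): sign changes on a grid, refined by bisection."""
    xs = [math.pi * m / n_grid for m in range(1, n_grid)]
    roots = []
    for x0, x1 in zip(xs, xs[1:]):
        f0, f1 = f(x0), f(x1)
        if f0 == 0.0:
            roots.append(x0)
        elif f0 * f1 < 0.0:
            for _ in range(60):
                xm = 0.5 * (x0 + x1)
                fm = f(xm)
                if f0 * fm <= 0.0:
                    x1 = xm
                else:
                    x0, f0 = xm, fm
            roots.append(0.5 * (x0 + x1))
    return roots


def lpc_to_lsf(a, n_grid=2048):
    """Step (1.5): the p line spectral frequencies in radians, ascending."""
    P, Q = pq_polynomials(a)
    half = (len(P) - 1) / 2.0               # (p+1)/2
    # On the unit circle e^{jw(p+1)/2} P(e^{jw}) is real and e^{jw(p+1)/2} Q(e^{jw})
    # is purely imaginary; their zeros in (0, pi) are the w_i and theta_i of eq. (4).
    def p_real(w):
        return sum(c * math.cos((half - k) * w) for k, c in enumerate(P))

    def q_imag(w):
        return sum(c * math.sin((half - k) * w) for k, c in enumerate(Q))

    return sorted(_roots_in_open_interval(p_real, n_grid)
                  + _roots_in_open_interval(q_imag, n_grid))


def frame_to_lsf(frame, p=12):
    """Steps (1.2)-(1.5) for one preprocessed frame."""
    return lpc_to_lsf(levinson_durbin(autocorr(frame, p), p))
```

For p = 12, `frame_to_lsf` returns 12 interlaced frequencies in (0, π); multiplying them by f_s/(2π) converts them to Hz for the Bark warping of step 3). The trivial roots of P(z) at z = -1 and of Q(z) at z = 1 are excluded because the search runs strictly inside (0, π).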

Step 2) is computed as:

$$DAL_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$

where DAL_i is the i-th adjacent-difference line spectrum pair parameter, l_{i+1} is the (i+1)th line spectrum pair parameter, l_i is the i-th line spectrum pair parameter, M is the maximum order of the adjacent-difference line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
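Equation (5) is a first difference of the ascending LSP vector. A minimal sketch (the function name is hypothetical) that defaults to M = N - 1, which yields the 11 DAL parameters of the 12th-order example:

```python
def dal(lsp, m=None):
    """Adjacent-difference LSP parameters DAL_1..DAL_M of eq. (5)."""
    n = len(lsp)
    m = n - 1 if m is None else m
    if not 1 <= m < n:
        raise ValueError("eq. (5) requires 1 <= M < N")
    # 0-based index i gives DAL_{i+1} = l_{i+2} - l_{i+1} in the 1-based notation
    return [lsp[i + 1] - lsp[i] for i in range(m)]
```

Because the LSPs are strictly increasing, every DAL value is positive; small values mark closely spaced line pairs, which is where formants lie.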

The frequency warping in step 3) uses:

$$\text{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$

where Bark is the Bark frequency and f is the linear frequency in Hz.
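Equation (6) can be sketched as below. It is algebraically the same as 26.81 f / (1960 + f) - 0.53, the form used here so that f = 0 does not divide by zero. The conversion of LSFs from radians to Hz, f = ω f_s / (2π), assumes f_s = 8 kHz as in the embodiment; the function names are hypothetical.

```python
import math


def hz_to_bark(f):
    """Eq. (6): Bark = 26.81 / (1 + 1960 / f) - 0.53, rewritten to allow f = 0."""
    return 26.81 * f / (1960.0 + f) - 0.53


def lsf_to_blsp(lsf_rad, fs=8000.0):
    """Warp line spectral frequencies (radians) onto the Bark scale (BLSP)."""
    return [hz_to_bark(w * fs / (2.0 * math.pi)) for w in lsf_rad]
```

A 500 Hz step from 500 Hz to 1000 Hz spans about 3.6 Bark, while the same step from 3000 Hz to 3500 Hz spans only about 1.0 Bark: the warping expands the low-frequency LSPs and compresses the high-frequency ones.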

Step 4) adjusts the j-th Bark line spectrum pair parameter, j = 2, ..., N-1, by bidirectional iteration. Each adjustment directly overwrites the original Bark line spectrum pair parameter, and the adjusted j-th parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the order of the Bark line spectrum pair; the adjacent-difference coefficients of the current frame are b_{i+1} - b_i, i = 1, 2, ..., N-1. The iteration uses:

$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \qquad (8)$$

(1) Forward iteration: adjust the j-th Bark line spectrum pair parameter in the forward direction, from j = 2 to j = N-1;

(2) Backward iteration: adjust the j-th Bark line spectrum pair parameter in the backward direction, from j = N-1 to j = 2;

(3) Averaging: average the Bark line spectrum pair parameters obtained from the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters;

where η controls the degree of formant enhancement: the smaller η, the stronger the enhancement.
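The control flow of step 4) (forward pass, backward pass, then averaging, with each adjusted value reused immediately when adjusting the next order) can be sketched as below. The per-order update formula is not reproduced in this text, so the `adjust` rule here is a hypothetical stand-in that only illustrates the mechanics: it pulls each interior line toward its nearer neighbour, shrinking that gap to η times its size, and uses c_j = η(b_{j+1} - b_j) from eq. (8) when the upper gap is the narrower one. The default η = 0.4 follows the embodiment; the function name is hypothetical.

```python
def enhance_blsp(b, eta=0.4):
    """Bidirectional adjustment of b_2..b_{N-1}, then averaging (E-BLSP sketch).

    HYPOTHETICAL update rule: the patent's own per-order formula is not given
    in this text. Each interior line is moved toward its nearer neighbour so
    that the gap to that neighbour becomes eta times its previous size.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must satisfy 0 < eta < 1")

    def adjust(v, j):
        lower, upper = v[j] - v[j - 1], v[j + 1] - v[j]
        if upper <= lower:
            v[j] = v[j + 1] - eta * upper   # b_{j+1} - c_j, with c_j from eq. (8)
        else:
            v[j] = v[j - 1] + eta * lower

    n = len(b)
    fwd = list(b)
    for j in range(1, n - 1):               # orders 2..N-1 (0-based 1..N-2)
        adjust(fwd, j)                      # in place: feeds the next order
    bwd = list(b)
    for j in range(n - 2, 0, -1):           # orders N-1..2
        adjust(bwd, j)
    return [(x + y) / 2.0 for x, y in zip(fwd, bwd)]
```

With 0 < η < 1 each adjusted line stays strictly between its neighbours, so the output remains in ascending order and the first and last lines are unchanged.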

Step 5) first randomly selects 75% of each single-vowel data set in the SVD pathological voice database as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order enhanced Bark line spectrum pair parameters of six kinds of single-vowel speech, namely vocal-fold polyp pathological /a/, /i/, /u/ and normal /a/, /i/, /u/, are then fed into the deep neural network for recognition. The network settings are: 2 hidden layers with 100 neurons each, the ReLU activation function, and a Softmax function in the last layer of the recognition model that turns the network output into a probability distribution, from which the final classification result is obtained.
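The per-class 75%/25% split in step 5) is a stratified random split. A minimal sketch (the function name and seed handling are illustrative); with 180 samples in each of the 6 classes it yields 135 training and 45 test samples per class:

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=0):
    """Return (train_idx, test_idx), drawing train_frac of every class at random."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab][:]
        rng.shuffle(idx)
        k = round(train_frac * len(idx))    # 135 of 180 per class
        train += idx[:k]
        test += idx[k:]
    return sorted(train), sorted(test)
```

Splitting each class separately, rather than the pooled data, is what guarantees the even class distribution in both phases.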

The improved method for recognizing multiple pathological single vowels of the present invention has the following beneficial effects:

1) Compared with the traditional MFCC and LPCC features, the improved method achieves a better recognition rate. The invention proposes a broadly applicable feature, E-BLSP, suited to recognizing multiple pathological single vowels; the new E-BLSP feature achieves a high recognition rate for six single vowels: normal /a/, /i/, /u/ and pathological /a/, /i/, /u/;

2) The E-BLSP feature proposed by the invention recognizes pathological /i/ better than pathological /a/, whereas traditional pathological voice recognition is mostly based on the single vowel /a/. This offers a new approach to the recognition and diagnosis of pathological voices and provides a research basis for the subsequent repair of single vowels and of more complex words and sentences.

Description of the drawings

Fig. 1 is a structural diagram of the improved method for recognizing multiple pathological single vowels of the invention;

Fig. 2a is a box plot of the 11 DAL parameters of the normal single vowel /a/;

Fig. 2b is a box plot of the 11 DAL parameters of the pathological single vowel /a/;

Fig. 2c is a box plot of the 11 DAL parameters of the normal single vowel /i/;

Fig. 2d is a box plot of the 11 DAL parameters of the pathological single vowel /i/;

Fig. 2e is a box plot of the 11 DAL parameters of the normal single vowel /u/;

Fig. 2f is a box plot of the 11 DAL parameters of the pathological single vowel /u/;

Fig. 3a is a plot of the 12th-order LSP parameters in the embodiment of the invention;

Fig. 3b is a plot of the 12th-order BLSP parameters in the embodiment of the invention;

Fig. 4a is a three-dimensional spectrum of the 12th-order BLSP parameters in the embodiment of the invention;

Fig. 4b is a three-dimensional spectrum of the 12th-order E-BLSP parameters in the embodiment of the invention.

Detailed description of the embodiments

The improved method for recognizing multiple pathological single vowels of the present invention is described in detail below with reference to the embodiments and the drawings.

As shown in Fig. 1, the improved method for recognizing multiple pathological single vowels of the present invention comprises the following steps:

1) Compute the line spectrum pair (Line Spectrum Pair, LSP) parameters of the input speech signal, which includes:

$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$

$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$

$$|H(e^{j\omega})|^{-2} = |A(e^{j\omega})|^{2} = \frac{1}{4}\left[|P(e^{j\omega})|^{2} + |Q(e^{j\omega})|^{2}\right] = 2^{p}\left[\cos^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\omega_i)^{2} + \sin^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\theta_i)^{2}\right] \qquad (4)$$

where H(e^{jω}) is the linear prediction spectrum, e^{jω} is the frequency-domain form of z, P(e^{jω}) and Q(e^{jω}) are the (p+1)th-order symmetric and antisymmetric polynomials of A(e^{jω}), cos θ_i and cos ω_i are the LSP coefficients expressed in the cosine domain, θ_i and ω_i are the line spectral frequencies (Line Spectral Frequency, LSF) corresponding to the line spectrum pair coefficients of the input speech signal, and Π denotes the product.

2) Compute the adjacent-difference line spectrum pair (Difference of Adjacent LSP, DAL) parameters of the input speech signal, using:

$$DAL_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$

3) Apply frequency warping to the line spectrum pair parameters of the input speech signal to obtain the Bark line spectrum pair (Bark Line Spectrum Pair, BLSP) parameters of the input speech signal.

The frequency warping uses:

$$\text{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$

where Bark is the Bark frequency and f is the linear frequency in Hz.

4) Apply feature enhancement to the Bark line spectrum pair parameters of the input speech signal to obtain the enhanced Bark line spectrum pair (Enhanced-Bark Line Spectrum Pair, E-BLSP) parameters. The j-th Bark line spectrum pair parameter, j = 2, ..., N-1, is adjusted by bidirectional iteration: each adjustment directly overwrites the original Bark line spectrum pair parameter, and the adjusted j-th parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the order of the Bark line spectrum pair; the adjacent-difference coefficients of the current frame are b_{i+1} - b_i, i = 1, 2, ..., N-1. The iteration uses:

$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \qquad (8)$$

5) Feed the enhanced Bark line spectrum pair parameters of the input speech signal into the deep neural network classifier to recognize multiple pathological single vowels. First, 75% of each single-vowel data set in the SVD pathological voice database is randomly selected as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order enhanced Bark line spectrum pair parameters of six kinds of single-vowel speech, namely vocal-fold polyp pathological /a/, /i/, /u/ and normal /a/, /i/, /u/, are then fed into the deep neural network for recognition. The network settings are: 2 hidden layers with 100 neurons each, the ReLU activation function, and a Softmax function in the last layer of the recognition model that turns the network output into a probability distribution, from which the final classification result is obtained.

A specific example is given below:

1. Preprocessing: in framing, each frame lasts 30 ms at a sampling frequency of 8 kHz, giving a frame length of 240 samples and a frame shift of 80 samples;

2. When computing the linear prediction coefficients, p = 12;

3. Compute the linear prediction inverse filter system function A(z) from the linear prediction coefficients;

4. Compute the (p+1)th-order symmetric and antisymmetric polynomials P(z) and Q(z) of A(z);

5. Compute the 12th-order LSP parameters from P(z) and Q(z);

6. From the 12 LSP parameters, compute the 11 DAL (Difference of Adjacent LSP, DAL) parameters of the input speech signal;

7. Apply frequency warping to the LSP parameters of the input speech signal to obtain its BLSP (Bark Line Spectrum Pair, BLSP) parameters.
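Item 1 (DC removal plus framing) can be sketched as follows. DC removal is assumed here to mean subtracting the signal mean, since the text does not specify the method, and no analysis window is applied; the function name is illustrative. At 8 kHz a 30 ms frame is 240 samples, and consecutive frames start 80 samples apart:

```python
def preprocess(signal, fs=8000, frame_ms=30, hop=80):
    """DC removal followed by framing into overlapping frames."""
    frame_len = fs * frame_ms // 1000           # 240 samples at 8 kHz
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]              # assumed DC removal: subtract the mean
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // hop
    return [x[i * hop:i * hop + frame_len] for i in range(n_frames)]
```

A 1000-sample signal, for example, gives 1 + (1000 - 240) // 80 = 10 frames.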

Figs. 2a to 2f are box plots of the DAL parameters of the six single-vowel signals in the embodiment of the invention: Fig. 2a shows the 11 DAL parameters of the normal single vowel /a/; Fig. 2b those of the pathological single vowel /a/; Fig. 2c those of the normal single vowel /i/; Fig. 2d those of the pathological single vowel /i/; Fig. 2e those of the normal single vowel /u/; and Fig. 2f those of the pathological single vowel /u/.

Figs. 2a to 2f show that, for the three normal single-vowel signals /a/, /i/ and /u/, the boxes of the first 7 DAL parameters differ considerably, giving good discrimination among the three vowels, while for the three pathological single-vowel signals /a/, /i/ and /u/, the first 7 DAL parameters are distributed more uniformly than for normal voice. For pathological /a/, the distribution of the last 4 DAL parameters is completely different from that of normal /a/, whereas the last 4 DAL parameters of pathological /i/ and /u/ overlap considerably and are therefore poorly discriminated. The low-order DAL parameters correspond to the low-frequency part of the signal. Because the DAL parameters discriminate better in the low-frequency band than in the high-frequency band, and because the Bark domain better reflects how the human ear perceives a signal, the embodiment applies non-linear frequency warping on the Bark scale to the extracted LSP parameters to obtain the BLSP parameters. The warping function is:

$$\text{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$

where Bark is the Bark frequency and f is the linear frequency in Hz.

Figs. 3a and 3b show the 12th-order LSP parameters and the 12th-order BLSP parameters of the embodiment. Compared with Fig. 3a, Fig. 3b expands the low-frequency part of the signal and compresses the high-frequency part, improving the discrimination between normal and pathological vowels.

8. Apply feature enhancement to the BLSP parameters of the input speech signal to obtain the E-BLSP (Enhanced-Bark Line Spectrum Pair, E-BLSP) parameters. η controls the degree of formant enhancement: the smaller η, the stronger the enhancement. In this embodiment, η = 0.4.

Figs. 4a and 4b are three-dimensional spectra of the 12th-order BLSP parameters and the 12th-order E-BLSP parameters of the embodiment. Compared with Fig. 4a, Fig. 4b shows much higher amplitude at the formant frequencies and suppressed spectral broadening, which greatly strengthens the discrimination between normal and pathological vowels.

9. Feed the E-BLSP parameters of the input speech signal into the DNN classifier to recognize multiple pathological single vowels.

In this embodiment, 75% of each single-vowel data set is first randomly selected as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order E-BLSP parameters of the six kinds of single-vowel speech are then fed into a DNN (Deep Neural Network, DNN) for recognition. The network settings are: 2 hidden layers with 100 neurons each.
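The classifier configuration (12 E-BLSP inputs, two hidden layers of 100 ReLU units, and a Softmax output over the 6 classes) is sketched below as an untrained forward pass in plain Python. The He-style random initialisation, the zero biases and the function names are assumptions; training (for example by back-propagation on a cross-entropy loss) is not shown, and in practice a deep-learning framework would be used.

```python
import math
import random


def init_mlp(sizes=(12, 100, 100, 6), seed=0):
    """Random weights for a 12-100-100-6 network (assumed He initialisation)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def softmax(z):
    m = max(z)                                  # subtract the max for stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(layers, x):
    """ReLU on the hidden layers, Softmax on the output layer."""
    h = x
    for li, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        h = z if li == len(layers) - 1 else [max(0.0, v) for v in z]
    return softmax(h)
```

The predicted class is the index of the largest of the six probabilities returned by `forward`.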

For the single-vowel speech source signals, the embodiment uses the SVD (Saarbrucken Voice Database, SVD) pathological voice database recorded by the Institute of Phonetics at Saarland University. It contains normal and various pathological voice signals of the sustained vowels /a/, /i/ and /u/, uniformly sampled at 50 kHz with 16-bit resolution. The sustained vowels /a/, /i/ and /u/ of vocal-fold polyp pathological voice and of normal voice are selected for the experiments, and the sampling rate is uniformly reduced to 8 kHz. Each voice class has 180 samples in total, covering 4 different pitches (normal, low, high and low-high-low).

The embodiment is evaluated mainly by two indices: accuracy and AUC. Accuracy is defined as the percentage of correctly classified instances. The ROC (Receiver Operating Characteristic, ROC) curve is a composite indicator that treats sensitivity and specificity as continuous variables and shows their relationship graphically. AUC (Area Under Curve, AUC) is defined as the area enclosed by the ROC curve and the coordinate axis; for a useful classifier its value lies between 0.5 and 1, and the larger the AUC, the better the classification. To ensure the accuracy and generality of the experiments, each feature combination is tested 10 times and the average is taken as the final classification result.
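The two indices and the 10-run averaging can be sketched as follows. The binary AUC uses the rank interpretation (the probability that a random positive scores above a random negative, ties counting one half), which equals the area under the ROC curve. How AUC is extended to six classes is not stated in the text, so the one-vs-rest macro average here is an assumption; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Percentage of correctly classified instances."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, y_true, n_classes):
    """Assumed multi-class extension: mean of the one-vs-rest AUCs."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in prob_rows]
        labels = [1 if y == c else 0 for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


def mean_over_runs(values):
    """Average of a metric over repeated runs (10 in the embodiment)."""
    return sum(values) / len(values)
```

A classifier that ranks at random scores an AUC near 0.5, which is why 0.5 rather than 0 is the practical lower bound quoted above.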

Table 1 shows that the feature of the invention recognizes multiple pathological single vowels better than the traditional MFCC and LPCC features, with a highest accuracy of 97.36% and an AUC of up to 0.9894.

Table 1

Claims

1. An improved method for recognizing multiple pathological single vowels, comprising the following steps:

1) computing the line spectrum pair parameters of the input speech signal;

2) computing the adjacent-difference line spectrum pair parameters of the input speech signal;

3) applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain the Bark line spectrum pair parameters of the input speech signal;

4) applying feature enhancement to the Bark line spectrum pair parameters of the input speech signal to obtain enhanced Bark line spectrum pair parameters;

5) feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.

2. The improved method for recognizing multiple pathological single vowels according to claim 1, wherein step 1) comprises:

(1.1) preprocessing the signal, including DC removal and framing;

(1.2) for each speech frame, computing the 12th-order linear prediction coefficients a_i with the Levinson-Durbin autocorrelation algorithm, using the preset model order p = 12;

(1.3) computing the linear prediction inverse filter system function from the linear prediction coefficients a_i obtained in (1.2):

$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \qquad (1)$$

where A(z) is the linear prediction inverse filter system function, p is the model order, and a_i are the linear prediction coefficients;

(1.4) computing the (p+1)th-order symmetric and antisymmetric polynomials of A(z):

$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$

where P(z) is the (p+1)th-order symmetric polynomial of A(z), A(z) is the linear prediction inverse filter system function, and p is the model order;

$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$

where Q(z) is the (p+1)th-order antisymmetric polynomial of A(z), A(z) is the linear prediction inverse filter system function, and p is the model order;

(1.5) computing the 12 line spectrum pair parameters of the input speech signal from P(z) and Q(z):

$$|H(e^{j\omega})|^{-2} = |A(e^{j\omega})|^{2} = \frac{1}{4}\left[|P(e^{j\omega})|^{2} + |Q(e^{j\omega})|^{2}\right] = 2^{p}\left[\cos^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\omega_i)^{2} + \sin^{2}\frac{\omega}{2}\prod_{i=1}^{p/2}(\cos\omega - \cos\theta_i)^{2}\right] \qquad (4)$$

where H(e^{jω}) is the linear prediction spectrum, e^{jω} is the frequency-domain form of z, P(e^{jω}) and Q(e^{jω}) are the (p+1)th-order symmetric and antisymmetric polynomials of A(e^{jω}), cos θ_i and cos ω_i are the LSP coefficients expressed in the cosine domain, θ_i and ω_i are the line spectral frequencies corresponding to the line spectrum pair coefficients of the input speech signal, and Π denotes the product.

3. The improved method for recognizing multiple pathological single vowels according to claim 1, wherein step 2) is computed as:

$$DAL_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$

where DAL_i is the i-th adjacent-difference line spectrum pair parameter, l_{i+1} is the (i+1)th line spectrum pair parameter, l_i is the i-th line spectrum pair parameter, M is the maximum order of the adjacent-difference line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.

4. The improved method for recognizing multiple pathological single vowels according to claim 1, wherein the frequency warping in step 3) uses:

$$\text{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$

where Bark is the Bark frequency and f is the linear frequency in Hz.

5. The improved method for recognizing multiple pathological single vowels according to claim 1, wherein step 4) adjusts the j-th Bark line spectrum pair parameter, j = 2, ..., N-1, by bidirectional iteration, each adjustment directly overwriting the original Bark line spectrum pair parameter and the adjusted j-th parameter being used when adjusting the next order; the Bark line spectrum pair parameters of the current frame are {b_1, b_2, ..., b_N}, where N is the order of the Bark line spectrum pair, and the adjacent-difference coefficients of the current frame are b_{i+1} - b_i, i = 1, 2, ..., N-1; the iteration uses:

$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \qquad (8)$$

(1) forward iteration: adjusting the j-th Bark line spectrum pair parameter in the forward direction, from j = 2 to j = N-1;

(2) backward iteration: adjusting the j-th Bark line spectrum pair parameter in the backward direction, from j = N-1 to j = 2;

(3) averaging: averaging the Bark line spectrum pair parameters obtained from the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters.

6. The improved method for recognizing multiple pathological single vowels according to claim 1, wherein step 5) first randomly selects 75% of each single-vowel data set in the SVD pathological voice database as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network, and then feeds the 12th-order enhanced Bark line spectrum pair parameters of six kinds of single-vowel speech, namely vocal-fold polyp pathological /a/, /i/, /u/ and normal /a/, /i/, /u/, into the deep neural network for recognition, the network settings being: 2 hidden layers with 100 neurons each, the ReLU activation function, and a Softmax function in the last layer of the recognition model that turns the network output into a probability distribution, from which the final classification result is obtained.