CN110070894A - An improved method for recognizing multiple pathological single vowels - Google Patents

An improved method for recognizing multiple pathological single vowels

Info

Publication number
CN110070894A
Authority
CN
China
Prior art keywords
line spectrum
parameter
bark
spectrum pairs
rank
Prior art date
Legal status
Granted
Application number
CN201910233952.0A
Other languages
Chinese (zh)
Other versions
CN110070894B (en)
Inventor
Zhang Tao (张涛)
Wu Yaqin (武雅琴)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date 2019-03-26
Filing date 2019-03-26
Publication date 2019-07-30
Application filed by Tianjin University
Priority to CN201910233952.0A
Publication of CN110070894A
Application granted
Publication of CN110070894B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An improved method for recognizing multiple pathological single vowels, comprising: computing the line spectrum pair parameters of an input speech signal; computing the adjacent-difference line spectrum pair parameters of the input speech signal; applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain its Bark line spectrum pair parameters; enhancing the Bark line spectrum pair parameters to obtain enhanced Bark line spectrum pair parameters; and feeding the enhanced Bark line spectrum pair parameters into a deep neural network classifier to recognize multiple pathological single vowels. The invention achieves a better recognition rate, providing a research basis for subsequent voice restoration of single vowels and of more complex words and sentences.

Description

An improved method for recognizing multiple pathological single vowels
Technical field
The present invention relates to a method for recognizing pathological single vowels, and more particularly to an improved method for recognizing multiple pathological single vowels.
Background art
Voice is the most direct means of conveying language, so voice quality directly affects the efficiency of people's daily communication. According to statistics, about 7.5 million people in the United States suffer from voice disorders; the prevalence of voice disorders is 57.7% among teachers and 28.8% among non-teachers. In addition, about 2,200 people are diagnosed with laryngeal cancer in the United Kingdom every year. Voice disorders can greatly reduce quality of life, so recognizing and then restoring pathological voices is especially important.
Voice disorders can be treated with medication or physical therapy, but incomplete treatment still affects how patients express themselves, so recognizing pathological voices and restoring them non-invasively has become a key research focus. Recognition and restoration of single vowels is the foundation for handling more complex words and sentences. Current research on recognizing multiple single vowels is based entirely on normal voices, and the features commonly used are linear prediction cepstral coefficients (Linear Prediction Cepstrum Coefficient, LPCC), Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC), formants and the like. Work on pathological voices, however, mostly addresses the binary classification of pathological versus normal voice. Because most acoustic features discriminate the vowel /a/ better than the other vowels, researchers in China and abroad usually choose the pathological single vowel /a/ as the experimental sample, extract features from the voice samples and feed them into various classification networks to recognize pathological voice. Common features include time-domain features such as fundamental-frequency perturbation and amplitude perturbation, MPEG-7 features, and regression features such as multidirectional regression (Multidirectional regression, MDR). However, the features used to recognize multiple normal single vowels (LPCC, MFCC) perform poorly when applied to recognizing multiple pathological single vowels.
Summary of the invention
The technical problem to be solved by the invention is to provide an improved method for recognizing multiple pathological single vowels that further increases the recognition rate of pathological voices.
The technical solution adopted by the invention is an improved method for recognizing multiple pathological single vowels, comprising the following steps:
1) computing the line spectrum pair parameters of the input speech signal;
2) computing the adjacent-difference line spectrum pair parameters of the input speech signal;
3) applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain the Bark line spectrum pair parameters of the input speech signal;
4) enhancing the Bark line spectrum pair parameters of the input speech signal to obtain enhanced Bark line spectrum pair parameters;
5) feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.
Step 1) comprises:
(1.1) pre-processing the signal, including DC removal and framing;
(1.2) for each frame of the speech signal, computing the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation algorithm, using the preset model order p = 12;
(1.3) computing the system function of the linear prediction inverse filter from the linear prediction coefficients $a_i$ obtained in (1.2), as follows:
$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \qquad (1)$$
where A(z) is the system function of the linear prediction inverse filter, p is the model order, and $a_i$ are the linear prediction coefficients;
(1.4) computing the (p+1)th-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$
where P(z) is the (p+1)th-order symmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$
where Q(z) is the (p+1)th-order antisymmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
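(1.5) computing the 12th-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
$$\left|H(e^{j\omega})\right|^2 = \frac{1}{\left|A(e^{j\omega})\right|^2} = \frac{4}{\left|P(e^{j\omega})\right|^2 + \left|Q(e^{j\omega})\right|^2} = \frac{2^{-p}}{\cos^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\omega_i\right)^2 + \sin^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\theta_i\right)^2} \qquad (4)$$
where $|H(e^{j\omega})|$ is the amplitude of the linear prediction spectrum, $e^{j\omega}$ is the frequency-domain form of $z$, $P(e^{j\omega})$ and $Q(e^{j\omega})$ are the (p+1)th-order symmetric and antisymmetric polynomials of $A(e^{j\omega})$, $\cos\omega_i$ and $\cos\theta_i$ are the LSP coefficients expressed in the cosine domain, $\omega_i$ and $\theta_i$ are the line spectral frequencies corresponding to the line spectrum pair coefficients of the input speech signal (the root angles of $P(z)$ and $Q(z)$, respectively), and $\Pi$ denotes a product.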
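Sub-steps (1.2) and (1.3) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the predictor convention x̂(n) = Σ a_i x(n−i), so that A(z) = 1 − Σ a_i z^{-i}, uses the biased (unnormalised) autocorrelation, and the function names `autocorr`, `levinson_durbin` and `inverse_filter` are hypothetical.

```python
def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame (no normalisation)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, p=12):
    """Levinson-Durbin recursion on autocorrelation values r[0..p].

    Returns (a, err): predictor coefficients a_1..a_p for x_hat[n] = sum_i a_i x[n-i]
    and the final prediction-error energy. Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (p + 1)            # a[0] is unused; a[1..p] hold the coefficients
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err              # i-th reflection coefficient
        prev = a[:]                # order-(i-1) solution used in the update below
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def inverse_filter(a):
    """Coefficients of A(z) = 1 - sum_i a_i z^-i, ordered from z^0 to z^-p."""
    return [1.0] + [-ai for ai in a]
```

For one frame, `a, _ = levinson_durbin(autocorr(frame, 12), 12)` gives the 12 coefficients, and `inverse_filter(a)` the 13 coefficients of A(z). The recursion solves the Toeplitz normal equations in O(p²) operations instead of a general O(p³) solve.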
Step 2) is computed according to the following formula:
$$DAL_i = l_{i+1} - l_i,\quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$
where $DAL_i$ is the i-th order adjacent-difference line spectrum pair parameter, $l_{i+1}$ is the (i+1)-th order line spectrum pair parameter, $l_i$ is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent-difference line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
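Formula (5) is a first difference taken across the LSP orders of one frame. A minimal sketch, assuming the N line spectrum pair parameters of a frame are given in ascending order as a Python list; the function name `dal` is hypothetical. With N = 12 and M = N − 1, it yields the 11 DAL parameters used in the embodiment.

```python
def dal(lsp, m=None):
    """Formula (5): DAL_i = l_{i+1} - l_i for i = 1..M, with M < N = len(lsp).

    `lsp` holds the N line spectrum pair parameters of one frame in ascending order;
    M defaults to N - 1, i.e. every adjacent pair.
    """
    n = len(lsp)
    m = n - 1 if m is None else m
    if not 1 <= m < n:
        raise ValueError("M must satisfy 1 <= M < N")
    # 0-based index i here corresponds to order i + 1 in formula (5)
    return [lsp[i + 1] - lsp[i] for i in range(m)]
```

Small DAL values mark closely spaced LSPs, which is where the LPC spectrum has sharp peaks (formants), which is why their distribution is informative about vowel identity.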
The frequency warping in step 3) uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$
where Bark is the Bark frequency and f is the linear frequency in Hz.
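Formula (6) has the form of Traunmüller's Bark approximation. The sketch below applies it to LSFs; it assumes the LSFs are normalised angular frequencies in radians (0 to π) and that the signal is sampled at the 8 kHz used in the embodiment, so each LSF ω is first converted to f = ω·fs/(2π) Hz. The function names are hypothetical.

```python
import math


def hz_to_bark(f):
    """Formula (6): Bark = 26.81 / (1 + 1960 / f) - 0.53, with f in Hz and f > 0."""
    return 26.81 / (1.0 + 1960.0 / f) - 0.53


def lsf_to_bark(lsf_rad, fs=8000.0):
    """Warp LSFs given as normalised angular frequencies (radians in (0, pi)) to Bark.

    Assumes the LSFs come from a signal sampled at fs Hz, so f = w * fs / (2 * pi).
    """
    return [hz_to_bark(w * fs / (2.0 * math.pi)) for w in lsf_rad]
```

The mapping is monotonic, so the ordering of the LSPs is preserved. At the 4 kHz Nyquist frequency it gives about 17.5 Bark, while 1960 Hz, barely half of that range, already maps to about 12.9 Bark, which illustrates how the warping expands low frequencies and compresses high ones.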
In step 4), the j-th order Bark line spectrum pair parameter is adjusted by bidirectional iteration, for j = 2, ..., N-1. Each adjustment directly overwrites the original Bark line spectrum pair parameter, and the adjusted j-th order parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be $\{b_1, b_2, \ldots, b_N\}$, where N is the order of the Bark line spectrum pairs; the adjacent-difference coefficients of the current frame are then $b_{i+1} - b_i$, i = 1, 2, ..., N-1. The iteration formulas are as follows:
$$c_i = \eta\,(b_{i+1} - b_i),\quad \eta < 1,\ i = 2, 3, \ldots, N-1 \qquad (8)$$
(1) Forward iteration: for j = 2 to j = N-1, adjust the j-th order Bark line spectrum pair parameter in the forward direction;
(2) Backward iteration: for j = N-1 down to j = 2, adjust the j-th order Bark line spectrum pair parameter in the backward direction;
(3) Averaging: average the Bark line spectrum pair parameters obtained by the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters.
Here η controls the degree of formant enhancement: the smaller η is, the stronger the enhancement.
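The control flow of step 4) (in-place forward pass, in-place backward pass, then averaging) can be sketched as below. The patent's own per-order update formula is not reproduced in this text, so the default `update` here is a HYPOTHETICAL placeholder, not the authors' rule: it moves b_j towards its nearer neighbour and shrinks that gap to η times its width, in the spirit of the η-scaled differences of formula (8). Any other rule with the signature `update(prev, cur, nxt, eta)` can be plugged in; the function names are also hypothetical, and η = 0.4 follows the embodiment.

```python
def _nearer_neighbour_update(prev, cur, nxt, eta):
    """HYPOTHETICAL rule: narrow the smaller adjacent gap to eta times its width."""
    if cur - prev <= nxt - cur:
        return prev + eta * (cur - prev)
    return nxt - eta * (nxt - cur)


def enhance_blsp(b, eta=0.4, update=None):
    """Bidirectional iteration over orders j = 2..N-1 (1-based), following step 4).

    Each pass overwrites values in place, so an adjusted b_j is already used when the
    neighbouring order is adjusted next; b_1 and b_N are left unchanged.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must satisfy 0 < eta < 1")
    update = update or _nearer_neighbour_update
    n = len(b)

    fwd = list(b)
    for j in range(1, n - 1):            # 0-based 1..N-2, i.e. 1-based j = 2..N-1
        fwd[j] = update(fwd[j - 1], fwd[j], fwd[j + 1], eta)

    bwd = list(b)
    for j in range(n - 2, 0, -1):        # 1-based j = N-1 down to 2
        bwd[j] = update(bwd[j - 1], bwd[j], bwd[j + 1], eta)

    # (3) averaging the forward and backward results
    return [(x + y) / 2.0 for x, y in zip(fwd, bwd)]
```

Averaging the two passes removes the bias that a single sweep direction would introduce, since each pass propagates its earlier adjustments only one way. With the placeholder rule every new value stays strictly between its neighbours, so the ascending order of the parameters is preserved.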
In step 5), 75% of each single-vowel data set in the SVD pathological voice database is first randomly selected as the training set and the remaining 25% as the test set, so that the data of every class are evenly distributed between classifier training and testing. The 12th-order enhanced Bark line spectrum pair parameters of six single vowels, namely /a/, /i/ and /u/ from pathological voices with wriggling polyps and /a/, /i/ and /u/ from normal voices, are then fed into the deep neural network for recognition. The network is configured with 2 hidden layers of 100 neurons each and the ReLU activation function; in the last layer of the recognition model, a Softmax function converts the network output into a probability distribution, from which the optimal classification result is obtained.
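A minimal NumPy sketch of the classifier's forward pass with the stated architecture (two hidden layers of 100 ReLU units, Softmax output over six classes). It assumes one 12-dimensional E-BLSP vector per input sample; the He-style random initialisation, the function names, and the omission of training (for example cross-entropy minimised by a gradient-based optimiser) are assumptions, not details given in the text.

```python
import numpy as np


def init_dnn(n_in=12, n_hidden=100, n_classes=6, seed=0):
    """Random (weight, bias) pairs for a 12 -> 100 -> 100 -> 6 fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_hidden, n_hidden, n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(params, x):
    """x: (batch, 12) E-BLSP vectors -> (batch, 6) class probabilities."""
    h = x
    for w, b in params[:-1]:
        h = relu(h @ w + b)                 # two ReLU hidden layers of 100 units
    w, b = params[-1]
    return softmax(h @ w + b)               # Softmax output layer


def predict(params, x):
    """Index of the most probable class for each row of x."""
    return forward(params, x).argmax(axis=-1)
```

Taking the argmax of the Softmax probabilities is what "obtaining the optimal classification result" amounts to; the probabilities themselves are also what a per-class ROC/AUC evaluation would score.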
The improved method for recognizing multiple pathological single vowels of the present invention has the following beneficial effects:
1) Compared with the traditional MFCC and LPCC features, the improved method achieves a better recognition rate, and the invention proposes E-BLSP, a broadly applicable feature for recognizing multiple pathological single vowels. The newly proposed E-BLSP feature achieves a high recognition rate on six single vowels: normal /a/, /i/, /u/ and pathological /a/, /i/, /u/;
2) The E-BLSP feature proposed by the invention recognizes pathological /i/ better than pathological /a/, whereas traditional pathological voice recognition is mostly based on the single vowel /a/. This offers a new approach to the recognition and diagnosis of pathological voices and provides a research basis for subsequent voice restoration of single vowels and of more complex words and sentences.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the improved method for recognizing multiple pathological single vowels of the invention;
Fig. 2a is a box plot of the 11 DAL parameters of the normal single vowel /a/;
Fig. 2b is a box plot of the 11 DAL parameters of the pathological single vowel /a/;
Fig. 2c is a box plot of the 11 DAL parameters of the normal single vowel /i/;
Fig. 2d is a box plot of the 11 DAL parameters of the pathological single vowel /i/;
Fig. 2e is a box plot of the 11 DAL parameters of the normal single vowel /u/;
Fig. 2f is a box plot of the 11 DAL parameters of the pathological single vowel /u/;
Fig. 3a is a schematic diagram of the 12th-order LSP parameters in the embodiment of the invention;
Fig. 3b is a schematic diagram of the 12th-order BLSP parameters in the embodiment of the invention;
Fig. 4a is a three-dimensional spectrum of the 12th-order BLSP parameters in the embodiment of the invention;
Fig. 4b is a three-dimensional spectrum of the 12th-order E-BLSP parameters in the embodiment of the invention.
Detailed description of the embodiments
The improved method for recognizing multiple pathological single vowels of the invention is described in detail below with reference to the embodiments and the drawings.
As shown in Fig. 1, the improved method for recognizing multiple pathological single vowels of the invention comprises the following steps:
1) Computing the line spectrum pair (Line Spectrum Pair, LSP) parameters of the input speech signal, comprising:
(1.1) pre-processing the signal, including DC removal and framing;
(1.2) for each frame of the speech signal, computing the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation algorithm, using the preset model order p = 12;
(1.3) computing the system function of the linear prediction inverse filter from the linear prediction coefficients $a_i$ obtained in (1.2), as follows:
$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \qquad (1)$$
where A(z) is the system function of the linear prediction inverse filter, p is the model order, and $a_i$ are the linear prediction coefficients;
(1.4) computing the (p+1)th-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$
where P(z) is the (p+1)th-order symmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$
where Q(z) is the (p+1)th-order antisymmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
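(1.5) computing the 12th-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
$$\left|H(e^{j\omega})\right|^2 = \frac{1}{\left|A(e^{j\omega})\right|^2} = \frac{4}{\left|P(e^{j\omega})\right|^2 + \left|Q(e^{j\omega})\right|^2} = \frac{2^{-p}}{\cos^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\omega_i\right)^2 + \sin^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\theta_i\right)^2} \qquad (4)$$
where $|H(e^{j\omega})|$ is the amplitude of the linear prediction spectrum, $e^{j\omega}$ is the frequency-domain form of $z$, $P(e^{j\omega})$ and $Q(e^{j\omega})$ are the (p+1)th-order symmetric and antisymmetric polynomials of $A(e^{j\omega})$, $\cos\omega_i$ and $\cos\theta_i$ are the LSP coefficients expressed in the cosine domain, $\omega_i$ and $\theta_i$ are the line spectral frequencies (Line Spectral Frequency, LSF) corresponding to the line spectrum pair coefficients of the input speech signal (the root angles of $P(z)$ and $Q(z)$, respectively), and $\Pi$ denotes a product.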
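Sub-steps (1.4) and (1.5) can be sketched with NumPy by building P(z) and Q(z) from formulas (2) and (3) and taking the angles of their roots. This is a minimal sketch under stated assumptions: it assumes A(z) = 1 − Σ a_i z^{-i} and an even order p such as the p = 12 used here, and the function names are hypothetical. Root finding with `numpy.roots` is only one option; speech coders often use Chebyshev-polynomial or grid searches instead.

```python
import numpy as np


def pq_polys(a):
    """P(z) and Q(z) of formulas (2)-(3) as coefficient arrays in powers z^0..z^-(p+1).

    Assumes A(z) = 1 - sum_i a_i z^-i, i.e. the predictor convention x_hat[n] = sum_i a_i x[n-i].
    """
    a_poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))   # A(z)
    ext = np.concatenate((a_poly, [0.0]))                             # pad A(z) to length p+2
    rev = ext[::-1]                                                   # z^-(p+1) A(z^-1)
    return ext + rev, ext - rev


def lsp_from_lpc(a, tol=1e-6):
    """Line spectral frequencies (radians, ascending) from LPC coefficients a_1..a_p.

    Keeps the root angles of P(z) and Q(z) that lie strictly inside (0, pi), which
    discards the trivial roots at z = -1 and z = +1.
    """
    p_poly, q_poly = pq_polys(a)
    lsf = []
    for poly in (p_poly, q_poly):
        # Multiplying by z^(p+1) turns the z^-1 polynomial into one in z whose
        # coefficients, highest power first, are exactly `poly`.
        angles = np.angle(np.roots(poly))
        lsf.extend(angles[(angles > tol) & (angles < np.pi - tol)])
    return np.sort(np.array(lsf))
```

Because A(z) = [P(z) + Q(z)]/2 and, on the unit circle, P and Q are 90° out of phase, 1/|A|² equals 4/(|P|² + |Q|²) at every frequency; checking this identity numerically is a quick sanity test of the polynomial construction. For a stable A(z) the roots of P and Q lie on the unit circle and interlace, so the sorted output has exactly p values.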
2) Computing the difference-of-adjacent LSP (Difference of Adjacent LSP, DAL) parameters of the input speech signal,
according to the following formula:
$$DAL_i = l_{i+1} - l_i,\quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$
where $DAL_i$ is the i-th order adjacent-difference line spectrum pair parameter, $l_{i+1}$ is the (i+1)-th order line spectrum pair parameter, $l_i$ is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent-difference line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
3) Applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain its Bark line spectrum pair (Bark Line Spectrum Pair, BLSP) parameters.
The frequency warping uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$
where Bark is the Bark frequency and f is the linear frequency in Hz.
4) Enhancing the Bark line spectrum pair parameters of the input speech signal to obtain the enhanced Bark line spectrum pair (Enhanced-Bark Line Spectrum Pair, E-BLSP) parameters. The j-th order Bark line spectrum pair parameter is adjusted by bidirectional iteration, for j = 2, ..., N-1. Each adjustment directly overwrites the original Bark line spectrum pair parameter, and the adjusted j-th order parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be $\{b_1, b_2, \ldots, b_N\}$, where N is the order of the Bark line spectrum pairs; the adjacent-difference coefficients of the current frame are then $b_{i+1} - b_i$, i = 1, 2, ..., N-1. The iteration formulas are as follows:
$$c_i = \eta\,(b_{i+1} - b_i),\quad \eta < 1,\ i = 2, 3, \ldots, N-1 \qquad (8)$$
(1) Forward iteration: for j = 2 to j = N-1, adjust the j-th order Bark line spectrum pair parameter in the forward direction;
(2) Backward iteration: for j = N-1 down to j = 2, adjust the j-th order Bark line spectrum pair parameter in the backward direction;
(3) Averaging: average the Bark line spectrum pair parameters obtained by the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters.
Here η controls the degree of formant enhancement: the smaller η is, the stronger the enhancement.
5) Feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels. First, 75% of each single-vowel data set in the SVD pathological voice database is randomly selected as the training set and the remaining 25% as the test set, so that the data of every class are evenly distributed between classifier training and testing. The 12th-order enhanced Bark line spectrum pair parameters of six single vowels, namely /a/, /i/ and /u/ from pathological voices with wriggling polyps and /a/, /i/ and /u/ from normal voices, are then fed into the deep neural network for recognition. The network is configured with 2 hidden layers of 100 neurons each and the ReLU activation function; in the last layer of the recognition model, a Softmax function converts the network output into a probability distribution, from which the optimal classification result is obtained.
A specific example is given below:
1. Pre-processing: in framing, each frame lasts 30 ms at a sampling frequency of 8 kHz, so the frame length is 240 samples, and the frame shift is 80 samples;
2. The linear prediction coefficients are computed with p = 12;
3. The linear prediction inverse filter system function A(z) is computed from the linear prediction coefficients;
4. The (p+1)th-order symmetric and antisymmetric polynomials P(z) and Q(z) of A(z) are computed;
5. The 12th-order LSP parameters are computed from P(z) and Q(z);
6. The 11th-order DAL (Difference of Adjacent LSP, DAL) parameters of the input speech signal are computed from the 12th-order LSP parameters;
7. The LSP parameters of the input speech signal are frequency-warped to obtain its BLSP (Bark Line Spectrum Pair, BLSP) parameters.
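Item 1 of the example (DC removal, then 30 ms frames of 240 samples at 8 kHz with an 80-sample, i.e. 10 ms, shift) can be sketched as below. This is a minimal sketch, assuming DC removal means subtracting the mean of the whole signal; no window is applied because the text does not mention one, and the function name `preprocess` is hypothetical.

```python
def preprocess(signal, frame_len=240, hop=80):
    """DC removal followed by framing: 240-sample (30 ms at 8 kHz) frames, 80-sample hop.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if not signal:
        return []
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]              # remove the DC component
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

With these settings, one second of audio (8000 samples) yields (8000 − 240) / 80 + 1 = 98 overlapping frames, each of which then goes through the LPC and LSP steps.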
Figs. 2a to 2f are box plots of the DAL parameters of the six single-vowel signals in the embodiment of the invention: Fig. 2a shows the 11 DAL parameters of the normal single vowel /a/; Fig. 2b, those of the pathological single vowel /a/; Fig. 2c, the normal single vowel /i/; Fig. 2d, the pathological single vowel /i/; Fig. 2e, the normal single vowel /u/; and Fig. 2f, the pathological single vowel /u/.
By Fig. 2 a~Fig. 2 f it is found that for normal/a/, tri- kinds of/i/ ,/u/ unit sound signals, preceding 7 rank DAL data distribution Rectangle frame difference is larger, has preferable discrimination to three kinds of single vowels;For tri- kinds of pathology/a/ ,/i/ ,/u/ unit sound signals, Preceding 7 rank DAL data are distributed more uniform than normal voice.For pathology/a/ sound, rear 4 rank DAL parameter and normal/a/ cent cloth It is completely different, and the rear 4 rank DAL data distribution of pathology/i/ sound and/u/ sound has more intersection, and it is poor to distinguish effect.Due to DAL low order parameter is higher than high band in view of DAL parameter low-frequency range discrimination to induction signal low frequency part, the embodiment of the present invention Characteristic and the domain Bark more can actual response human ear feeling that signal is generated, using the domain Bark change of scale to the LSP of extraction into Row non-linear frequency bends to obtain BLSP parameter, and Warping function is:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$
where Bark is the Bark frequency and f is the linear frequency in Hz.
Figs. 3a and 3b are schematic diagrams of the 12th-order LSP parameters and the 12th-order BLSP parameters in the embodiment of the invention. Compared with Fig. 3a, Fig. 3b expands the low-frequency part of the signal and compresses the high-frequency part, which improves the discrimination between normal and pathological vowels.
8. The BLSP parameters of the input speech signal are enhanced to obtain the E-BLSP (Enhanced-Bark Line Spectrum Pair, E-BLSP) parameters: η controls the degree of formant enhancement, and the smaller η is, the stronger the enhancement. In the embodiment of the invention, η = 0.4.
Figs. 4a and 4b are three-dimensional spectra of the 12th-order BLSP parameters and the 12th-order E-BLSP parameters in the embodiment of the invention. Compared with Fig. 4a, Fig. 4b shows a much higher amplitude at the formant frequencies and suppressed spectral broadening, which greatly strengthens the discrimination between normal and pathological vowels.
9. The E-BLSP parameters of the input speech signal are fed into the DNN classifier to recognize multiple pathological single vowels.
In the embodiment of the invention, 75% of each single-vowel data set is first randomly selected as the training set and the remaining 25% as the test set, so that the data of every class are evenly distributed between classifier training and testing; the 12th-order E-BLSP parameters of the six single vowels are then fed into a DNN (Deep Neural Network, DNN) for recognition. The network is configured with 2 hidden layers of 100 neurons each.
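The per-class 75%/25% random split described above can be sketched as follows. This is a minimal sketch, assuming class labels are hashable and sortable and that shuffling with Python's `random` module is acceptable; the function name `stratified_split` is hypothetical. Splitting each class separately is what keeps every class equally represented in training and testing.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=None):
    """Randomly split sample indices class by class: train_frac for training, rest for test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)                       # random selection within this class
        cut = round(train_frac * len(idxs))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test
```

With the 180 samples per class and six classes stated in the embodiment, each class contributes 135 training and 45 test samples, i.e. 810 and 270 in total.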
For the single-vowel source signals, the embodiment of the invention uses the SVD (Saarbrucken Voice Database, SVD) pathological voice database recorded by the Institute of Phonetics at Saarland University. It contains the sustained vowels /a/, /i/ and /u/ from normal and various pathological voices, uniformly sampled at 50 kHz with 16-bit resolution. The sustained vowels /a/, /i/ and /u/ of pathological voices with wriggling polyps and of normal voices are selected for the experiments and uniformly resampled to 8 kHz. Each voice class contains 180 samples in total, covering 4 different pitches (normal, low, high, and low-high-low).
The embodiment of the invention is evaluated mainly with two metrics: accuracy and AUC. Accuracy is defined as the percentage of correctly classified instances. The ROC (Receiver Operating Characteristic, ROC) curve is a comprehensive indicator of the sensitivity and specificity of a continuous variable and reveals the relationship between them graphically. AUC (Area Under Curve, AUC) is defined as the area enclosed by the ROC curve and the coordinate axis; its value typically lies between 0.5 and 1, and the larger the AUC, the better the classification. To ensure the accuracy and generality of the experiments, each feature combination is run 10 times, and the average is taken as the final classification result.
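The two metrics and the averaging over runs can be sketched as below. AUC is computed with the rank-sum (Mann-Whitney U) identity: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The text does not say how AUC is extended to six classes, so the one-vs-rest macro average here is an assumption, as are all function names.

```python
def accuracy(y_true, y_pred):
    """Percentage of correctly classified instances."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_binary(scores, positives):
    """ROC AUC via the rank-sum identity; ties between a positive and a negative count 0.5."""
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def macro_auc_ovr(probs, y_true):
    """ASSUMED multi-class AUC: one-vs-rest AUC per class, averaged over classes.

    probs[k][c] is the predicted probability of class c for sample k.
    """
    n_classes = len(probs[0])
    aucs = [auc_binary([row[c] for row in probs], [y == c for y in y_true])
            for c in range(n_classes)]
    return sum(aucs) / n_classes


def mean_over_runs(values):
    """Average a metric over repeated runs (the text uses 10 runs per feature combination)."""
    return sum(values) / len(values)
```

A classifier that assigns scores at random yields an AUC near 0.5, which is why 0.5 is the practical lower reference; values below it mean the scores are systematically inverted.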
As Table 1 shows, the feature of the invention recognizes multiple pathological single vowels better than the traditional MFCC and LPCC features: accuracy reaches up to 97.36%, and AUC up to 0.9894.
Table 1

Claims (6)

1. An improved method for recognizing multiple pathological single vowels, characterized by comprising the following steps:
1) computing the line spectrum pair parameters of an input speech signal;
2) computing the adjacent-difference line spectrum pair parameters of the input speech signal;
3) applying frequency warping to the line spectrum pair parameters of the input speech signal to obtain the Bark line spectrum pair parameters of the input speech signal;
4) enhancing the Bark line spectrum pair parameters of the input speech signal to obtain enhanced Bark line spectrum pair parameters;
5) feeding the enhanced Bark line spectrum pair parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.
2. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that step 1) comprises:
(1.1) pre-processing the signal, including DC removal and framing;
(1.2) for each frame of the speech signal, computing the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation algorithm, using the preset model order p = 12;
(1.3) computing the system function of the linear prediction inverse filter from the linear prediction coefficients $a_i$ obtained in (1.2), as follows:
$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \qquad (1)$$
where A(z) is the system function of the linear prediction inverse filter, p is the model order, and $a_i$ are the linear prediction coefficients;
(1.4) computing the (p+1)th-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \qquad (2)$$
where P(z) is the (p+1)th-order symmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \qquad (3)$$
where Q(z) is the (p+1)th-order antisymmetric polynomial of A(z), A(z) is the system function of the linear prediction inverse filter, and p is the model order;
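(1.5) computing the 12th-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
$$\left|H(e^{j\omega})\right|^2 = \frac{1}{\left|A(e^{j\omega})\right|^2} = \frac{4}{\left|P(e^{j\omega})\right|^2 + \left|Q(e^{j\omega})\right|^2} = \frac{2^{-p}}{\cos^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\omega_i\right)^2 + \sin^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\theta_i\right)^2} \qquad (4)$$
where $|H(e^{j\omega})|$ is the amplitude of the linear prediction spectrum, $e^{j\omega}$ is the frequency-domain form of $z$, $P(e^{j\omega})$ and $Q(e^{j\omega})$ are the (p+1)th-order symmetric and antisymmetric polynomials of $A(e^{j\omega})$, $\cos\omega_i$ and $\cos\theta_i$ are the LSP coefficients expressed in the cosine domain, $\omega_i$ and $\theta_i$ are the line spectral frequencies corresponding to the line spectrum pair coefficients of the input speech signal (the root angles of $P(z)$ and $Q(z)$, respectively), and $\Pi$ denotes a product.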
3. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that step 2) is computed according to the following formula:
$$DAL_i = l_{i+1} - l_i,\quad i = 1, 2, \ldots, M \ (M < N) \qquad (5)$$
where $DAL_i$ is the i-th order adjacent-difference line spectrum pair parameter, $l_{i+1}$ is the (i+1)-th order line spectrum pair parameter, $l_i$ is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent-difference line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
4. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that the frequency warping in step 3) uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \qquad (6)$$
where Bark is the Bark frequency and f is the linear frequency in Hz.
5. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that, in step 4), the j-th order Bark line spectrum pair parameter is adjusted by bidirectional iteration, for j = 2, ..., N-1; each adjustment directly overwrites the original Bark line spectrum pair parameter, and the adjusted j-th order parameter is used when adjusting the next order; the Bark line spectrum pair parameters of the current frame are $\{b_1, b_2, \ldots, b_N\}$, where N is the order of the Bark line spectrum pairs, and the adjacent-difference coefficients of the current frame are $b_{i+1} - b_i$, i = 1, 2, ..., N-1; the iteration formulas are as follows:
$$c_i = \eta\,(b_{i+1} - b_i),\quad \eta < 1,\ i = 2, 3, \ldots, N-1 \qquad (8)$$
(1) forward iteration: for j = 2 to j = N-1, adjusting the j-th order Bark line spectrum pair parameter in the forward direction;
(2) backward iteration: for j = N-1 down to j = 2, adjusting the j-th order Bark line spectrum pair parameter in the backward direction;
(3) averaging: averaging the Bark line spectrum pair parameters obtained by the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters;
where η controls the degree of formant enhancement, and the smaller η is, the stronger the enhancement.
6. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that, in step 5), 75% of each single-vowel data set in the SVD pathological voice database is first randomly selected as the training set and the remaining 25% as the test set, so that the data of every class are evenly distributed between classifier training and testing; the 12th-order enhanced Bark line spectrum pair parameters of six single vowels, namely /a/, /i/ and /u/ from pathological voices with wriggling polyps and /a/, /i/ and /u/ from normal voices, are then fed into the deep neural network for recognition; the network is configured with 2 hidden layers of 100 neurons each and the ReLU activation function, and in the last layer of the recognition model a Softmax function converts the network output into a probability distribution, from which the optimal classification result is obtained.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233952.0A CN110070894B 2019-03-26 2019-03-26 Improved method for recognizing multiple pathological single vowels

Publications (2)

Publication Number Publication Date
CN110070894A 2019-07-30
CN110070894B 2021-08-03

Family

ID=67366671

Country Status (1)

Country Link
CN (1) CN110070894B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0774750A2 (en) * 1995-11-15 1997-05-21 Nokia Mobile Phones Ltd. Determination of line spectrum frequencies for use in a radiotelephone
US20040042622A1 (en) * 2002-08-29 2004-03-04 Mutsumi Saito Speech Processing apparatus and mobile communication terminal
US7257535B2 (en) * 1999-07-26 2007-08-14 Lucent Technologies Inc. Parametric speech codec for representing synthetic speech in the presence of background noise
CN101527141A (en) * 2009-03-10 2009-09-09 Soochow University Method for converting whispered speech into normal speech based on a radial basis function neural network
CN103730130A (en) * 2013-12-20 2014-04-16 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Detection method and system for pathological voice
CN106710604A (en) * 2016-12-07 2017-05-24 Tianjin University Formant enhancement apparatus and method for improving speech intelligibility
CN107705801A (en) * 2016-08-05 2018-02-16 Institute of Automation, Chinese Academy of Sciences Training method for a speech bandwidth extension model and speech bandwidth extension method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
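Ghulam Muhammad et al.: "Voice Pathology Detection Using Vocal Tract Area", 2013 European Modelling Symposium (IEEE) *
Hui Ye et al.: "Quality-enhanced voice morphing using maximum likelihood transformations", IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, July 2006 *
Zhou Jiaqin et al.: "Formant correction of pathological voice using piecewise constant-value shifting of line spectrum pairs", Informatization Research *
Peng Ce et al.: "Application of voice analysis in disease diagnosis", Journal of Biomedical Engineering *
Xue Longji et al.: "Formant restoration of pathological voice with an improved artificial neural network", Chinese Journal of Electron Devices *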

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant