CN110070894A - An improved method for recognizing multiple pathological single vowels - Google Patents
- Publication number: CN110070894A
- Application number: CN201910233952.0A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Telephonic Communication Services (AREA)
Abstract
An improved method for recognizing multiple pathological single vowels, comprising: calculating the line spectrum pair (LSP) parameters of the input speech signal; calculating the difference of adjacent LSP (DAL) parameters of the input speech signal; applying frequency warping to the LSP parameters to obtain the Bark line spectrum pair (BLSP) parameters of the input speech signal; applying feature enhancement to the BLSP parameters to obtain the enhanced Bark line spectrum pair (E-BLSP) parameters; and feeding the E-BLSP parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels. The invention achieves a better recognition rate and provides a research basis for the subsequent voice repair of single vowels and of more complex words and sentences.
Description
Technical field
The present invention relates to a method for recognizing pathological single vowels, and more particularly to an improved method for recognizing multiple pathological single vowels.
Background art
Voice is the most direct means of conveying language, so voice quality directly affects the efficiency of everyday communication. According to statistics, about 7.5 million people in the United States suffer from voice disorders; the prevalence of voice disorders is 57.7% among teachers and 28.8% among non-teachers. In addition, about 2,200 people are diagnosed with laryngeal cancer in the United Kingdom every year. An unclear voice greatly reduces quality of life, so recognizing and then repairing pathological voices is particularly important.
Voice disorders can be treated with drugs and physical therapy, but incomplete treatment can affect the patient's ability to express themselves, so recognizing and repairing pathological voices by non-invasive means has become a key research topic.
The recognition and repair of single-vowel voices is the basis for handling complex words and sentences. Current research on recognizing multiple single vowels is based entirely on normal voices, and the commonly used feature parameters include linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and formants. Recognition work on pathological voices, however, mostly focuses on the binary classification of pathological versus normal voices. Because most acoustic feature parameters achieve a higher recognition rate on /a/ than on other vowels, researchers in China and abroad generally choose the pathological single vowel /a/ as the experimental sample, extract feature parameters from the voice samples, and feed them into various classification networks to recognize pathological voices. Common recognition features include long-term features such as fundamental frequency perturbation and amplitude perturbation, and regression features such as MPEG-7 and multidirectional regression (MDR). However, the features used to recognize multiple normal single vowels (LPCC, MFCC) perform poorly when recognizing multiple pathological single vowels.
Summary of the invention
The technical problem to be solved by the invention is to provide an improved method for recognizing multiple pathological single vowels that further increases the pathological voice recognition rate.
The technical solution adopted by the invention is an improved method for recognizing multiple pathological single vowels, comprising the following steps:
1) calculating the line spectrum pair (LSP) parameters of the input speech signal;
2) calculating the difference of adjacent LSP (DAL) parameters of the input speech signal;
3) applying frequency warping to the LSP parameters of the input speech signal to obtain the Bark line spectrum pair (BLSP) parameters of the input speech signal;
4) applying feature enhancement to the BLSP parameters of the input speech signal to obtain the enhanced Bark line spectrum pair (E-BLSP) parameters;
5) feeding the E-BLSP parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.
Step 1) includes:
(1.1) preprocessing the signal, including DC removal and framing;
(1.2) for each speech frame, calculating the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation method, using the set model order p = 12;
(1.3) calculating the linear prediction inverse filter system function A(z), equation (1), from the linear prediction coefficients $a_i$ obtained in (1.2), where A(z) denotes the linear prediction inverse filter system function, p denotes the model order, and $a_i$ denotes the linear prediction coefficients;
(1.4) calculating the symmetric and antisymmetric polynomials of order p+1 of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \tag{2}$$
where P(z) denotes the symmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \tag{3}$$
where Q(z) denotes the antisymmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
(1.5) calculating the 12th-order LSP parameters of the input speech signal from P(z) and Q(z), equation (4), where $H(e^{j\omega})$ is the linear prediction spectrum amplitude, $e^{j\omega}$ is the frequency-domain form of z, $P(e^{j\omega})$ is the symmetric polynomial of order p+1 of $A(e^{j\omega})$, $Q(e^{j\omega})$ is the antisymmetric polynomial of order p+1 of $A(e^{j\omega})$, $\cos\theta_i$ and $\cos\omega_i$ are the LSP coefficients expressed in the cosine domain, $\theta_i$ and $\omega_i$ are the line spectral frequencies corresponding to the LSP coefficients of the input speech signal, and $\prod$ is the product symbol.
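Equations (1) and (4) are not reproduced in this text. As a hedged sketch only, the standard textbook forms that match the symbol definitions above would be as follows, assuming an even order p, the sign convention in which the predicted sample is subtracted (some references use the opposite sign for $a_i$), and the labeling in which $\omega_i$ are the roots of P(z) and $\theta_i$ the roots of Q(z) on the unit circle:

$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} \tag{1}$$

$$\left|H(e^{j\omega})\right|^2 = \frac{1}{\left|A(e^{j\omega})\right|^2} = \frac{4}{\left|P(e^{j\omega}) + Q(e^{j\omega})\right|^2} = 2^{-p}\left[\cos^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\omega_i\right)^2 + \sin^2\frac{\omega}{2}\prod_{i=1}^{p/2}\left(\cos\omega - \cos\theta_i\right)^2\right]^{-1} \tag{4}$$

The last equality follows from equations (2) and (3): $A(z) = \tfrac{1}{2}\left[P(z) + Q(z)\right]$, P(z) has a fixed root at $z = -1$ and Q(z) a fixed root at $z = 1$, and on the unit circle the symmetric and antisymmetric parts are in quadrature, so $|P + Q|^2 = |P|^2 + |Q|^2$.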
Step 2) is calculated by the following formula:
$$\mathrm{DAL}_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \tag{5}$$
where $\mathrm{DAL}_i$ is the i-th order DAL parameter, $l_{i+1}$ is the (i+1)-th order LSP parameter, $l_i$ is the i-th order LSP parameter, M is the maximum order of the DAL parameters, and N is the maximum order of the LSP parameters.
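Equation (5) amounts to a first difference over the ordered LSP values. A minimal sketch, assuming the LSP parameters of one frame arrive as an ascending Python list (the function name `adjacent_differences` is illustrative, not from the patent):

```python
def adjacent_differences(lsp):
    """Return DAL_i = l_{i+1} - l_i for one frame of ascending LSP values."""
    return [lsp[i + 1] - lsp[i] for i in range(len(lsp) - 1)]


# Toy values only; a 12th-order LSP vector would yield 11 DAL parameters.
print(adjacent_differences([0.25, 0.5, 1.0, 1.75]))  # [0.25, 0.5, 0.75]
```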
The frequency warping in step 3) uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \tag{6}$$
where Bark denotes the Bark frequency and f denotes the linear frequency.
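Equation (6) can be applied to each line spectral frequency in turn. The sketch below assumes that f is expressed in Hz and that the line spectral frequencies are angular frequencies in radians per sample, converted with the 8 kHz sampling rate used in the example; both conversions and the names `hz_to_bark` and `lsf_to_blsp` are assumptions for illustration, since the text does not state the units.

```python
import math


def hz_to_bark(f_hz):
    """Equation (6): map a linear frequency in Hz (f_hz > 0) to the Bark scale."""
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53


def lsf_to_blsp(lsf_rad, fs_hz=8000.0):
    """Warp line spectral frequencies to the Bark scale.

    Assumption: lsf_rad holds angular frequencies in radians per sample,
    each strictly between 0 and pi, so f = w * fs / (2 * pi) is in Hz.
    """
    return [hz_to_bark(w * fs_hz / (2.0 * math.pi)) for w in lsf_rad]


blsp = lsf_to_blsp([0.3, 0.9, 1.8, 2.7])
print(blsp)  # stays in ascending order because the warping is monotonic
```

Because the mapping is monotonic, the ordering of the LSP values is preserved; low frequencies are spread out and high frequencies are compressed.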
Step 4) adjusts the j-th order BLSP parameter, j = 2, ..., N-1, by bidirectional iteration. Each adjusted value directly overwrites the original BLSP parameter, and the adjusted j-th order BLSP parameter is then used when adjusting the next order. Let the BLSP parameters of the current frame be $\{b_1, b_2, \ldots, b_N\}$, where N is the BLSP order; the adjacent-difference coefficients of the current frame are $b_{i+1} - b_i$, i = 1, 2, ..., N-1. The specific iteration formulas are as follows:
$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \tag{8}$$
(1) forward iteration: from j = 2 to j = N-1, adjust the j-th order BLSP parameter in the forward direction;
(2) backward iteration: from j = N-1 to j = 2, adjust the j-th order BLSP parameter in the backward direction;
(3) averaging: average the BLSP parameters obtained from the forward and backward iterations to obtain the E-BLSP parameters;
where η controls the degree of formant enhancement: the smaller η is, the more pronounced the enhancement.
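The forward pass, backward pass, and averaging described above can be sketched as follows. The per-order update rule itself is not reproduced in this text, so the rule used here is an assumption: it is a form known from LSP-based formant enhancement, in which each interior value is moved toward its nearer neighbor using the scaled gaps $c_i$ of equation (8). The function name `enhance_blsp` and the use of $c_1$ are likewise illustrative.

```python
def enhance_blsp(b, eta=0.4):
    """Bidirectional adjustment of orders 2..N-1 (0-based indices 1..N-2).

    ASSUMPTION: the update rule below is not taken from the patent text;
    only the forward/backward/average scheme and c_i = eta * (b_{i+1} - b_i)
    are. The first and last parameters are left unchanged.
    """
    n = len(b)
    c = [eta * (b[i + 1] - b[i]) for i in range(n - 1)]  # equation (8)

    def weight(near, far):
        # Share of the spare room given to the gap `near`; a smaller gap
        # receives less of it, so close pairs stay close.
        denom = near * near + far * far
        return 0.5 if denom == 0.0 else near * near / denom

    fwd = list(b)
    for j in range(1, n - 1):  # forward: reuse the already adjusted fwd[j - 1]
        spare = (b[j + 1] - fwd[j - 1]) - c[j] - c[j - 1]
        fwd[j] = fwd[j - 1] + c[j - 1] + weight(c[j - 1], c[j]) * spare

    bwd = list(b)
    for j in range(n - 2, 0, -1):  # backward: reuse the adjusted bwd[j + 1]
        spare = (bwd[j + 1] - b[j - 1]) - c[j] - c[j - 1]
        bwd[j] = bwd[j + 1] - c[j] - weight(c[j], c[j - 1]) * spare

    return [(f + g) / 2.0 for f, g in zip(fwd, bwd)]


frame = [1.0, 2.0, 2.5, 4.0, 6.0]     # toy ascending values, not real BLSP data
print(enhance_blsp(frame, eta=0.4))   # the close pair (2.0, 2.5) moves closer
```

Under this assumed rule, η = 1 leaves the parameters unchanged, and smaller η pulls closely spaced pairs together more strongly, which is consistent with the statement that a smaller η gives a more pronounced enhancement.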
Step 5) first randomly selects 75% of each single-vowel data set in the SVD pathological voice database as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order E-BLSP parameters of six single-vowel voices, namely the vocal cord polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, are then fed into the deep neural network for recognition. The network is configured with 2 hidden layers of 100 neurons each, with the ReLU function as the activation function; in the last layer of the recognition model, a Softmax function converts the output of the neural network into a probability distribution, thereby optimizing the classification result.
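The per-class 75%/25% split can be sketched as below. This is a minimal illustration, assuming one label per sample; the function name `stratified_split`, the fixed seed, and the integer class labels are not from the patent.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices per class so each class keeps the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)                       # random selection within the class
        cut = round(train_fraction * len(idxs))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test


# Six vowel classes with 180 samples each, as in the described experiment.
labels = [cls for cls in range(6) for _ in range(180)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 810 270
```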
The improved method for recognizing multiple pathological single vowels of the invention has the following beneficial effects:
1) Compared with the conventional MFCC and LPCC features, the improved method achieves a better recognition rate, and it proposes E-BLSP, a general-purpose feature suited to recognizing multiple pathological single vowels. The newly proposed E-BLSP feature achieves a high recognition rate on six single vowels: normal /a/, /i/, /u/ and pathological /a/, /i/, /u/;
2) The proposed E-BLSP feature achieves a higher recognition rate on pathological /i/ than on pathological /a/, whereas conventional pathological voice recognition is mostly based on the single vowel /a/. This offers a new approach to the recognition and diagnosis of pathological voices and provides a research basis for the subsequent voice repair of single vowels and of more complex words and sentences.
Brief description of the drawings
Fig. 1 is a schematic diagram of the improved method for recognizing multiple pathological single vowels of the invention;
Fig. 2a is a box plot of the 11th-order DAL parameters of the normal single vowel /a/;
Fig. 2b is a box plot of the 11th-order DAL parameters of the pathological single vowel /a/;
Fig. 2c is a box plot of the 11th-order DAL parameters of the normal single vowel /i/;
Fig. 2d is a box plot of the 11th-order DAL parameters of the pathological single vowel /i/;
Fig. 2e is a box plot of the 11th-order DAL parameters of the normal single vowel /u/;
Fig. 2f is a box plot of the 11th-order DAL parameters of the pathological single vowel /u/;
Fig. 3a is a schematic diagram of the 12th-order LSP parameters of the embodiment of the invention;
Fig. 3b is a schematic diagram of the 12th-order BLSP parameters of the embodiment of the invention;
Fig. 4a is a three-dimensional spectrogram of the 12th-order BLSP parameters of the embodiment of the invention;
Fig. 4b is a three-dimensional spectrogram of the 12th-order E-BLSP parameters of the embodiment of the invention.
Detailed description of the embodiments
The improved method for recognizing multiple pathological single vowels of the invention is described in detail below with reference to the embodiments and the drawings.
As shown in Fig. 1, the improved method for recognizing multiple pathological single vowels of the invention comprises the following steps:
1) calculating the line spectrum pair (LSP) parameters of the input speech signal, which includes:
(1.1) preprocessing the signal, including DC removal and framing;
(1.2) for each speech frame, calculating the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation method, using the set model order p = 12;
(1.3) calculating the linear prediction inverse filter system function A(z), equation (1), from the linear prediction coefficients $a_i$ obtained in (1.2), where A(z) denotes the linear prediction inverse filter system function, p denotes the model order, and $a_i$ denotes the linear prediction coefficients;
(1.4) calculating the symmetric and antisymmetric polynomials of order p+1 of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \tag{2}$$
where P(z) denotes the symmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \tag{3}$$
where Q(z) denotes the antisymmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
(1.5) calculating the 12th-order LSP parameters of the input speech signal from P(z) and Q(z), equation (4), where $H(e^{j\omega})$ is the linear prediction spectrum amplitude, $e^{j\omega}$ is the frequency-domain form of z, $P(e^{j\omega})$ is the symmetric polynomial of order p+1 of $A(e^{j\omega})$, $Q(e^{j\omega})$ is the antisymmetric polynomial of order p+1 of $A(e^{j\omega})$, $\cos\theta_i$ and $\cos\omega_i$ are the LSP coefficients expressed in the cosine domain, $\theta_i$ and $\omega_i$ are the line spectral frequencies (LSF) corresponding to the LSP coefficients of the input speech signal, and $\prod$ is the product symbol.
2) calculating the difference of adjacent LSP (DAL) parameters of the input speech signal, by the following formula:
$$\mathrm{DAL}_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \tag{5}$$
where $\mathrm{DAL}_i$ is the i-th order DAL parameter, $l_{i+1}$ is the (i+1)-th order LSP parameter, $l_i$ is the i-th order LSP parameter, M is the maximum order of the DAL parameters, and N is the maximum order of the LSP parameters.
3) applying frequency warping to the LSP parameters of the input speech signal to obtain the Bark line spectrum pair (BLSP) parameters of the input speech signal;
The frequency warping uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \tag{6}$$
where Bark denotes the Bark frequency and f denotes the linear frequency.
4) applying feature enhancement to the BLSP parameters of the input speech signal to obtain the enhanced Bark line spectrum pair (E-BLSP) parameters. The j-th order BLSP parameter, j = 2, ..., N-1, is adjusted by bidirectional iteration. Each adjusted value directly overwrites the original BLSP parameter, and the adjusted j-th order BLSP parameter is then used when adjusting the next order. Let the BLSP parameters of the current frame be $\{b_1, b_2, \ldots, b_N\}$, where N is the BLSP order; the adjacent-difference coefficients of the current frame are $b_{i+1} - b_i$, i = 1, 2, ..., N-1. The specific iteration formulas are as follows:
$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \tag{8}$$
(1) forward iteration: from j = 2 to j = N-1, adjust the j-th order BLSP parameter in the forward direction;
(2) backward iteration: from j = N-1 to j = 2, adjust the j-th order BLSP parameter in the backward direction;
(3) averaging: average the BLSP parameters obtained from the forward and backward iterations to obtain the E-BLSP parameters;
where η controls the degree of formant enhancement: the smaller η is, the more pronounced the enhancement.
5) feeding the E-BLSP parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels. First, 75% of each single-vowel data set in the SVD pathological voice database is randomly selected as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order E-BLSP parameters of six single-vowel voices, namely the vocal cord polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, are then fed into the deep neural network for recognition. The network is configured with 2 hidden layers of 100 neurons each, with the ReLU function as the activation function; in the last layer of the recognition model, a Softmax function converts the output of the neural network into a probability distribution, thereby optimizing the classification result.
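Sub-steps (1.4) and (1.5) of step 1) can be sketched without a polynomial root solver, because on the unit circle the symmetric polynomial reduces to a real cosine series and the antisymmetric one to a real sine series. The sketch assumes A(z) is supplied as the coefficient list [1, α_1, ..., α_p] in powers of $z^{-1}$ and is minimum-phase; the function name `lpc_to_lsf`, the grid size, and the bisection search are illustrative choices, not the patent's procedure.

```python
import math


def lpc_to_lsf(a_poly, grid=2048, iters=60):
    """Line spectral frequencies (radians, ascending) of A(z).

    a_poly = [1, alpha_1, ..., alpha_p] are the coefficients of A(z) in
    powers of z^-1 (assumed minimum-phase). Roots that fall exactly on a
    grid point, or pairs closer than pi/grid, are missed by this sketch.
    """
    p = len(a_poly) - 1
    ext = list(a_poly) + [0.0]
    sym = [ext[m] + ext[p + 1 - m] for m in range(p + 2)]   # P(z), equation (2)
    anti = [ext[m] - ext[p + 1 - m] for m in range(p + 2)]  # Q(z), equation (3)
    half = (p + 1) / 2.0

    def real_p(w):  # P(e^{jw}) with its linear-phase factor removed
        return sum(c * math.cos((m - half) * w) for m, c in enumerate(sym))

    def real_q(w):  # Q(e^{jw}) with its linear-phase factor (and -j) removed
        return sum(c * math.sin((m - half) * w) for m, c in enumerate(anti))

    def roots(func):
        found = []
        lo = math.pi / grid
        f_lo = func(lo)
        for k in range(2, grid):          # open interval: skips the fixed roots at 0 and pi
            hi = math.pi * k / grid
            f_hi = func(hi)
            if f_lo * f_hi < 0.0:         # sign change brackets one root
                a, b, fa = lo, hi, f_lo
                for _ in range(iters):    # bisection refinement
                    mid = 0.5 * (a + b)
                    fm = func(mid)
                    if fa * fm <= 0.0:
                        b = mid
                    else:
                        a, fa = mid, fm
                found.append(0.5 * (a + b))
            lo, f_lo = hi, f_hi
        return found

    return sorted(roots(real_p) + roots(real_q))


# Sanity case: A(z) = 1 with p = 12 gives evenly spaced LSFs k*pi/13.
print(lpc_to_lsf([1.0] + [0.0] * 12))
```

For a flat spectrum the line spectral frequencies are evenly spaced; closely spaced pairs indicate a spectral peak, which is what the DAL parameters of step 2) measure.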
A specific example is given below:
1. Preprocessing: in framing, each frame lasts 30 ms at a sampling frequency of 8 kHz, which corresponds to a frame length of 240 samples and a frame shift of 80 samples.
2. When calculating the linear prediction coefficients, p = 12.
3. The linear prediction inverse filter system function A(z) is calculated from the linear prediction coefficients.
4. The symmetric and antisymmetric polynomials P(z) and Q(z) of order p+1 of A(z) are calculated.
5. The 12th-order LSP parameters are calculated from P(z) and Q(z).
6. The 11th-order DAL (difference of adjacent LSP) parameters of the input speech signal are calculated from the 12th-order LSP parameters.
7. Frequency warping is applied to the LSP parameters of the input speech signal to obtain the BLSP (Bark line spectrum pair) parameters of the input speech signal.
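Items 1 and 2 above (DC removal, framing with 240-sample frames and an 80-sample shift, and 12th-order linear prediction) can be sketched as follows. The sketch assumes DC removal over the whole signal and no windowing, neither of which the text specifies; the function names are illustrative.

```python
def split_frames(signal, frame_len=240, hop=80):
    """Remove the DC offset, then cut overlapping frames (30 ms at 8 kHz)."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on the autocorrelation sequence r.

    Returns (a, err) where the predictor is x_hat[n] = sum_i a[i-1] * x[n-i].
    A silent frame (r[0] == 0) must be skipped by the caller.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error
    return a[1:], err


# Autocorrelation of a first-order process: only the first coefficient is non-zero.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25, 0.125], order=3)
print(coeffs, err)  # [0.5, 0.0, 0.0] 0.75
```

With this sign convention the inverse filter coefficients in powers of $z^{-1}$ would be [1, -a_1, ..., -a_p]; in the described setup the call would be `levinson_durbin(autocorrelation(frame, 12), 12)` for each frame.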
Figs. 2a to 2f show box plots of the DAL parameters of the six single-vowel signals in this embodiment. Fig. 2a shows the 11th-order DAL parameters of the normal single vowel /a/; Fig. 2b shows those of the pathological single vowel /a/; Fig. 2c shows those of the normal single vowel /i/; Fig. 2d shows those of the pathological single vowel /i/; Fig. 2e shows those of the normal single vowel /u/; and Fig. 2f shows those of the pathological single vowel /u/.
As Figs. 2a to 2f show, for the three normal single-vowel signals /a/, /i/, /u/, the boxes of the first 7 orders of DAL data differ considerably, which distinguishes the three vowels well; for the three pathological single-vowel signals /a/, /i/, /u/, the first 7 orders of DAL data are distributed more uniformly than for normal voices. For pathological /a/, the last 4 orders of DAL parameters are distributed quite differently from those of normal /a/, whereas the last 4 orders of DAL data for pathological /i/ and /u/ overlap considerably and discriminate poorly. Because the low-order DAL parameters correspond to the low-frequency part of the signal, and considering that the DAL parameters discriminate better in the low band than in the high band and that the Bark domain reflects more faithfully how the human ear perceives the signal, this embodiment applies a Bark-domain scale transformation to warp the extracted LSP nonlinearly in frequency and obtain the BLSP parameters. The warping function is:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \tag{6}$$
where Bark denotes the Bark frequency and f denotes the linear frequency.
Figs. 3a and 3b show the 12th-order LSP parameters and the 12th-order BLSP parameters of this embodiment. Compared with Fig. 3a, Fig. 3b expands the low-frequency part of the signal and compresses the high-frequency part, which improves the discrimination between normal and pathological vowels.
8. Feature enhancement is applied to the BLSP parameters of the input speech signal to obtain the E-BLSP (enhanced Bark line spectrum pair) parameters. η controls the degree of formant enhancement: the smaller η is, the more pronounced the enhancement. In this embodiment η is set to 0.4.
Figs. 4a and 4b show the three-dimensional spectrograms of the 12th-order BLSP parameters and the 12th-order E-BLSP parameters of this embodiment. Compared with Fig. 4a, Fig. 4b shows a much larger amplitude at the formant frequencies and suppressed bandwidth broadening, which greatly strengthens the discrimination between normal and pathological vowels.
9. The E-BLSP parameters of the input speech signal are fed into a DNN classifier to recognize multiple pathological single vowels.
This embodiment first randomly selects 75% of each single-vowel data set as the training set and 25% as the test set, so that every voice class is evenly represented in both the training and test phases of the classification network. The 12th-order E-BLSP parameters of the six single-vowel voices are then fed into a deep neural network (DNN) for recognition. The network is configured as follows: 2 hidden layers with 100 neurons each.
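The network shape described here (12 inputs, 2 hidden layers of 100 ReLU neurons, 6 Softmax outputs) can be sketched as a plain forward pass. The weights below are random and untrained, and the initialization, helper names, and input values are assumptions for illustration only; training is not shown.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)                               # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def dense(v, weights, bias):
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)            # illustrative initialization choice
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """ReLU on every hidden layer, Softmax on the last layer."""
    h = features
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


rng = random.Random(0)
layers = [init_layer(12, 100, rng), init_layer(100, 100, rng), init_layer(100, 6, rng)]
probs = forward([0.1 * i for i in range(12)], layers)   # placeholder 12-dim feature vector
print(len(probs), round(sum(probs), 6))                 # 6 1.0
```

The predicted class is the index of the largest of the six probabilities.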
For the single-vowel source signals, this embodiment uses the Saarbrücken Voice Database (SVD) of pathological voices recorded by the Institute of Phonetics at Saarland University. It contains normal and various pathological voice signals of the sustained vowels /a/, /i/ and /u/, with a uniform sampling rate of 50 kHz and a resolution of 16 bits. From it, the three sustained vowels /a/, /i/, /u/ of vocal cord polyp pathological voices and of normal voices are selected for the experiments, and the sampling rate is uniformly reduced to 8 kHz. Each voice class contains 180 samples in total, covering 4 different pitches (normal, low, high, and low-high-low).
This embodiment is evaluated mainly with two metrics, accuracy and AUC. Accuracy is defined as the percentage of correctly classified cases. The ROC (receiver operating characteristic) curve is a composite indicator of sensitivity and specificity as continuous variables and shows the relationship between them graphically. AUC (area under the curve) is defined as the area enclosed by the ROC curve and the coordinate axes; for a useful classifier its value lies between 0.5 (chance level) and 1, and the larger the AUC, the better the classification. To ensure the reliability and generality of the experiments, each feature-combination experiment is run 10 times and the average is taken as the final classification result.
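Both metrics can be sketched in a few lines. The AUC below is the two-class form, computed as the fraction of positive/negative pairs that the score ranks correctly; how the six-class AUC was aggregated is not stated in the text, so a one-vs-rest average over classes would be an assumption. Function names and the toy data are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly classified cases."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """Area under the ROC curve for labels in {0, 1}.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))              # 0.75
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```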
As Table 1 shows, the feature of the invention achieves a higher recognition rate on multiple pathological single vowels than the conventional MFCC and LPCC features, with a highest accuracy of 97.3600% and an AUC of up to 0.9894.
Table 1
Claims (6)
1. An improved method for recognizing multiple pathological single vowels, characterized by comprising the following steps:
1) calculating the line spectrum pair (LSP) parameters of the input speech signal;
2) calculating the difference of adjacent LSP (DAL) parameters of the input speech signal;
3) applying frequency warping to the LSP parameters of the input speech signal to obtain the Bark line spectrum pair (BLSP) parameters of the input speech signal;
4) applying feature enhancement to the BLSP parameters of the input speech signal to obtain the enhanced Bark line spectrum pair (E-BLSP) parameters;
5) feeding the E-BLSP parameters of the input speech signal into a deep neural network classifier to recognize multiple pathological single vowels.
2. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that step 1) includes:
(1.1) preprocessing the signal, including DC removal and framing;
(1.2) for each speech frame, calculating the 12th-order linear prediction coefficients $a_i$ with the Levinson-Durbin autocorrelation method, using the set model order p = 12;
(1.3) calculating the linear prediction inverse filter system function A(z), equation (1), from the linear prediction coefficients $a_i$ obtained in (1.2), where A(z) denotes the linear prediction inverse filter system function, p denotes the model order, and $a_i$ denotes the linear prediction coefficients;
(1.4) calculating the symmetric and antisymmetric polynomials of order p+1 of the linear prediction inverse filter system function A(z):
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \tag{2}$$
where P(z) denotes the symmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \tag{3}$$
where Q(z) denotes the antisymmetric polynomial of order p+1 of A(z), A(z) denotes the linear prediction inverse filter system function, and p denotes the model order;
(1.5) calculating the 12th-order LSP parameters of the input speech signal from P(z) and Q(z), equation (4), where $H(e^{j\omega})$ is the linear prediction spectrum amplitude, $e^{j\omega}$ is the frequency-domain form of z, $P(e^{j\omega})$ is the symmetric polynomial of order p+1 of $A(e^{j\omega})$, $Q(e^{j\omega})$ is the antisymmetric polynomial of order p+1 of $A(e^{j\omega})$, $\cos\theta_i$ and $\cos\omega_i$ are the LSP coefficients expressed in the cosine domain, $\theta_i$ and $\omega_i$ are the line spectral frequencies corresponding to the LSP coefficients of the input speech signal, and $\prod$ is the product symbol.
3. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that step 2) is calculated by the following formula:
$$\mathrm{DAL}_i = l_{i+1} - l_i, \quad i = 1, 2, \ldots, M \ (M < N) \tag{5}$$
where $\mathrm{DAL}_i$ is the i-th order DAL parameter, $l_{i+1}$ is the (i+1)-th order LSP parameter, $l_i$ is the i-th order LSP parameter, M is the maximum order of the DAL parameters, and N is the maximum order of the LSP parameters.
4. The improved method for recognizing multiple pathological single vowels according to claim 1, characterized in that the frequency warping in step 3) uses the following formula:
$$\mathrm{Bark} = \frac{26.81}{1 + 1960/f} - 0.53 \tag{6}$$
where Bark denotes the Bark frequency and f denotes the linear frequency.
5. The improved multiple pathology unit voice recognition method according to claim 1, characterized in that step 4) adjusts the $j$-th order Bark line spectrum pair parameter, $j = 2, \ldots, N-1$, by bidirectional iteration. After each adjustment the original Bark line spectrum pair parameter is updated directly, and the adjusted $j$-th order Bark line spectrum pair parameter is used when adjusting the Bark line spectrum pair parameter of the next order. Let the Bark line spectrum pair parameters of the current frame be $\{b_1, b_2, \ldots, b_N\}$, where $N$ is the order of the Bark line spectrum pair; the coefficients of the adjacent differential line spectrum pair parameters of the current frame are $b_{i+1} - b_i$, $i = 1, 2, \ldots, N-1$. The specific iterative formula is as follows:
$$c_i = \eta\,(b_{i+1} - b_i), \quad \eta < 1, \quad i = 2, 3, \ldots, N-1 \qquad (8)$$
(1) Forward iteration: from $j = 2$ to $j = N-1$, adjust the $j$-th order Bark line spectrum pair parameter in the forward direction;
(2) Backward iteration: from $j = N-1$ to $j = 2$, adjust the $j$-th order Bark line spectrum pair parameter in the backward direction;
(3) Averaging: average the Bark line spectrum pair parameters obtained by the forward iteration and the backward iteration to obtain the enhanced Bark line spectrum pair parameters;
In the formula, $\eta$ controls the degree of formant enhancement: the smaller $\eta$ is, the more pronounced the enhancement.
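The control flow of the three steps (forward pass, backward pass, averaging) can be sketched in Python as below. The per-order adjustment rule is not reproduced in this text, so the sketch takes it as a caller-supplied function `update`. The function `placeholder_update` is purely hypothetical and stands in for the missing rule so that the example runs; it is not the patent's formula. All names and sample values are assumptions.

```python
def bidirectional_enhance(b, eta, update):
    """Forward pass, backward pass, then element-wise average.

    b:      Bark line spectrum pair parameters b_1..b_N (ascending).
    eta:    enhancement factor, eta < 1.
    update: callable (params, j, eta) -> new value for index j.
            Its exact form is an assumption supplied by the caller.
    """
    n = len(b)
    fwd = list(b)
    for j in range(1, n - 1):            # orders 2 .. N-1 (1-based)
        fwd[j] = update(fwd, j, eta)     # overwrite at once so the next order sees it
    bwd = list(b)
    for j in range(n - 2, 0, -1):        # orders N-1 .. 2 (1-based)
        bwd[j] = update(bwd, j, eta)
    return [(f + g) / 2.0 for f, g in zip(fwd, bwd)]


def placeholder_update(params, j, eta):
    """HYPOTHETICAL rule, not from the patent: shrink the narrower gap by eta."""
    left = params[j] - params[j - 1]
    right = params[j + 1] - params[j]
    if left <= right:
        return params[j - 1] + eta * left    # scaled difference, as in formula (8)
    return params[j + 1] - eta * right


# Illustrative Bark-domain values only.
b = [1.0, 2.0, 4.0, 7.0, 11.0]
enhanced = bidirectional_enhance(b, 0.5, placeholder_update)
```

Only orders 2 to N-1 are adjusted, so the first and last parameters pass through unchanged, and averaging the two passes avoids favouring either direction.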
6. The improved multiple pathology unit voice recognition method according to claim 1, characterized in that step 5) first randomly selects 75% of each single-vowel data set in the SVD pathological voice database as the training set and the remaining 25% as the test set, ensuring that every class of voice data is evenly distributed in both the training stage and the test stage of the classification network. The 12th-order enhanced Bark line spectrum pair parameters of six kinds of single-vowel voices, namely the wriggling polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, are then input into a deep neural network for recognition. The network parameters are set as follows: 2 hidden layers with 100 neurons each; the ReLU function is selected as the activation function; and in the last layer of the recognition model a Softmax function is selected to turn the output of the neural network into a probability distribution, thereby optimizing the classification result.
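The data split and the network shape described in this claim can be sketched with the Python standard library alone. The sketch covers only a per-class 75%/25% split and an untrained forward pass through a 12-100-100-6 network with ReLU and Softmax. Training, the weight initialization, the random seed, and all function names are assumptions and are not specified by the claim.

```python
import math
import random


def stratified_split(samples_by_class, train_frac=0.75, seed=0):
    """Split each class separately so both sets keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        k = round(train_frac * len(shuffled))
        train += [(x, label) for x in shuffled[:k]]
        test += [(x, label) for x in shuffled[k:]]
    return train, test


def init_layer(n_in, n_out, rng):
    """Random weights (assumed initialization) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + c for row, c in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)                        # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """ReLU on the hidden layers, Softmax on the output layer."""
    h = x
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, layers[-1]))


rng = random.Random(0)
sizes = [12, 100, 100, 6]                # 12 inputs, 2 hidden layers of 100, 6 classes
layers = [init_layer(a, c, rng) for a, c in zip(sizes, sizes[1:])]
probs = forward([0.1 * i for i in range(12)], layers)   # dummy 12th-order feature vector
```

The six outputs sum to one, so the index of the largest entry can be read as the predicted class among the three pathological and three normal vowels.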
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233952.0A CN110070894B (en) | 2019-03-26 | 2019-03-26 | Improved method for identifying multiple pathological unit tones |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233952.0A CN110070894B (en) | 2019-03-26 | 2019-03-26 | Improved method for identifying multiple pathological unit tones |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110070894A | 2019-07-30 |
CN110070894B | 2021-08-03 |
Family
ID=67366671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910233952.0A Active CN110070894B (en) | 2019-03-26 | 2019-03-26 | Improved method for identifying multiple pathological unit tones |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070894B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0774750A2 (en) * | 1995-11-15 | 1997-05-21 | Nokia Mobile Phones Ltd. | Determination of line spectrum frequencies for use in a radiotelephone |
US20040042622A1 (en) * | 2002-08-29 | 2004-03-04 | Mutsumi Saito | Speech Processing apparatus and mobile communication terminal |
US7257535B2 (en) * | 1999-07-26 | 2007-08-14 | Lucent Technologies Inc. | Parametric speech codec for representing synthetic speech in the presence of background noise |
CN101527141A (en) * | 2009-03-10 | 2009-09-09 | 苏州大学 | Method of converting whispered voice into normal voice based on radial group neutral network |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN106710604A (en) * | 2016-12-07 | 2017-05-24 | 天津大学 | Formant enhancement apparatus and method for improving speech intelligibility |
CN107705801A (en) * | 2016-08-05 | 2018-02-16 | 中国科学院自动化研究所 | The training method and Speech bandwidth extension method of Speech bandwidth extension model |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0774750A2 (en) * | 1995-11-15 | 1997-05-21 | Nokia Mobile Phones Ltd. | Determination of line spectrum frequencies for use in a radiotelephone |
US7257535B2 (en) * | 1999-07-26 | 2007-08-14 | Lucent Technologies Inc. | Parametric speech codec for representing synthetic speech in the presence of background noise |
US20040042622A1 (en) * | 2002-08-29 | 2004-03-04 | Mutsumi Saito | Speech Processing apparatus and mobile communication terminal |
CN101527141A (en) * | 2009-03-10 | 2009-09-09 | 苏州大学 | Method of converting whispered voice into normal voice based on radial group neutral network |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN107705801A (en) * | 2016-08-05 | 2018-02-16 | 中国科学院自动化研究所 | The training method and Speech bandwidth extension method of Speech bandwidth extension model |
CN106710604A (en) * | 2016-12-07 | 2017-05-24 | 天津大学 | Formant enhancement apparatus and method for improving speech intelligibility |
Non-Patent Citations (5)
Title |
---|
GHULAM MUHAMMAD ET AL.: "《Voice Pathology Detection Using Vocal Tract Area》", 《IEEE 2013 EUROPEAN MODELLING SYMPOSIUM》 * |
HUI YE ET AL.: "《Quality-enhanced voice morphing using maximum likelihood transformations》", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING ( VOLUME: 14, ISSUE: 4, JULY 2006)》 * |
周佳秦等: "《采用线谱对分段定值偏移进行病理嗓音共振峰修正》", 《信息化研究》 * |
彭策等: "《嗓音分析在疾病诊断中的应用》", 《生物医学工程学杂志》 * |
薛隆基等: "《改进人工神经网络的病理嗓音共振峰修复》", 《电子器件》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110070894B (en) | 2021-08-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||