CN110070894B - Improved method for identifying multiple pathological unit tones - Google Patents
- Publication number
- CN110070894B (application CN201910233952.0A)
- Authority
- CN
- China
- Prior art keywords
- line spectrum
- order
- spectrum pair
- parameters
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Abstract
An improved multiple pathology unit tone identification method, comprising: calculating line spectrum pair parameters of an input voice signal; calculating adjacent differential line spectrum pair parameters of the input voice signal; performing frequency warping on the line spectrum pair parameters of the input voice signal to obtain the Bark line spectrum pair parameters of the input voice signal; performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain enhanced Bark line spectrum pair parameters; and inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones. The method achieves a better recognition rate and provides a research foundation for the subsequent voice restoration of unit tones and of more complex words and sentences.
Description
Technical Field
The invention relates to pathological unit tone identification methods, and more particularly to an improved method for identifying multiple pathological unit tones.
Background
Voice is the most direct medium of language, so voice quality directly affects the efficiency of people's daily communication. Statistically, about 7.5 million people in the United States suffer from voice disorders, with a prevalence of 57.7% among education professionals and 28.8% among non-education professionals. Furthermore, in the UK, approximately 2,200 people are diagnosed with laryngeal cancer each year. An unclear voice greatly reduces quality of life, which makes the recognition and restoration of pathological voice particularly important.
Voice disorders can be treated with drugs or physical therapy, but imperfect treatment affects a patient's ability to express themselves, so non-invasive recognition and restoration of pathological voice has become a key research focus. Recognition and restoration of unit tones is the basis for handling complex words and sentences. Research on multi-class unit tone recognition has so far been based on normal voice, with commonly used feature parameters including Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), and formants. Recognition work on pathological voice, however, mostly addresses the two-class problem of pathological versus normal voice; because most acoustic feature parameters recognize the /a/ sound better than other vowels, the pathological unit tone /a/ is generally chosen at home and abroad as the experimental sample, and its feature parameters are extracted and fed into different classification networks to identify pathological voice. Commonly used recognition features include perturbation features such as fundamental frequency perturbation and amplitude perturbation, and features such as MPEG-7 descriptors and Multidirectional Regression (MDR). But the features applied to recognizing multiple normal unit tones (LPCC, MFCC) are less effective at recognizing multiple pathological unit tones.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an improved method for identifying multiple pathological unit tones that can further improve the recognition rate of pathological voice.
The technical scheme adopted by the invention is as follows: an improved method for identifying multiple pathological unit tones, comprising the steps of:
1) calculating line spectrum pair parameters of an input voice signal;
2) calculating adjacent differential line spectrum pair parameters of the input voice signal;
3) performing frequency warping on the line spectrum pair parameters of the input voice signal to obtain the bark line spectrum pair parameters of the input voice signal;
the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency;
4) according to the adjacent differential line spectrum pair parameters, performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain enhanced Bark line spectrum pair parameters;
5) inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones.
The step 1) comprises the following steps:
(1.1) performing signal preprocessing, including direct current removing processing and framing processing;
(1.2) for each frame of the voice signal, calculating the 12-order linear prediction coefficients a_i by the Levinson-Durbin autocorrelation algorithm according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), the linear prediction inverse filter system function is computed as follows:
A(z) = 1 - Σ_{i=1}^{p} a_i z^(-i)   (1)
where A(z) represents the linear prediction inverse filter system function; p represents the model order; a_i represents the linear prediction coefficients;
(1.4) calculating the (p+1)-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(-(p+1)) A(z^(-1))   (2)
where P(z) represents the (p+1)-order symmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
Q(z) = A(z) - z^(-(p+1)) A(z^(-1))   (3)
where Q(z) represents the (p+1)-order antisymmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
(1.5) calculating the 12-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
|H(e^(jω))|^2 = 1 / |A(e^(jω))|^2 = 1 / { 2^p [ cos^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos θ_i)^2 + sin^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos ω_i)^2 ] }   (4)
where H(e^(jω)) is the linear prediction spectral amplitude, e^(jω) is the frequency-domain representation of z, P(e^(jω)) is the (p+1)-order symmetric polynomial of A(e^(jω)), Q(e^(jω)) is the (p+1)-order antisymmetric polynomial of A(e^(jω)), cos θ_i and cos ω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the line spectrum frequencies corresponding to the line spectrum pair coefficients of the input voice signal, and Π is the cumulative product sign.
Step 2) is calculated according to the following formula:
DAL_i = l_{i+1} - l_i,  i = 1, 2, ..., M  (M < N)   (5)
where DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_{i+1} is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
Step 4) adjusts the j-th order Bark line spectrum pair parameter in a bidirectional iterative manner according to the adjacent differential line spectrum pair parameters, where j = 2, ..., N-1; the original Bark line spectrum pair parameter is updated directly after adjustment, and the adjusted j-th order parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and let the adjacent differential coefficients of the current frame be b_{i+1} - b_i, i = 1, 2, ..., N-1. The specific iterative formula is as follows:
c_i = η(b_{i+1} - b_i),  η < 1,  i = 2, 3, ..., N-1   (8)
(1) forward iteration: adjusting the j-th order Bark line spectrum pair parameters forward from j = 2 to j = N-1;
(2) backward iteration: adjusting the j-th order Bark line spectrum pair parameters backward from j = N-1 to j = 2;
(3) averaging: averaging the Bark line spectrum pair parameters obtained by the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters;
where η controls the degree of formant enhancement; the smaller η is, the more obvious the enhancement effect.
Step 5) first randomly selects 75% of each unit tone data set in the SVD pathological voice database as the training set and 25% as the test set, so that each class of voice data is evenly distributed across the training and test phases of the classification network. The 12-order enhanced Bark line spectrum pair parameters of the 6 unit tones (vocal polyp pathological voice /a/, /i/, /u/ and normal voice /a/, /i/, /u/) are then input into the deep neural network for recognition, with the network parameters set as follows: 2 hidden layers with 100 neurons each, the ReLU function as activation function, and a Softmax function in the last layer of the recognition model to turn the network output into a probability distribution and thus refine the classification result.
The improved method for identifying the multiple pathological unit tones has the following beneficial effects:
1) Compared with the traditional MFCC and LPCC features, the improved method for identifying multiple pathological unit tones achieves a better recognition rate, and the invention proposes the new E-BLSP feature suited to identifying multiple pathological unit tones. The newly proposed E-BLSP feature achieves a high recognition rate for the 6 unit tones: normal /a/, /i/, /u/ and pathological /a/, /i/, /u/;
2) The proposed E-BLSP recognizes pathological /i/ better than pathological /a/, whereas traditional pathological voice recognition is mostly based on the unit tone /a/; this offers a new idea for the recognition and diagnosis of pathological voice and provides a research basis for the voice restoration of unit tones and of more complex words and sentences.
Drawings
FIG. 1 is a schematic structural diagram of an improved multiple pathology unit tone identification method of the present invention;
FIG. 2a is an 11-order DAL parameter box plot of the normal unit tone /a/;
FIG. 2b is an 11-order DAL parameter box plot of the pathological unit tone /a/;
FIG. 2c is an 11-order DAL parameter box plot of the normal unit tone /i/;
FIG. 2d is an 11-order DAL parameter box plot of the pathological unit tone /i/;
FIG. 2e is an 11-order DAL parameter box plot of the normal unit tone /u/;
FIG. 2f is an 11-order DAL parameter box plot of the pathological unit tone /u/;
FIG. 3a is a schematic diagram of the 12-order LSP parameters according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of the 12-order BLSP parameters according to an embodiment of the present invention;
FIG. 4a is a schematic three-dimensional spectrum of the 12-order BLSP parameters according to an embodiment of the present invention;
FIG. 4b is a schematic three-dimensional spectrum of the 12-order E-BLSP parameters of the present invention.
Detailed Description
The following describes an improved multiple pathological unit tone recognition method according to the present invention in detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the improved method for recognizing multiple pathological unit tones of the present invention comprises the following steps:
1) calculating Line Spectrum Pair (LSP) parameters of an input voice signal; the method comprises the following steps:
(1.1) performing signal preprocessing, including direct current removing processing and framing processing;
(1.2) for each frame of the voice signal, calculating the 12-order linear prediction coefficients a_i by the Levinson-Durbin autocorrelation algorithm according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), the linear prediction inverse filter system function is computed as follows:
A(z) = 1 - Σ_{i=1}^{p} a_i z^(-i)   (1)
where A(z) represents the linear prediction inverse filter system function; p represents the model order; a_i represents the linear prediction coefficients;
(1.4) calculating the (p+1)-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(-(p+1)) A(z^(-1))   (2)
where P(z) represents the (p+1)-order symmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
Q(z) = A(z) - z^(-(p+1)) A(z^(-1))   (3)
where Q(z) represents the (p+1)-order antisymmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
(1.5) calculating the 12-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
|H(e^(jω))|^2 = 1 / |A(e^(jω))|^2 = 1 / { 2^p [ cos^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos θ_i)^2 + sin^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos ω_i)^2 ] }   (4)
where H(e^(jω)) is the linear prediction spectral amplitude, e^(jω) is the frequency-domain representation of z, P(e^(jω)) is the (p+1)-order symmetric polynomial of A(e^(jω)), Q(e^(jω)) is the (p+1)-order antisymmetric polynomial of A(e^(jω)), cos θ_i and cos ω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the Line Spectrum Frequencies (LSF) corresponding to the line spectrum pair coefficients of the input voice signal, and Π is the cumulative product sign.
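Steps (1.2)-(1.5) can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: `levinson_durbin` here returns the coefficients of A(z) in the common form A(z) = 1 + Σ a_j z^(-j) (differing from eq. (1) only in the sign convention of the a_i), and the LSFs are obtained as the unit-circle root angles of P(z) and Q(z):

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: solve the autocorrelation normal
    equations for the inverse-filter polynomial A(z) = 1 + sum a_j z^-j."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # residual prediction error
    return a

def lsp_from_lpc(a):
    """Line spectral frequencies as the angles of the unit-circle roots of
    P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1)."""
    sym = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])   # P(z)
    asym = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])  # Q(z)
    eps = 1e-4                               # exclude the trivial roots at z = +/-1
    lsf = [w for poly in (sym, asym)
           for w in np.angle(np.roots(poly)) if eps < w < np.pi - eps]
    return np.array(sorted(lsf))
```

For each frame, `r` would be the biased autocorrelation sequence r[0..12]; because Levinson-Durbin on a valid autocorrelation yields a minimum-phase A(z), the 12 LSFs interlace strictly on (0, π).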
2) Calculating parameters of Adjacent Difference line spectrum pairs (DAL) of the input voice signals;
is calculated according to the following formula:
DAL_i = l_{i+1} - l_i,  i = 1, 2, ..., M  (M < N)   (5)
where DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_{i+1} is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
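Equation (5) is a plain adjacent difference; a one-line sketch assuming NumPy and an LSP vector ordered by increasing frequency:

```python
import numpy as np

def dal(lsp, M=None):
    """Eq. (5): DAL_i = l_{i+1} - l_i for i = 1..M (M < N; default M = N-1)."""
    d = np.diff(np.asarray(lsp, dtype=float))   # adjacent differences
    return d if M is None else d[:M]
```

A 12-order LSP vector thus yields the 11-order DAL vector used in the embodiment.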
3) Performing frequency warping on Line Spectrum Pair parameters of an input voice signal to obtain Bark Line Spectrum Pair (BLSP) parameters of the input voice signal;
the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency.
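Equation (6) in code; a sketch that additionally assumes the LSFs are first converted from radians to linear frequency in Hz at the 8 kHz sampling rate used in the embodiment (the helper `lsf_to_blsp` and its conversion step are this sketch's assumption, not stated in the source):

```python
import numpy as np

def hz_to_bark(f_hz):
    """Eq. (6): Bark = 26.81 / (1 + 1960/f) - 0.53 (Traunmueller's approximation)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53

def lsf_to_blsp(lsf_rad, fs=8000.0):
    """Warp LSFs (radians) to the Bark scale: omega -> f = omega * fs / (2*pi)."""
    return hz_to_bark(np.asarray(lsf_rad) * fs / (2.0 * np.pi))
```

The warp is monotone, so the ordering of the line spectrum pair parameters is preserved while the low-frequency band is expanded relative to the high-frequency band.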
4) According to the adjacent differential line spectrum pair parameters, performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain the Enhanced Bark Line Spectrum Pair (E-BLSP) parameters. The j-th order Bark line spectrum pair parameter is adjusted in a bidirectional iterative manner according to the adjacent differential line spectrum pair parameters, where j = 2, ..., N-1; the original Bark line spectrum pair parameter is updated directly after adjustment, and the adjusted j-th order parameter is used when adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and let the adjacent differential coefficients of the current frame be b_{i+1} - b_i, i = 1, 2, ..., N-1. The specific iterative formula is as follows:
c_i = η(b_{i+1} - b_i),  η < 1,  i = 2, 3, ..., N-1   (8)
(1) forward iteration: adjusting the j-th order Bark line spectrum pair parameters forward from j = 2 to j = N-1;
(2) backward iteration: adjusting the j-th order Bark line spectrum pair parameters backward from j = N-1 to j = 2;
(3) averaging: averaging the Bark line spectrum pair parameters obtained by the forward and backward iterations to obtain the enhanced Bark line spectrum pair parameters;
where η controls the degree of formant enhancement; the smaller η is, the more obvious the enhancement effect.
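The bidirectional enhancement can be sketched as follows. The source does not reproduce the per-order update rule (only eq. (8) for c_i survives), so this sketch assumes each adjusted parameter is pulled toward its already-adjusted neighbour so that the adjacent gap shrinks by the factor η, which is consistent with the statement that a smaller η gives a stronger enhancement:

```python
import numpy as np

def eblsp(b, eta=0.4):
    """Bidirectional E-BLSP sketch (assumed update rule; eq. (7) is not
    given in the source). Forward pass j = 2..N-1, backward pass
    j = N-1..2, then average the two results."""
    b = np.asarray(b, dtype=float)
    n = len(b)
    fwd = b.copy()
    for j in range(1, n - 1):                 # forward: j = 2..N-1 (1-based)
        fwd[j] = fwd[j - 1] + eta * (fwd[j] - fwd[j - 1])
    bwd = b.copy()
    for j in range(n - 2, 0, -1):             # backward: j = N-1..2
        bwd[j] = bwd[j + 1] - eta * (bwd[j + 1] - bwd[j])
    out = b.copy()
    out[1:n - 1] = 0.5 * (fwd[1:n - 1] + bwd[1:n - 1])
    return out
```

Under this assumed rule the first and last parameters are left untouched and the ordering of the Bark line spectrum pair parameters is preserved, so the enhanced vector remains a valid parameter set.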
5) Inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones. First, 75% of each unit tone data set in the SVD pathological voice database is randomly selected as the training set and 25% as the test set, so that each class of voice data is evenly distributed across the training and test phases of the classification network. The 12-order enhanced Bark line spectrum pair parameters of the 6 unit tones (vocal polyp pathological voice /a/, /i/, /u/ and normal voice /a/, /i/, /u/) are then input into the deep neural network for recognition, with the network parameters set as follows: 2 hidden layers, 100 neurons per layer, the ReLU function as activation function, and a Softmax function in the last layer of the recognition model to turn the network output into a probability distribution and thus refine the classification result.
Specific examples are given below:
1. Preprocessing: each frame in the framing processing is 30 ms long; at a sampling frequency of 8 kHz, the corresponding frame length is 240 samples and the frame shift is 80 samples
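The preprocessing numbers translate directly into code; a minimal sketch assuming NumPy, with DC removal done by mean subtraction over the whole signal:

```python
import numpy as np

def preprocess(x, frame_len=240, frame_shift=80):
    """DC removal and framing: 30 ms frames (240 samples at 8 kHz),
    80-sample shift; returns an (n_frames, frame_len) array."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                        # remove DC offset
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = np.arange(frame_len) + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]
```

For a 0.1 s signal (800 samples) this produces 1 + (800 - 240)/80 = 8 frames.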
2. Calculation of the linear prediction coefficients with p = 12
3. The linear prediction inverse filter system function A(z) is obtained from the linear prediction coefficients
4. The (p+1)-order symmetric and antisymmetric polynomials P(z) and Q(z) are calculated from A(z)
5. The 12-order LSP parameters are calculated from P(z) and Q(z)
6. The 11-order DAL (Difference of Adjacent LSP) parameters of the input voice signal are computed from the 12-order LSP parameters
7. Frequency warping is performed on the LSP parameters of the input voice signal to obtain its BLSP (Bark Line Spectrum Pair) parameters
Figs. 2a to 2f are box plots of the DAL parameters of the 6 unit tone signals in the embodiment of the present invention. FIG. 2a shows the 11-order DAL parameter box plot of the normal unit tone /a/; FIG. 2b that of the pathological unit tone /a/; FIG. 2c that of the normal unit tone /i/; FIG. 2d that of the pathological unit tone /i/; FIG. 2e that of the normal unit tone /u/; FIG. 2f that of the pathological unit tone /u/.
As can be seen from Figs. 2a to 2f, for the three normal unit tones /a/, /i/, /u/, the boxes of the first 7 orders of DAL data differ substantially, giving good discrimination between the three tones; for the three pathological unit tone signals /a/, /i/, /u/, the first 7 orders of DAL data are distributed more uniformly than for normal voice. For pathological /a/, the last 4 orders of DAL parameters are distributed entirely differently from normal /a/, while for pathological /i/ and /u/ the last 4 orders overlap considerably and discriminate poorly. Because the low-order DAL parameters correspond to the low-frequency part of the signal, the discrimination of the low-frequency band of the DAL parameters is higher than that of the high-frequency band, and the Bark domain reflects the human ear's perception of the signal more faithfully; the embodiment therefore applies the Bark-domain scale to nonlinearly warp the extracted LSP parameters into the BLSP parameters, using the warping function:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency.
Figs. 3a and 3b show the 12-order LSP parameters and the 12-order BLSP parameters of an embodiment of the present invention. Compared with Fig. 3a, Fig. 3b amplifies the low-frequency part of the signal and compresses the high-frequency part, improving the discrimination between normal and pathological vowels.
8. Feature enhancement is performed on the BLSP parameters of the input voice signal to obtain the E-BLSP (Enhanced Bark Line Spectrum Pair) parameters: η controls the degree of formant enhancement, and the smaller η is, the more obvious the enhancement effect. In the embodiment of the invention, η = 0.4 is used.
Figs. 4a and 4b are three-dimensional spectrum diagrams of the 12-order BLSP parameters and the 12-order E-BLSP parameters of the embodiment. Compared with Fig. 4a, the amplitude in Fig. 4b is greatly raised at the formant frequencies while spectral broadening is suppressed, greatly enhancing the discrimination between normal and pathological vowels.
9. The E-BLSP parameters of the input voice signal are input into the DNN classifier to recognize multiple pathological unit tones
The embodiment first randomly selects 75% of each unit tone data set as the training set and 25% as the test set, so that each class of voice data is evenly distributed across the training and test phases of the classification network, and then inputs the 12-order E-BLSP parameters of the 6 classes of unit tones into a Deep Neural Network (DNN) for recognition. The network parameters are set as follows: 2 hidden layers, 100 neurons per layer.
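The classifier topology described above (two hidden layers of 100 ReLU neurons, Softmax output over the 6 unit-tone classes) can be sketched as a forward pass. The weights below are random placeholders, since training is not detailed in the source; only the shapes and activations follow the text:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def dnn_forward(x, params):
    """Forward pass: 12-order E-BLSP input -> 100 -> 100 -> 6-class softmax."""
    h = relu(x @ params["W1"] + params["b1"])
    h = relu(h @ params["W2"] + params["b2"])
    return softmax(h @ params["W3"] + params["b3"])

# Placeholder weights with the shapes implied by the text.
rng = np.random.default_rng(0)
params = {
    "W1": 0.1 * rng.standard_normal((12, 100)),  "b1": np.zeros(100),
    "W2": 0.1 * rng.standard_normal((100, 100)), "b2": np.zeros(100),
    "W3": 0.1 * rng.standard_normal((100, 6)),   "b3": np.zeros(6),
}
probs = dnn_forward(rng.standard_normal((4, 12)), params)
```

Each output row is a probability distribution over the 6 unit-tone classes, matching the Softmax description in the text.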
In selecting the unit tone source signals, the embodiment uses the Saarbruecken Voice Database (SVD), a pathological voice database recorded by the Institute of Phonetics at Saarland University. The database contains normal and various pathological voice signals of the sustained vowels /a/, /i/ and /u/, with a uniform sampling rate of 50 kHz and 16-bit resolution. The three sustained vowels /a/, /i/, /u/ of vocal polyp pathological voice and of normal voice are selected, and the sampling rate is uniformly reduced to 8 kHz. Each class has 180 voice samples in total, covering 4 different pitches (normal, low, high, low-high-low).
The evaluation in the embodiment mainly uses two indices: accuracy and AUC. Accuracy is defined as the percentage of cases classified correctly. The ROC (Receiver Operating Characteristic) curve is a comprehensive index reflecting sensitivity and specificity as continuous variables, and it reveals the relationship between them. AUC (Area Under Curve) is defined as the area enclosed between the ROC curve and the coordinate axes; its value ranges from 0.5 to 1, and the larger the AUC, the better the classification effect. To ensure the accuracy and generality of the experiment, each feature-combination experiment is run 10 times and the average is taken as the final classification result.
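Both evaluation indices can be computed directly; a sketch assuming NumPy, with the AUC obtained from the Mann-Whitney statistic, which equals the area under the ROC curve:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Percentage of cases classified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def binary_auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive sample is scored above a random negative one (ties count 1/2)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For the 6-class experiment in the text, a per-class AUC could be computed one-vs-rest with each class in turn treated as the positive label; the averaging scheme used in the source is not specified.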
As can be seen from Table 1, the recognition rate of the proposed feature on multiple pathological unit tones is higher than that of the conventional MFCC and LPCC features. The highest accuracy reaches 97.36%, and the AUC reaches 0.9894.
TABLE 1
Claims (5)
1. An improved method for recognizing multiple pathological unit tones, comprising the steps of:
1) calculating line spectrum pair parameters of an input voice signal;
2) calculating adjacent differential line spectrum pair parameters of the input voice signal;
3) performing frequency warping on the line spectrum pair parameters of the input voice signal to obtain the bark line spectrum pair parameters of the input voice signal; the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency;
4) according to the adjacent differential line spectrum pair parameters, performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain enhanced Bark line spectrum pair parameters;
5) inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones.
2. An improved multiple pathology unit tone identification method according to claim 1, wherein step 1) comprises:
(1.1) performing signal preprocessing, including direct current removing processing and framing processing;
(1.2) for each frame of the voice signal, calculating the 12-order linear prediction coefficients a_i by the Levinson-Durbin autocorrelation algorithm according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), the linear prediction inverse filter system function is computed as follows:
A(z) = 1 - Σ_{i=1}^{p} a_i z^(-i)   (1)
where A(z) represents the linear prediction inverse filter system function; p represents the model order; a_i represents the linear prediction coefficients;
(1.4) calculating the (p+1)-order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(-(p+1)) A(z^(-1))   (2)
where P(z) represents the (p+1)-order symmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
Q(z) = A(z) - z^(-(p+1)) A(z^(-1))   (3)
where Q(z) represents the (p+1)-order antisymmetric polynomial of A(z), A(z) represents the linear prediction inverse filter system function, and p represents the model order;
(1.5) calculating the 12-order line spectrum pair parameters of the input speech signal from P(z) and Q(z):
|H(e^(jω))|^2 = 1 / |A(e^(jω))|^2 = 1 / { 2^p [ cos^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos θ_i)^2 + sin^2(ω/2) Π_{i=1}^{p/2} (cos ω - cos ω_i)^2 ] }   (4)
where H(e^(jω)) is the linear prediction spectral amplitude, e^(jω) is the frequency-domain representation of z, P(e^(jω)) is the (p+1)-order symmetric polynomial of A(e^(jω)), Q(e^(jω)) is the (p+1)-order antisymmetric polynomial of A(e^(jω)), cos θ_i and cos ω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the line spectrum frequencies corresponding to the line spectrum pair coefficients of the input voice signal, and Π is the cumulative product sign.
3. An improved method for multiple pathological unit tone recognition according to claim 1, wherein step 2) is calculated according to the following formula:
DAL_i = l_{i+1} - l_i,  i = 1, 2, ..., M  (M < N)   (5)
where DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_{i+1} is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
4. The improved multiple pathological unit tone identification method according to claim 1, wherein in step 4), the j-th-order Bark line spectrum pair parameter is adjusted by bidirectional iteration according to the adjacent differential line spectrum pair parameters, j = 2, 3, ..., N-1; the Bark line spectrum pair parameters of the current frame are {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and the adjacent differential line spectrum pair coefficients of the current frame are b_(i+1) - b_i, i = 1, 2, ..., N-1; the specific iterative formula is as follows:
c_i = η(b_(i+1) - b_i),  η < 1,  i = 2, 3, ..., N-1 (8)
(1) forward iteration: the j-th-order Bark line spectrum pair parameters are adjusted forward from j = 2 to j = N-1;
(2) backward iteration: the j-th-order Bark line spectrum pair parameters are adjusted backward from j = N-1 to j = 2;
(3) averaging: the Bark line spectrum pair parameters obtained from the forward and backward iterations are averaged to obtain the enhanced Bark line spectrum pair parameters;
in the formula, η controls the degree of formant enhancement; the smaller η is, the more pronounced the enhancement effect.
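The claim gives only the scaled-difference coefficient c_i = η(b_{i+1} - b_i), not the full update rule, so the sketch below is one plausible reading, not the patent's definitive procedure: each pass rescales the gaps between adjacent Bark-LSP parameters by η < 1 (forward from the low end, backward from the high end), the endpoints stay fixed, and the two passes are averaged:

```python
import numpy as np

def enhance_bark_lsp(b, eta=0.5):
    """Bidirectional gap-rescaling enhancement (one plausible reading).

    b   -- Bark line spectrum pair parameters {b_1, ..., b_N}, ascending.
    eta -- enhancement factor, eta < 1; smaller eta -> stronger effect.
    """
    b = np.asarray(b, dtype=float)
    n = len(b)
    fwd = b.copy()
    for j in range(1, n - 1):          # forward pass, j = 2 .. N-1 (1-based)
        fwd[j] = fwd[j - 1] + eta * (b[j] - b[j - 1])
    bwd = b.copy()
    for j in range(n - 2, 0, -1):      # backward pass, j = N-1 .. 2
        bwd[j] = bwd[j + 1] - eta * (b[j + 1] - b[j])
    return (fwd + bwd) / 2.0           # averaging step
```

Under this reading the output keeps the original ordering and endpoints while pulling interior parameters toward their nearer neighbour, which narrows the gaps that mark formants.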
5. The improved multiple pathological unit tone recognition method according to claim 1, wherein in step 5), 75% of each unit tone data set in the SVD pathological voice database is randomly selected as the training set and 25% as the test set, so that each class of voice data is evenly distributed across the training and testing phases of the classification network; the 12th-order enhanced Bark line spectrum pair parameters of the six pathological unit tones /a/, /i/, /u/ are then input to the deep neural network for recognition, with the network parameters set as follows: 2 hidden layers with 100 neurons each, ReLU as the activation function, and a Softmax function in the last layer of the recognition model to turn the network output into a probability distribution and then optimize the classification result.
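The claimed classifier topology and data split can be sketched as follows. This is a forward-pass skeleton only, under stated assumptions: the class names, the 12-dimensional input, the 6 output classes, and the random weight initialization are illustrative; the patent does not specify the training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_test_split_75_25(n_samples):
    """Random 75% / 25% index split, as described in claim 5."""
    idx = rng.permutation(n_samples)
    cut = int(round(0.75 * n_samples))
    return idx[:cut], idx[cut:]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class RecognitionMLP:
    """Claimed topology: 2 hidden layers of 100 ReLU neurons each,
    Softmax output turning the network output into a probability
    distribution (6 unit-tone classes assumed)."""

    def __init__(self, d_in=12, d_hidden=100, n_classes=6):
        s = 0.1  # illustrative init scale
        self.W1 = rng.standard_normal((d_in, d_hidden)) * s
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.standard_normal((d_hidden, d_hidden)) * s
        self.b2 = np.zeros(d_hidden)
        self.W3 = rng.standard_normal((d_hidden, n_classes)) * s
        self.b3 = np.zeros(n_classes)

    def forward(self, x):
        h = relu(x @ self.W1 + self.b1)     # hidden layer 1
        h = relu(h @ self.W2 + self.b2)     # hidden layer 2
        return softmax(h @ self.W3 + self.b3)  # class probabilities
```

Each row of the input would be one frame's 12th-order enhanced Bark-LSP vector; training (e.g. cross-entropy with backpropagation) is left out since the claim does not specify it.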
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233952.0A CN110070894B (en) | 2019-03-26 | 2019-03-26 | Improved method for identifying multiple pathological unit tones |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110070894A CN110070894A (en) | 2019-07-30 |
CN110070894B true CN110070894B (en) | 2021-08-03 |
Family
ID=67366671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910233952.0A Active CN110070894B (en) | 2019-03-26 | 2019-03-26 | Improved method for identifying multiple pathological unit tones |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070894B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0774750A2 (en) * | 1995-11-15 | 1997-05-21 | Nokia Mobile Phones Ltd. | Determination of line spectrum frequencies for use in a radiotelephone |
US20040042622A1 (en) * | 2002-08-29 | 2004-03-04 | Mutsumi Saito | Speech Processing apparatus and mobile communication terminal |
US7257535B2 (en) * | 1999-07-26 | 2007-08-14 | Lucent Technologies Inc. | Parametric speech codec for representing synthetic speech in the presence of background noise |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN106710604A (en) * | 2016-12-07 | 2017-05-24 | 天津大学 | Formant enhancement apparatus and method for improving speech intelligibility |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101527141B (en) * | 2009-03-10 | 2011-06-22 | 苏州大学 | Method of converting whispered voice into normal voice based on radial group neutral network |
CN107705801B (en) * | 2016-08-05 | 2020-10-02 | 中国科学院自动化研究所 | Training method of voice bandwidth extension model and voice bandwidth extension method |
- 2019-03-26 CN CN201910233952.0A patent/CN110070894B/en active Active
Non-Patent Citations (5)
Title |
---|
"Quality-enhanced voice morphing using maximum likelihood transformations"; Hui Ye et al.; IEEE Transactions on Audio, Speech, and Language Processing (Volume 14, Issue 4, July 2006); 20061231; full text *
"Voice Pathology Detection Using Vocal Tract Area"; Ghulam Muhammad et al.; IEEE 2013 European Modelling Symposium; 20141231; full text *
"Application of Voice Analysis in Disease Diagnosis" (in Chinese); Peng Ce et al.; Journal of Biomedical Engineering; 20071231; full text *
"Formant Restoration of Pathological Voice Using an Improved Artificial Neural Network" (in Chinese); Xue Longji et al.; Chinese Journal of Electron Devices; 20190228; full text *
"Pathological Voice Formant Correction Using Piecewise Fixed-Value Offset of Line Spectrum Pairs" (in Chinese); Zhou Jiaqin et al.; Informatization Research; 20160430; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110070894A (en) | 2019-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113012720B (en) | Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction | |
CN107274888B (en) | Emotional voice recognition method based on octave signal strength and differentiated feature subset | |
CN104050965A (en) | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof | |
CN110827857B (en) | Speech emotion recognition method based on spectral features and ELM | |
Pawar et al. | Review of various stages in speaker recognition system, performance measures and recognition toolkits | |
Zhang et al. | Multiple vowels repair based on pitch extraction and line spectrum pair feature for voice disorder | |
Ling et al. | Attention-Based Convolutional Neural Network for ASV Spoofing Detection. | |
CN106782599A (en) | The phonetics transfer method of post filtering is exported based on Gaussian process | |
Ghezaiel et al. | Hybrid network for end-to-end text-independent speaker identification | |
Nawas et al. | Speaker recognition using random forest | |
Cheng et al. | DNN-based speech enhancement with self-attention on feature dimension | |
Woubie et al. | Voice-quality Features for Deep Neural Network Based Speaker Verification Systems | |
Karthikeyan | Adaptive boosted random forest-support vector machine based classification scheme for speaker identification | |
CN110070894B (en) | Improved method for identifying multiple pathological unit tones | |
Singh et al. | Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition | |
CN112466276A (en) | Speech synthesis system training method and device and readable storage medium | |
Nasr et al. | Text-independent speaker recognition using deep neural networks | |
Boualoulou et al. | Speech analysis for the detection of Parkinson’s disease by combined use of empirical mode decomposition, Mel frequency cepstral coefficients, and the K-nearest neighbor classifier | |
CN105741853A (en) | Digital speech perception hash method based on formant frequency | |
Neto et al. | Feature estimation for vocal fold edema detection using short-term cepstral analysis | |
Velayuthapandian et al. | A focus module-based lightweight end-to-end CNN framework for voiceprint recognition | |
Zailan et al. | Comparative analysis of LPC and MFCC for male speaker recognition in text-independent context | |
Thirumuru et al. | Application of non-negative frequency-weighted energy operator for vowel region detection | |
Zi et al. | Joint filter combination-based central difference feature extraction and attention-enhanced Dense-Res2Block network for short-utterance speaker recognition | |
CN113299295A (en) | Training method and device for voiceprint coding network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||