CN110070894B - Improved method for identifying multiple pathological unit tones - Google Patents

Improved method for identifying multiple pathological unit tones

Info

Publication number
CN110070894B
CN110070894B (application number CN201910233952.0A)
Authority
CN
China
Prior art keywords
line spectrum
order
spectrum pair
parameters
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910233952.0A
Other languages
Chinese (zh)
Other versions
CN110070894A (en)
Inventor
张涛 (Zhang Tao)
武雅琴 (Wu Yaqin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910233952.0A priority Critical patent/CN110070894B/en
Publication of CN110070894A publication Critical patent/CN110070894A/en
Application granted granted Critical
Publication of CN110070894B publication Critical patent/CN110070894B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique using neural networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An improved method for identifying multiple pathological unit tones, comprising: calculating the line spectrum pair parameters of an input voice signal; calculating the adjacent differential line spectrum pair parameters of the input voice signal; performing frequency warping on the line spectrum pair parameters to obtain the Bark line spectrum pair parameters of the input voice signal; performing feature enhancement on the Bark line spectrum pair parameters to obtain the enhanced Bark line spectrum pair parameters; and inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones. The method achieves a better recognition rate and provides a research foundation for the subsequent voice restoration of unit tones and of more complex words and sentences.

Description

Improved method for identifying multiple pathological unit tones
Technical Field
The invention relates to a pathological unit tone identification method, and more particularly to an improved method for identifying multiple pathological unit tones.
Background
Voice is the most direct carrier of language, so voice quality directly affects the efficiency of people's daily communication. Statistically, about 7.5 million people in the United States suffer from voice disorders, with a prevalence of 57.7% among educational professionals and 28.8% among non-educational professionals. Furthermore, in the UK approximately 2,200 people are diagnosed with laryngeal cancer each year. An unclear voice greatly reduces quality of life, which makes the recognition and restoration of pathological voice particularly important.
Voice disorders can be treated with medication and physical therapy, but incomplete treatment affects a patient's expression, so the non-invasive recognition and restoration of pathological voice has become a key research focus. Recognition and restoration of unit tones is the basis for handling complex words and sentences. Research on multiple unit tone recognition has so far mostly targeted normal voice, with commonly used feature parameters including Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), and formants. Recognition work on pathological voice, by contrast, mostly focuses on the two-class problem of pathological versus normal voice. Because most acoustic feature parameters achieve a higher recognition rate for the /a/ tone than for other vowels, the pathological unit tone /a/ is generally chosen at home and abroad as the experimental sample; feature parameters of the voice samples are extracted and fed into different classification networks to identify pathological voice. Commonly used recognition features include perturbation features such as fundamental frequency perturbation and amplitude perturbation, and features such as MPEG-7 descriptors and Multidirectional Regression (MDR). However, the features applied to recognizing multiple normal unit tones (LPCC, MFCC) are less effective at recognizing multiple pathological unit tones.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an improved method for identifying multiple pathological unit tones that can further improve the recognition rate of pathological voice.
The technical scheme adopted by the invention is as follows: an improved method for identifying multiple pathological unit tones, comprising the steps of:
1) calculating line spectrum pair parameters of an input voice signal;
2) calculating adjacent differential line spectrum pair parameters of the input voice signal;
3) performing frequency warping on the line spectrum pair parameters of the input voice signal to obtain the Bark line spectrum pair parameters of the input voice signal;
the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency;
4) according to the adjacent differential line spectrum pair parameters, performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain the enhanced Bark line spectrum pair parameters;
5) inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones.
The step 1) comprises the following steps:
(1.1) performing signal preprocessing, including DC removal and framing;
(1.2) for each frame of the voice signal, calculating the 12th-order linear prediction coefficients a_i with the Levinson-Durbin autocorrelation algorithm, according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), computing the linear prediction inverse filter system function as follows:
A(z) = 1 − Σ_{i=1}^{p} a_i·z^(−i) (1)
wherein A(z) represents the linear prediction inverse filter system function, p represents the model order, and a_i represents the linear prediction coefficients;
(1.4) calculating the p+1 order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(−(p+1))·A(z^(−1)) (2)
wherein P(z) represents the symmetric polynomial of order p+1 of A(z);
Q(z) = A(z) − z^(−(p+1))·A(z^(−1)) (3)
wherein Q(z) represents the antisymmetric polynomial of order p+1 of A(z);
(1.5) calculating the line spectrum pair parameters of the 12th-order input speech signal from P(z) and Q(z):
|H(e^(jθ))|^2 = 4 / (|P(e^(jθ))|^2 + |Q(e^(jθ))|^2) = 2^(−p) / [cos^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosθ_i)^2 + sin^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosω_i)^2] (4)
in the formula, H(e^(jθ)) is the linear prediction spectral amplitude, e^(jθ) is the frequency representation of z, P(e^(jθ)) is the p+1 order symmetric polynomial of A(e^(jθ)), Q(e^(jθ)) is the p+1 order antisymmetric polynomial of A(e^(jθ)), cosθ_i and cosω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the line spectrum frequencies corresponding to the line spectrum pair coefficients of the input voice signal, and ∏ is the product sign.
Step 2) is calculated according to the following formula:
DAL_i = l_(i+1) − l_i,  i = 1, 2, ..., M (M < N) (5)
in the formula, DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_(i+1) is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
Step 4) adjusts the j-th order Bark line spectrum pair parameter in a bidirectional iteration mode according to the adjacent differential line spectrum pair parameters, where j = 2, 3, ..., N−1; the original Bark line spectrum pair parameter is directly updated after adjustment, and the adjusted j-th order Bark line spectrum pair parameter is used for adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and let the adjacent differential line spectrum pair coefficients of the current frame be b_(i+1) − b_i, i = 1, 2, ..., N−1. The specific iterative formulas are as follows:
(7) [the iterative adjustment formula for b_j is reproduced only as an image in the original document]
c_i = η(b_(i+1) − b_i),  η < 1,  i = 2, 3, ..., N−1 (8)
(1) forward iteration: the j-th order Bark line spectrum pair parameters are adjusted forward from j = 2 to j = N−1;
(2) backward iteration: the j-th order Bark line spectrum pair parameters are adjusted backward from j = N−1 to j = 2;
(3) averaging: the Bark line spectrum pair parameters obtained by the forward and backward iterations are averaged to obtain the enhanced Bark line spectrum pair parameters;
in the formula, η controls the degree of formant enhancement; the smaller η is, the more obvious the enhancement effect.
Step 5) first randomly selects 75% of each unit tone data set in the SVD pathological voice database as the training set and 25% as the test set, to ensure that each class of voice data is evenly distributed across the training and testing stages of the classification network; it then inputs the 12th-order enhanced Bark line spectrum pair parameters of the 6 classes of unit tones, namely the vocal fold polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, into the deep neural network for identification. The network parameters are set as follows: 2 hidden layers with 100 neurons each; the ReLU function is selected as the activation function; the Softmax function is used in the last layer of the recognition model to turn the network output into a probability distribution, after which the classification result is optimized.
The improved method for identifying multiple pathological unit tones of the present invention has the following beneficial effects:
1) Compared with the traditional MFCC and LPCC features, the improved method for identifying multiple pathological unit tones achieves a better recognition rate, and the invention proposes the E-BLSP feature suited to identifying multiple pathological unit tones. The newly proposed E-BLSP feature achieves a high recognition rate for the 6 unit tones: normal /a/, /i/, /u/ and pathological /a/, /i/, /u/;
2) The proposed E-BLSP achieves a higher recognition rate for the pathological /i/ tone than for the pathological /a/ tone, whereas traditional pathological voice recognition is mostly based on the unit tone /a/; this provides a new idea for the recognition and diagnosis of pathological voice, and a research foundation for the voice restoration of unit tones and of more complex words and sentences.
Drawings
FIG. 1 is a schematic structural diagram of an improved multiple pathology unit tone identification method of the present invention;
FIG. 2a is the 11-order DAL parameter box plot of the normal unit tone /a/;
FIG. 2b is the 11-order DAL parameter box plot of the pathological unit tone /a/;
FIG. 2c is the 11-order DAL parameter box plot of the normal unit tone /i/;
FIG. 2d is the 11-order DAL parameter box plot of the pathological unit tone /i/;
FIG. 2e is the 11-order DAL parameter box plot of the normal unit tone /u/;
FIG. 2f is the 11-order DAL parameter box plot of the pathological unit tone /u/;
FIG. 3a is a schematic diagram of the 12th-order LSP parameters according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of the 12th-order BLSP parameters according to an embodiment of the present invention;
FIG. 4a is a schematic three-dimensional spectrum of the 12th-order BLSP parameters according to an embodiment of the present invention;
FIG. 4b is a schematic three-dimensional spectrum of the 12th-order E-BLSP parameters according to an embodiment of the present invention.
Detailed Description
The following describes an improved multiple pathological unit tone recognition method according to the present invention in detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the improved method for recognizing multiple pathological unit tones of the present invention comprises the following steps:
1) calculating Line Spectrum Pair (LSP) parameters of an input voice signal; the method comprises the following steps:
(1.1) performing signal preprocessing, including DC removal and framing;
(1.2) for each frame of the voice signal, calculating the 12th-order linear prediction coefficients a_i with the Levinson-Durbin autocorrelation algorithm, according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), computing the linear prediction inverse filter system function as follows:
A(z) = 1 − Σ_{i=1}^{p} a_i·z^(−i) (1)
wherein A(z) represents the linear prediction inverse filter system function, p represents the model order, and a_i represents the linear prediction coefficients;
(1.4) calculating the p+1 order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(−(p+1))·A(z^(−1)) (2)
wherein P(z) represents the symmetric polynomial of order p+1 of A(z);
Q(z) = A(z) − z^(−(p+1))·A(z^(−1)) (3)
wherein Q(z) represents the antisymmetric polynomial of order p+1 of A(z);
(1.5) calculating the line spectrum pair parameters of the 12th-order input speech signal from P(z) and Q(z):
|H(e^(jθ))|^2 = 4 / (|P(e^(jθ))|^2 + |Q(e^(jθ))|^2) = 2^(−p) / [cos^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosθ_i)^2 + sin^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosω_i)^2] (4)
in the formula, H(e^(jθ)) is the linear prediction spectral amplitude, e^(jθ) is the frequency representation of z, P(e^(jθ)) is the p+1 order symmetric polynomial of A(e^(jθ)), Q(e^(jθ)) is the p+1 order antisymmetric polynomial of A(e^(jθ)), cosθ_i and cosω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the Line Spectrum Frequencies (LSF) corresponding to the line spectrum pair coefficients of the input speech signal, and ∏ is the product sign.
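The LSP extraction in steps (1.2)-(1.5) can be sketched in Python as follows. This is a minimal illustration under standard LPC/LSP theory, not the patent's own code; the function names and the AR(2) test signal are invented for the example. The line spectrum frequencies are obtained as the angles of the unit-circle roots of P(z) and Q(z).

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: autocorrelation r[0..p] -> prediction-error
    polynomial A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.  (The patent writes
    A(z) = 1 - sum a_i z^-i, so its a_i are the negatives of a[1..p] here.)"""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]        # order update
        err *= (1.0 - k * k)                               # prediction error update
    return a

def lsp_from_lpc(a):
    """Line spectrum frequencies: angles in (0, pi) of the unit-circle roots of
    the symmetric/antisymmetric polynomials P(z), Q(z) of eqs. (2)-(3)."""
    apad = np.concatenate([a, [0.0]])
    P = apad + apad[::-1]   # symmetric polynomial of order p+1
    Q = apad - apad[::-1]   # antisymmetric polynomial of order p+1
    ang = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
    # drop the trivial roots near z = +1 / z = -1 and the negative-angle conjugates
    return np.sort(ang[(ang > 1e-4) & (ang < np.pi - 1e-4)])

# Demo on a synthetic stable AR(2) signal (illustrative only)
rng = np.random.default_rng(0)
n = 4000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.3 * x[t - 1] - 0.8 * x[t - 2] + rng.standard_normal()
r = np.array([np.dot(x[:n - k], x[k:]) for k in range(13)]) / n  # biased autocorrelation
lsf = lsp_from_lpc(levinson_durbin(r, 12))  # 12 line spectrum frequencies in (0, pi)
```

For a minimum-phase A(z), the 12 resulting frequencies are distinct and strictly increasing, which is the ordering property the later DAL difference relies on.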
2) Calculating parameters of Adjacent Difference line spectrum pairs (DAL) of the input voice signals;
The DAL parameters are calculated according to the following formula:
DAL_i = l_(i+1) − l_i,  i = 1, 2, ..., M (M < N) (5)
in the formula, DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_(i+1) is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
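Formula (5) is a first-order difference of the LSP vector; with 12th-order LSP parameters it yields the 11 DAL values used in the embodiment. A one-line numpy sketch (the numeric values below are illustrative, not from the patent):

```python
import numpy as np

lsp = np.array([0.2, 0.5, 0.9, 1.1, 1.6, 2.0])  # illustrative LSP values (radians)
dal = np.diff(lsp)  # DAL_i = l_(i+1) - l_i, eq. (5); here M = len(lsp) - 1 = 5
```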
3) Performing frequency warping on Line Spectrum Pair parameters of an input voice signal to obtain Bark Line Spectrum Pair (BLSP) parameters of the input voice signal;
the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency.
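Formula (6) is a direct frequency-to-Bark mapping (the Traunmüller-style approximation), and transcribes to code as:

```python
def bark(f):
    """Bark frequency for a linear frequency f in Hz, per eq. (6):
    Bark = 26.81 / (1 + 1960/f) - 0.53."""
    return 26.81 / (1.0 + 1960.0 / f) - 0.53
```

Applying `bark` to each line spectrum frequency (expressed in Hz) expands the low-frequency region and compresses the high-frequency region, which is the stated purpose of the BLSP parameters.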
4) performing feature enhancement on the Bark line spectrum pair parameters of the input speech signal according to the adjacent differential line spectrum pair parameters, to obtain the Enhanced Bark Line Spectrum Pair (E-BLSP) parameters. The j-th order Bark line spectrum pair parameter is adjusted in a bidirectional iteration mode according to the adjacent differential line spectrum pair parameters, where j = 2, 3, ..., N−1; the original Bark line spectrum pair parameter is directly updated after adjustment, and the adjusted j-th order parameter is used for adjusting the next order. Let the Bark line spectrum pair parameters of the current frame be {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and let the adjacent differential line spectrum pair coefficients of the current frame be b_(i+1) − b_i, i = 1, 2, ..., N−1. The specific iterative formulas are as follows:
(7) [the iterative adjustment formula for b_j is reproduced only as an image in the original document]
c_i = η(b_(i+1) − b_i),  η < 1,  i = 2, 3, ..., N−1 (8)
(1) forward iteration: the j-th order Bark line spectrum pair parameters are adjusted forward from j = 2 to j = N−1;
(2) backward iteration: the j-th order Bark line spectrum pair parameters are adjusted backward from j = N−1 to j = 2;
(3) averaging: the Bark line spectrum pair parameters obtained by the forward and backward iterations are averaged to obtain the enhanced Bark line spectrum pair parameters;
in the formula, η controls the degree of formant enhancement; the smaller η is, the more obvious the enhancement effect.
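Formula (7) appears only as an image in the source, so the update rule in this sketch is an assumption: the forward pass pulls each b_j toward b_(j+1) using the η-scaled gap of formula (8), the backward pass pulls b_j toward b_(j-1), and the two passes are averaged, mirroring steps (1)-(3). The function name is invented; treat this as an illustration of the bidirectional-iteration idea, not the patent's exact formula.

```python
import numpy as np

def enhance_blsp(b, eta=0.4):
    """Bidirectional iterative adjustment of Bark LSP parameters (assumed form).

    eta < 1 controls the enhancement strength; the first and last parameters
    are left untouched, matching the j = 2 .. N-1 range of eqs. (7)-(8).
    """
    b = np.asarray(b, dtype=float)
    N = len(b)
    fwd = b.copy()
    for j in range(1, N - 1):          # j = 2 .. N-1 in the patent's 1-based indexing
        fwd[j] = fwd[j + 1] - eta * (fwd[j + 1] - fwd[j])   # pull b_j toward b_(j+1)
    bwd = b.copy()
    for j in range(N - 2, 0, -1):      # j = N-1 .. 2
        bwd[j] = bwd[j - 1] + eta * (bwd[j] - bwd[j - 1])   # pull b_j toward b_(j-1)
    return (fwd + bwd) / 2.0           # step (3): average the two passes
```

Under this assumed update, each interior parameter stays between its original neighbors, so the parameter ordering is preserved while neighboring gaps narrow, which is the formant-sharpening effect the averaging step is meant to balance.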
5) inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones. First, 75% of each unit tone data set in the SVD pathological voice database is randomly selected as the training set and 25% as the test set, to ensure that each class of voice data is evenly distributed across the training and testing stages of the classification network; then the 12th-order enhanced Bark line spectrum pair parameters of the 6 classes of unit tones, namely the vocal fold polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, are input into the deep neural network for identification. The network parameters are set as follows: 2 hidden layers with 100 neurons each; the ReLU function is selected as the activation function; the Softmax function is used in the last layer of the recognition model to turn the network output into a probability distribution, after which the classification result is optimized.
Specific examples are given below:
1. Preprocessing: each frame in the framing processing is 30 ms long; with a sampling frequency of 8 kHz, the corresponding frame length is 240 samples and the frame shift is 80 samples.
2. When calculating the linear prediction coefficients, p = 12.
3. The linear prediction inverse filter system function A(z) is obtained from the linear prediction coefficients.
4. The p+1 order symmetric and antisymmetric polynomials P(z) and Q(z) are calculated from A(z).
5. The 12th-order LSP parameters are calculated from P(z) and Q(z).
6. The 11-order Difference of Adjacent LSP (DAL) parameters of the input speech signal are computed from the 12th-order LSP parameters.
7. The LSP parameters of the input voice signal are frequency-warped to obtain the Bark Line Spectrum Pair (BLSP) parameters of the input voice signal.
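The preprocessing arithmetic of step 1 is easy to check: 30 ms frames at an 8 kHz sampling rate give 240 samples per frame with an 80-sample shift. A sketch of the DC removal and framing of step (1.1) follows; the function name is illustrative, not from the patent.

```python
import numpy as np

fs = 8000                        # sampling frequency, Hz
frame_len = round(0.030 * fs)    # 30 ms -> 240 samples
frame_shift = 80                 # frame shift in samples

def frame_signal(x, frame_len, frame_shift):
    """Remove DC and split a 1-D signal into overlapping frames."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()             # DC removal
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    return np.stack([x[i * frame_shift:i * frame_shift + frame_len]
                     for i in range(n_frames)])
```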
Figs. 2a to 2f are box plots of the DAL parameters of the 6 classes of unit tone signals according to an embodiment of the present invention. FIG. 2a shows the 11-order DAL parameter box plot of the normal unit tone /a/; FIG. 2b, of the pathological unit tone /a/; FIG. 2c, of the normal unit tone /i/; FIG. 2d, of the pathological unit tone /i/; FIG. 2e, of the normal unit tone /u/; and FIG. 2f, of the pathological unit tone /u/.
As can be seen from Figs. 2a to 2f, for the three normal unit tones /a/, /i/, /u/, the boxes of the first 7 orders of the DAL data distributions differ greatly and therefore discriminate well among the three unit tones; for the three pathological unit tone signals /a/, /i/, /u/, the first 7 orders of the DAL data are distributed more uniformly than for normal voice. For the pathological /a/ tone, the last 4 orders of the DAL parameters are distributed completely differently from the normal /a/ tone, whereas the last 4 orders of the DAL data of the pathological /i/ and /u/ tones overlap considerably and discriminate poorly. Because the low-order DAL parameters correspond to the low-frequency part of the signal, the embodiment of the invention considers that the discrimination of the DAL parameters is higher in the low-frequency band than in the high-frequency band and that the Bark domain more faithfully reflects the human ear's perception of the signal; the BLSP parameters are therefore obtained by nonlinear frequency warping of the extracted LSP parameters with the Bark transformation scale, using the warping function:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency.
Figs. 3a and 3b show the 12th-order LSP parameters and the 12th-order BLSP parameters, respectively, according to an embodiment of the present invention. Compared with Fig. 3a, Fig. 3b expands the low-frequency part of the signal and compresses the high-frequency part, improving the discrimination between normal and pathological vowels.
8. Feature enhancement is performed on the BLSP parameters of the input voice signal to obtain the Enhanced Bark Line Spectrum Pair (E-BLSP) parameters: η controls the degree of formant enhancement, and the smaller η is, the more obvious the enhancement effect. In this embodiment, η = 0.4 is used.
Figs. 4a and 4b are schematic three-dimensional spectra of the 12th-order BLSP parameters and the 12th-order E-BLSP parameters, respectively, according to an embodiment of the present invention. Compared with Fig. 4a, the amplitude in Fig. 4b is greatly increased at the formant frequencies while spectral broadening is suppressed, greatly enhancing the discrimination between normal and pathological vowels.
9. The E-BLSP parameters of the input speech signal are input into the DNN classifier to identify multiple pathological unit tones.
The embodiment of the invention first randomly selects 75% of each unit tone data set as the training set and 25% as the test set, to ensure that each class of voice data is evenly distributed across the training and testing stages of the classification network, and then inputs the 12th-order E-BLSP parameters of the 6 classes of unit tones into a Deep Neural Network (DNN) for identification. The network parameters are set as follows: 2 hidden layers with 100 neurons per layer.
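The classifier topology described above (12 E-BLSP inputs, two hidden layers of 100 ReLU units, 6-way Softmax output) can be sketched as a plain-numpy forward pass. The random weights are placeholders; the patent does not specify the training procedure beyond this topology, so nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def dnn_forward(x, n_in=12, n_hidden=100, n_classes=6):
    """Forward pass: 12th-order E-BLSP features -> 2 x 100 ReLU -> 6-way softmax."""
    W1 = rng.standard_normal((n_in, n_hidden)) * 0.1      # placeholder weights
    W2 = rng.standard_normal((n_hidden, n_hidden)) * 0.1
    W3 = rng.standard_normal((n_hidden, n_classes)) * 0.1
    h = relu(x @ W1)
    h = relu(h @ W2)
    return softmax(h @ W3)

probs = dnn_forward(np.ones((3, 12)))  # 3 dummy frames -> (3, 6) class probabilities
```

Each output row sums to 1, which is the probability-distribution property the Softmax layer in the recognition model provides.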
For the unit tone source signals, the embodiment of the invention uses the Saarbrücken Voice Database (SVD), a pathological voice database recorded under the direction of the Institute of Phonetics at Saarland University. The database contains normal and various pathological voice signals of the sustained vowels /a/, /i/ and /u/, with a uniform sampling rate of 50 kHz and a resolution of 16 bits. The three sustained vowels /a/, /i/, /u/ of vocal fold polyp pathological voice and of normal voice are selected, and the sampling rate is uniformly reduced to 8 kHz. Each class has 180 voice samples in total, covering 4 different tones (normal, low, high, low-high-low).
The evaluation in the embodiment of the invention mainly uses two indexes: accuracy and AUC. Accuracy is defined as the percentage of correctly classified cases. The ROC (Receiver Operating Characteristic) curve is a comprehensive index reflecting the continuous variables of sensitivity and specificity, revealing the relationship between the two; the AUC (Area Under Curve) is defined as the area enclosed between the ROC curve and the coordinate axes, with a value range between 0.5 and 1, and a larger AUC indicates a better classification effect. To ensure the accuracy and generality of the experiment, each feature combination experiment is run 10 times and the average is taken as the final classification result.
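The two evaluation indexes can be computed from scratch: accuracy is the fraction of correct predictions, and for a binary split the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counted as 1/2). The function names are illustrative:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly classified cases."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def auc(y_true, scores):
    """AUC via the rank statistic: P(positive score > negative score)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # strict wins
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count 0.5
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```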
As can be seen from Table 1, the recognition rate of the proposed features on multiple pathological unit tones is higher than that of the conventional MFCC and LPCC features. The highest accuracy reaches 97.3600%, and the AUC reaches 0.9894.
TABLE 1
[Table 1, which compares the recognition accuracy and AUC of the tested features, is reproduced only as images in the original document.]

Claims (5)

1. An improved method for recognizing multiple pathological unit tones, comprising the steps of:
1) calculating line spectrum pair parameters of an input voice signal;
2) calculating adjacent differential line spectrum pair parameters of the input voice signal;
3) performing frequency warping on the line spectrum pair parameters of the input voice signal to obtain the Bark line spectrum pair parameters of the input voice signal; the frequency warping adopts the following formula:
Bark=26.81/(1+(1960/f))-0.53 (6)
wherein Bark represents Bark frequency; f represents a linear frequency;
4) according to the adjacent differential line spectrum pair parameters, performing feature enhancement on the Bark line spectrum pair parameters of the input voice signal to obtain the enhanced Bark line spectrum pair parameters;
5) inputting the enhanced Bark line spectrum pair parameters of the input voice signal into a deep neural network classifier to identify multiple pathological unit tones.
2. An improved multiple pathology unit tone identification method according to claim 1, wherein step 1) comprises:
(1.1) performing signal preprocessing, including DC removal and framing;
(1.2) for each frame of the voice signal, calculating the 12th-order linear prediction coefficients a_i with the Levinson-Durbin autocorrelation algorithm, according to the set model order p = 12;
(1.3) from the linear prediction coefficients a_i calculated in (1.2), computing the linear prediction inverse filter system function as follows:
A(z) = 1 − Σ_{i=1}^{p} a_i·z^(−i) (1)
wherein A(z) represents the linear prediction inverse filter system function, p represents the model order, and a_i represents the linear prediction coefficients;
(1.4) calculating the p+1 order symmetric and antisymmetric polynomials of the linear prediction inverse filter system function A(z):
P(z) = A(z) + z^(−(p+1))·A(z^(−1)) (2)
wherein P(z) represents the symmetric polynomial of order p+1 of A(z);
Q(z) = A(z) − z^(−(p+1))·A(z^(−1)) (3)
wherein Q(z) represents the antisymmetric polynomial of order p+1 of A(z);
(1.5) calculating the line spectrum pair parameters of the 12th-order input speech signal from P(z) and Q(z):
|H(e^(jθ))|^2 = 4 / (|P(e^(jθ))|^2 + |Q(e^(jθ))|^2) = 2^(−p) / [cos^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosθ_i)^2 + sin^2(θ/2)·∏_{i=1}^{p/2}(cosθ − cosω_i)^2] (4)
in the formula, H(e^(jθ)) is the linear prediction spectral amplitude, e^(jθ) is the frequency representation of z, P(e^(jθ)) is the p+1 order symmetric polynomial of A(e^(jθ)), Q(e^(jθ)) is the p+1 order antisymmetric polynomial of A(e^(jθ)), cosθ_i and cosω_i are the representations of the LSP coefficients in the cosine domain, θ_i and ω_i are the line spectrum frequencies corresponding to the line spectrum pair coefficients of the input voice signal, and ∏ is the product sign.
3. An improved method for multiple pathological unit tone recognition according to claim 1, wherein step 2) is calculated according to the following formula:
DAL_i = l_(i+1) − l_i,  i = 1, 2, ..., M (M < N) (5)
in the formula, DAL_i is the i-th order adjacent differential line spectrum pair parameter, l_(i+1) is the (i+1)-th order line spectrum pair parameter, l_i is the i-th order line spectrum pair parameter, M is the maximum order of the adjacent differential line spectrum pair parameters, and N is the maximum order of the line spectrum pair parameters.
4. The improved multiple pathological unit tone identification method according to claim 1, wherein in step 4) the j-th order Bark line spectrum pair parameter is adjusted in a bidirectional iteration mode according to the adjacent differential line spectrum pair parameters, where j = 2, 3, ..., N−1; the original Bark line spectrum pair parameter is directly updated after adjustment, and the adjusted j-th order parameter is used for adjusting the next order; the Bark line spectrum pair parameters of the current frame are set to {b_1, b_2, ..., b_N}, where N is the maximum order of the line spectrum pair parameters, and the adjacent differential line spectrum pair coefficients of the current frame are b_(i+1) − b_i, i = 1, 2, ..., N−1; the specific iterative formulas are as follows:
(7) [the iterative adjustment formula for b_j is reproduced only as an image in the original document]
c_i = η(b_(i+1) − b_i),  η < 1,  i = 2, 3, ..., N−1 (8)
(1) forward iteration: the j-th order Bark line spectrum pair parameters are adjusted forward from j = 2 to j = N−1;
(2) backward iteration: the j-th order Bark line spectrum pair parameters are adjusted backward from j = N−1 to j = 2;
(3) averaging: the Bark line spectrum pair parameters obtained by the forward and backward iterations are averaged to obtain the enhanced Bark line spectrum pair parameters;
in the formula, η controls the degree of formant enhancement; the smaller η is, the more obvious the enhancement effect.
5. The improved multiple pathological unit tone identification method according to claim 1, wherein step 5) first randomly selects 75% of each unit tone data set in the SVD pathological voice database as the training set and 25% as the test set, to ensure that each class of voice data is evenly distributed across the training and testing stages of the classification network, and then inputs the 12th-order enhanced Bark line spectrum pair parameters of the 6 classes of unit tones, namely the vocal fold polyp pathological voices /a/, /i/, /u/ and the normal voices /a/, /i/, /u/, into the deep neural network for identification, with the network parameters set as follows: 2 hidden layers with 100 neurons each, the ReLU function as the activation function, and the Softmax function in the last layer of the recognition model to turn the network output into a probability distribution, after which the classification result is optimized.
CN201910233952.0A 2019-03-26 2019-03-26 Improved method for identifying multiple pathological unit tones Active CN110070894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233952.0A CN110070894B (en) 2019-03-26 2019-03-26 Improved method for identifying multiple pathological unit tones

Publications (2)

Publication Number Publication Date
CN110070894A CN110070894A (en) 2019-07-30
CN110070894B true CN110070894B (en) 2021-08-03

Family

ID=67366671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910233952.0A Active CN110070894B (en) 2019-03-26 2019-03-26 Improved method for identifying multiple pathological unit tones

Country Status (1)

Country Link
CN (1) CN110070894B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0774750A2 (en) * 1995-11-15 1997-05-21 Nokia Mobile Phones Ltd. Determination of line spectrum frequencies for use in a radiotelephone
US20040042622A1 (en) * 2002-08-29 2004-03-04 Mutsumi Saito Speech Processing apparatus and mobile communication terminal
US7257535B2 (en) * 1999-07-26 2007-08-14 Lucent Technologies Inc. Parametric speech codec for representing synthetic speech in the presence of background noise
CN103730130A (en) * 2013-12-20 2014-04-16 中国科学院深圳先进技术研究院 Detection method and system for pathological voice
CN106710604A (en) * 2016-12-07 2017-05-24 天津大学 Formant enhancement apparatus and method for improving speech intelligibility

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101527141B (en) * 2009-03-10 2011-06-22 苏州大学 Method of converting whispered voice into normal voice based on radial group neutral network
CN107705801B (en) * 2016-08-05 2020-10-02 中国科学院自动化研究所 Training method of voice bandwidth extension model and voice bandwidth extension method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Quality-enhanced voice morphing using maximum likelihood transformations"; Hui Ye et al.; IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4; July 2006; full text *
"Voice Pathology Detection Using Vocal Tract Area"; Ghulam Muhammad et al.; IEEE 2013 European Modelling Symposium; 2014; full text *
"Application of Voice Analysis in Disease Diagnosis" (in Chinese); Peng Ce et al.; Journal of Biomedical Engineering; 2007; full text *
"Pathological Voice Formant Restoration with an Improved Artificial Neural Network" (in Chinese); Xue Longji et al.; Electronic Devices; Feb. 2019; full text *
"Pathological Voice Formant Correction Using Piecewise Fixed-Value Shift of Line Spectrum Pairs" (in Chinese); Zhou Jiaqin et al.; Informatization Research; Apr. 2016; full text *

Also Published As

Publication number Publication date
CN110070894A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN113012720B (en) Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
CN107274888B (en) Emotional voice recognition method based on octave signal strength and differentiated feature subset
CN104050965A (en) English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN110827857B (en) Speech emotion recognition method based on spectral features and ELM
Pawar et al. Review of various stages in speaker recognition system, performance measures and recognition toolkits
Zhang et al. Multiple vowels repair based on pitch extraction and line spectrum pair feature for voice disorder
Ling et al. Attention-Based Convolutional Neural Network for ASV Spoofing Detection.
CN106782599A (en) The phonetics transfer method of post filtering is exported based on Gaussian process
Ghezaiel et al. Hybrid network for end-to-end text-independent speaker identification
Nawas et al. Speaker recognition using random forest
Cheng et al. DNN-based speech enhancement with self-attention on feature dimension
Woubie et al. Voice-quality Features for Deep Neural Network Based Speaker Verification Systems
Karthikeyan Adaptive boosted random forest-support vector machine based classification scheme for speaker identification
CN110070894B (en) Improved method for identifying multiple pathological unit tones
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
CN112466276A (en) Speech synthesis system training method and device and readable storage medium
Nasr et al. Text-independent speaker recognition using deep neural networks
Boualoulou et al. Speech analysis for the detection of Parkinson’s disease by combined use of empirical mode decomposition, Mel frequency cepstral coefficients, and the K-nearest neighbor classifier
CN105741853A (en) Digital speech perception hash method based on formant frequency
Neto et al. Feature estimation for vocal fold edema detection using short-term cepstral analysis
Velayuthapandian et al. A focus module-based lightweight end-to-end CNN framework for voiceprint recognition
Zailan et al. Comparative analysis of LPC and MFCC for male speaker recognition in text-independent context
Thirumuru et al. Application of non-negative frequency-weighted energy operator for vowel region detection
Zi et al. Joint filter combination-based central difference feature extraction and attention-enhanced Dense-Res2Block network for short-utterance speaker recognition
CN113299295A (en) Training method and device for voiceprint coding network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant